MongoDB Atlas Search and Full-Text Indexing: SQL-Style Text Search with Advanced Analytics and Ranking

Modern applications require sophisticated search capabilities that go beyond simple text matching: semantic understanding, relevance scoring, faceted search, auto-completion, and real-time search analytics. Traditional relational databases provide basic full-text search through features such as PostgreSQL's tsvector/tsquery machinery or MySQL's MATCH ... AGAINST, but they struggle with advanced search features, relevance ranking, and the performance demands of modern search applications.

MongoDB Atlas Search provides enterprise-grade search capabilities built on Apache Lucene, delivering advanced full-text search, semantic search, vector search, and search analytics directly integrated with your MongoDB data. Unlike external search engines that require complex data synchronization pipelines, Atlas Search keeps its indexes automatically synchronized with your collections in near real time while providing powerful search features typically found only in dedicated search platforms.

The Traditional Search Challenge

Relational database search approaches have significant limitations for modern applications:

-- Traditional SQL full-text search - limited and inefficient

-- PostgreSQL full-text search approach
CREATE TABLE articles (
    article_id SERIAL PRIMARY KEY,
    title VARCHAR(500) NOT NULL,
    content TEXT NOT NULL,
    author_id INTEGER REFERENCES users(user_id),
    category VARCHAR(100),
    tags TEXT[],
    published_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    view_count INTEGER DEFAULT 0,

    -- Full-text search vectors
    title_tsvector TSVECTOR,
    content_tsvector TSVECTOR,
    combined_tsvector TSVECTOR
);

-- Create full-text search indexes
CREATE INDEX idx_articles_title_fts ON articles USING GIN(title_tsvector);
CREATE INDEX idx_articles_content_fts ON articles USING GIN(content_tsvector);
CREATE INDEX idx_articles_combined_fts ON articles USING GIN(combined_tsvector);

-- Maintain search vectors with triggers
CREATE OR REPLACE FUNCTION update_article_search_vectors()
RETURNS TRIGGER AS $$
BEGIN
    NEW.title_tsvector := to_tsvector('english', NEW.title);
    NEW.content_tsvector := to_tsvector('english', NEW.content);
    NEW.combined_tsvector := to_tsvector('english', 
        NEW.title || ' ' || NEW.content || ' ' || COALESCE(array_to_string(NEW.tags, ' '), ''));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trigger_update_search_vectors
    BEFORE INSERT OR UPDATE ON articles
    FOR EACH ROW EXECUTE FUNCTION update_article_search_vectors();

-- Basic full-text search query
SELECT 
    a.article_id,
    a.title,
    a.published_date,
    a.view_count,

    -- Simple relevance ranking
    ts_rank(a.combined_tsvector, query) as relevance_score,

    -- Highlight search terms (basic)
    ts_headline('english', a.content, query, 
        'MaxWords=50, MinWords=10, ShortWord=3') as snippet

FROM articles a,
     plainto_tsquery('english', 'machine learning algorithms') as query
WHERE a.combined_tsvector @@ query
ORDER BY ts_rank(a.combined_tsvector, query) DESC
LIMIT 20;

-- Problems with traditional full-text search:
-- 1. Limited language support and stemming capabilities
-- 2. Basic relevance scoring without advanced ranking factors
-- 3. No semantic understanding or synonym handling
-- 4. Limited faceting and aggregation capabilities
-- 5. Poor auto-completion and suggestion features
-- 6. No built-in analytics or search performance metrics
-- 7. Complex maintenance of search vectors and triggers
-- 8. Limited scalability for large document collections

-- MySQL full-text search (even more limited)
CREATE TABLE documents (
    doc_id INT AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255),
    content LONGTEXT,
    category VARCHAR(100),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    FULLTEXT(title, content)
) ENGINE=InnoDB;

-- Basic MySQL full-text search
SELECT 
    doc_id,
    title,
    created_at,
    MATCH(title, content) AGAINST('machine learning' IN NATURAL LANGUAGE MODE) as score
FROM documents 
WHERE MATCH(title, content) AGAINST('machine learning' IN NATURAL LANGUAGE MODE)
ORDER BY score DESC
LIMIT 20;

-- MySQL limitations:
-- - Minimum word length restrictions
-- - Limited boolean query syntax
-- - Poor performance with large datasets
-- - No advanced ranking or analytics
-- - Limited customization options

MongoDB Atlas Search provides comprehensive search capabilities:

// MongoDB Atlas Search - enterprise-grade search with advanced features
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb+srv://cluster.mongodb.net');
const db = client.db('content_platform');
const articles = db.collection('articles');

// Advanced Atlas Search query with multiple search techniques
const searchQuery = [
  {
    $search: {
      index: "articles_search_index", // Custom search index
      compound: {
        must: [
          // Text search with fuzzy matching
          {
            text: {
              query: "machine learning algorithms",
              path: ["title", "content"],
              fuzzy: {
                maxEdits: 2,
                prefixLength: 1,
                maxExpansions: 50
              }
            }
          }
        ],
        should: [
          // Boost title matches
          {
            text: {
              query: "machine learning algorithms",
              path: "title",
              score: { boost: { value: 3.0 } }
            }
          },
          // Phrase matching with slop
          {
            phrase: {
              query: "machine learning",
              path: ["title", "content"],
              slop: 2,
              score: { boost: { value: 2.0 } }
            }
          },
          // Semantic search using synonyms
          {
            text: {
              query: "machine learning algorithms",
              path: ["title", "content"],
              synonyms: "tech_synonyms"
            }
          }
        ],
        filter: [
          // Date range filtering
          {
            range: {
              path: "publishedDate",
              gte: new Date("2023-01-01"),
              lte: new Date("2025-12-31")
            }
          },
          // Category filtering
          {
            text: {
              query: ["technology", "science", "ai"],
              path: "category"
            }
          }
        ],
        mustNot: [
          // Exclude draft articles
          {
            equals: {
              path: "status",
              value: "draft"
            }
          }
        ]
      },

      // Advanced highlighting
      highlight: {
        path: ["title", "content"],
        maxCharsToExamine: 500000,
        maxNumPassages: 3
      },

      // Count total matches
      count: {
        type: "total"
      }
    }
  },

  // Add computed relevance and metadata
  {
    $addFields: {
      searchScore: { $meta: "searchScore" },
      searchHighlights: { $meta: "searchHighlights" },

      // Custom scoring factors
      popularityScore: {
        $divide: [
          { $add: ["$viewCount", "$likeCount"] },
          { $max: [{ $divide: [{ $subtract: [new Date(), "$publishedDate"] }, 86400000] }, 1] }
        ]
      },

      // Content quality indicators
      contentQuality: {
        $cond: {
          if: { $gte: [{ $strLenCP: "$content" }, 1000] },
          then: { $min: [{ $divide: [{ $strLenCP: "$content" }, 500] }, 5] },
          else: 1
        }
      }
    }
  },

  // Faceted aggregations for search filters
  {
    $facet: {
      // Main search results
      results: [
        {
          $addFields: {
            finalScore: {
              $add: [
                "$searchScore",
                { $multiply: ["$popularityScore", 0.2] },
                { $multiply: ["$contentQuality", 0.1] }
              ]
            }
          }
        },
        { $sort: { finalScore: -1 } },
        { $limit: 20 },
        {
          $project: {
            articleId: "$_id",
            title: 1,
            author: 1,
            category: 1,
            tags: 1,
            publishedDate: 1,
            viewCount: 1,
            searchScore: 1,
            finalScore: 1,
            searchHighlights: 1,
            snippet: { $substr: ["$content", 0, 200] }
          }
        }
      ],

      // Category facets
      categoryFacets: [
        {
          $group: {
            _id: "$category",
            count: { $sum: 1 },
            avgScore: { $avg: "$searchScore" }
          }
        },
        { $sort: { count: -1 } },
        { $limit: 10 }
      ],

      // Author facets
      authorFacets: [
        {
          $group: {
            _id: "$author.name",
            count: { $sum: 1 },
            articles: { $push: "$title" }
          }
        },
        { $sort: { count: -1 } },
        { $limit: 10 }
      ],

      // Date range facets
      dateFacets: [
        {
          $group: {
            _id: {
              year: { $year: "$publishedDate" },
              month: { $month: "$publishedDate" }
            },
            count: { $sum: 1 },
            avgScore: { $avg: "$searchScore" }
          }
        },
        { $sort: { "_id.year": -1, "_id.month": -1 } }
      ],

      // Search analytics
      searchAnalytics: [
        {
          $group: {
            _id: null,
            totalResults: { $sum: 1 },
            avgScore: { $avg: "$searchScore" },
            maxScore: { $max: "$searchScore" },
            scoreDistribution: {
              $push: {
                $switch: {
                  branches: [
                    { case: { $gte: ["$searchScore", 10] }, then: "excellent" },
                    { case: { $gte: ["$searchScore", 5] }, then: "good" },
                    { case: { $gte: ["$searchScore", 2] }, then: "fair" }
                  ],
                  default: "poor"
                }
              }
            }
          }
        }
      ]
    }
  }
];

// Execute search with comprehensive results
const searchResults = await articles.aggregate(searchQuery).toArray();

// Benefits of MongoDB Atlas Search:
// - Advanced relevance scoring with custom ranking factors
// - Semantic search with synonym support and fuzzy matching
// - Real-time search index updates synchronized with data changes
// - Faceted search with complex aggregations
// - Advanced highlighting and snippet generation
// - Built-in analytics and search performance metrics
// - Support for multiple languages and custom analyzers
// - Vector search capabilities for AI and machine learning
// - Auto-completion and suggestion features
// - Geospatial search integration
// - Security and access control integration

Understanding MongoDB Atlas Search Architecture

Search Index Creation and Management

Implement comprehensive search indexes for optimal performance:

// Advanced Atlas Search index management system
class AtlasSearchManager {
  constructor(db) {
    this.db = db;
    this.searchIndexes = new Map();
    this.searchAnalytics = db.collection('search_analytics');
  }

  async createComprehensiveSearchIndex(collection, indexName, indexDefinition) {
    // Create sophisticated search index with multiple field types
    const advancedIndexDefinition = {
      name: indexName,
      definition: {
        // Text search fields with different analyzers
        mappings: {
          dynamic: false,
          fields: {
            // Title field with enhanced text analysis
            title: {
              type: "string",
              analyzer: "lucene.english",
              store: true,
              indexOptions: "freqs"
            },

            // Content field with full-text capabilities
            content: {
              type: "string",
              analyzer: "content_analyzer",
              store: true
            },

            // Category as both text and facet
            category: [
              {
                type: "string",
                analyzer: "lucene.keyword"
              },
              {
                type: "stringFacet"
              }
            ],

            // Tags for exact and fuzzy matching
            tags: {
              type: "string",
              analyzer: "lucene.standard",
              multi: {
                keyword: {
                  type: "string",
                  analyzer: "lucene.keyword"
                }
              }
            },

            // Author information
            "author.name": {
              type: "string",
              analyzer: "lucene.standard",
              store: true
            },

            "author.expertise": {
              type: "stringFacet"
            },

            // Numeric fields for sorting and filtering
            publishedDate: {
              type: "date"
            },

            viewCount: {
              type: "number",
              indexIntegers: true,
              indexDoubles: false
            },

            likeCount: {
              type: "number"
            },

            readingTime: {
              type: "number"
            },

            // Geospatial data
            "location.coordinates": {
              type: "geo"
            },

            // Vector field for semantic search
            contentEmbedding: {
              type: "knnVector",
              dimensions: 1536,
              similarity: "cosine"
            }
          }
        },

        // Custom analyzers
        analyzers: [
          {
            name: "content_analyzer",
            charFilters: [
              {
                type: "htmlStrip"
              },
              {
                type: "mapping",
                mappings: {
                  "& => and",
                  "@ => at"
                }
              }
            ],
            tokenizer: {
              type: "standard"
            },
            tokenFilters: [
              {
                type: "lowercase"
              },
              {
                type: "stopword",
                tokens: ["the", "a", "an", "and", "or", "but"]
              },
              {
                type: "snowballStemming",
                stemmerName: "english"
              },
              {
                type: "length",
                min: 2,
                max: 100
              }
            ]
          },

          {
            name: "autocomplete_analyzer",
            tokenizer: {
              type: "edgeGram",
              minGrams: 1,
              maxGrams: 20
            },
            tokenFilters: [
              {
                type: "lowercase"
              }
            ]
          }
        ],

        // Synonym mappings
        synonyms: [
          {
            name: "tech_synonyms",
            source: {
              collection: "synonyms",
              analyzer: "lucene.standard"
            }
          }
        ],

        // Stored source configuration
        // (storedSource accepts either include or exclude, not both)
        storedSource: {
          include: ["title", "author.name", "category", "publishedDate"]
        }
      }
    };

    try {
      // Create the search index
      const result = await this.db.collection(collection).createSearchIndex(advancedIndexDefinition);

      // Store index metadata
      this.searchIndexes.set(indexName, {
        collection: collection,
        indexName: indexName,
        definition: advancedIndexDefinition,
        createdAt: new Date(),
        status: 'creating'
      });

      console.log(`Search index '${indexName}' created for collection '${collection}'`);
      return result;

    } catch (error) {
      console.error(`Failed to create search index '${indexName}':`, error);
      throw error;
    }
  }

  async createAutoCompleteIndex(collection, fields, indexName = 'autocomplete_index') {
    // Create specialized index for auto-completion
    const autoCompleteIndex = {
      name: indexName,
      definition: {
        mappings: {
          dynamic: false,
          fields: fields.reduce((acc, field) => {
            acc[field.path] = {
              type: "autocomplete",
              analyzer: "autocomplete_analyzer",
              tokenization: "edgeGram",
              maxGrams: field.maxGrams || 15,
              minGrams: field.minGrams || 2,
              foldDiacritics: true
            };
            return acc;
          }, {})
        },
        analyzers: [
          {
            name: "autocomplete_analyzer",
            tokenizer: {
              type: "edgeGram",
              minGrams: 2,
              maxGrams: 15
            },
            tokenFilters: [
              {
                type: "lowercase"
              },
              {
                type: "icuFolding"
              }
            ]
          }
        ]
      }
    };

    return await this.db.collection(collection).createSearchIndex(autoCompleteIndex);
  }

  async performAdvancedSearch(collection, searchParams) {
    // Execute sophisticated search with multiple techniques
    const pipeline = [];

    // Build complex search stage
    const searchStage = {
      $search: {
        index: searchParams.index || 'default_search_index',
        compound: {
          must: [],
          should: [],
          filter: [],
          mustNot: []
        }
      }
    };

    // Text search with boosting
    if (searchParams.query) {
      searchStage.$search.compound.must.push({
        text: {
          query: searchParams.query,
          path: searchParams.searchFields || ['title', 'content'],
          fuzzy: searchParams.fuzzy || {
            maxEdits: 2,
            prefixLength: 1
          }
        }
      });

      // Boost title matches
      searchStage.$search.compound.should.push({
        text: {
          query: searchParams.query,
          path: 'title',
          score: { boost: { value: 3.0 } }
        }
      });

      // Phrase matching
      if (searchParams.phraseSearch) {
        searchStage.$search.compound.should.push({
          phrase: {
            query: searchParams.query,
            path: ['title', 'content'],
            slop: 2,
            score: { boost: { value: 2.0 } }
          }
        });
      }
    }

    // Vector search for semantic similarity (replaces the compound clauses
    // above but keeps the target index)
    if (searchParams.vectorQuery) {
      searchStage.$search = {
        index: searchParams.index || 'default_search_index',
        knnBeta: {
          vector: searchParams.vectorQuery,
          path: "contentEmbedding",
          k: searchParams.vectorK || 50,
          score: {
            boost: {
              value: searchParams.vectorBoost || 1.5
            }
          }
        }
      };
    }

    // Filters (only apply to the compound text search, not the knnBeta branch)
    if (searchParams.filters && searchStage.$search.compound) {
      if (searchParams.filters.category) {
        searchStage.$search.compound.filter.push({
          text: {
            query: searchParams.filters.category,
            path: "category"
          }
        });
      }

      if (searchParams.filters.dateRange) {
        searchStage.$search.compound.filter.push({
          range: {
            path: "publishedDate",
            gte: new Date(searchParams.filters.dateRange.start),
            lte: new Date(searchParams.filters.dateRange.end)
          }
        });
      }

      if (searchParams.filters.author) {
        searchStage.$search.compound.filter.push({
          text: {
            query: searchParams.filters.author,
            path: "author.name"
          }
        });
      }

      if (searchParams.filters.minViewCount) {
        searchStage.$search.compound.filter.push({
          range: {
            path: "viewCount",
            gte: searchParams.filters.minViewCount
          }
        });
      }
    }

    // Highlighting
    if (searchParams.highlight !== false) {
      searchStage.$search.highlight = {
        path: searchParams.highlightFields || ['title', 'content'],
        maxCharsToExamine: 500000,
        maxNumPassages: 5
      };
    }

    // Count configuration
    if (searchParams.count) {
      searchStage.$search.count = {
        type: searchParams.count.type || 'total',
        threshold: searchParams.count.threshold || 1000
      };
    }

    pipeline.push(searchStage);

    // Add scoring and ranking
    pipeline.push({
      $addFields: {
        searchScore: { $meta: "searchScore" },
        searchHighlights: { $meta: "searchHighlights" },

        // Custom relevance scoring
        relevanceScore: {
          $add: [
            "$searchScore",
            // Boost recent content
            {
              $multiply: [
                {
                  $max: [
                    0,
                    {
                      $subtract: [
                        30,
                        {
                          $divide: [
                            { $subtract: [new Date(), "$publishedDate"] },
                            86400000
                          ]
                        }
                      ]
                    }
                  ]
                },
                0.1
              ]
            },
            // Boost popular content
            {
              $multiply: [
                { $log10: { $max: [1, "$viewCount"] } },
                0.2
              ]
            },
            // Boost quality content
            {
              $multiply: [
                { $min: [{ $divide: [{ $strLenCP: "$content" }, 1000] }, 3] },
                0.15
              ]
            }
          ]
        }
      }
    });

    // Faceted search results
    if (searchParams.facets) {
      pipeline.push({
        $facet: {
          results: [
            { $sort: { relevanceScore: -1 } },
            { $skip: searchParams.skip || 0 },
            { $limit: searchParams.limit || 20 },
            {
              $project: {
                _id: 1,
                title: 1,
                author: 1,
                category: 1,
                tags: 1,
                publishedDate: 1,
                viewCount: 1,
                likeCount: 1,
                searchScore: 1,
                relevanceScore: 1,
                searchHighlights: 1,
                snippet: { $substr: ["$content", 0, 250] },
                readingTime: 1
              }
            }
          ],

          facets: this.buildFacetPipeline(searchParams.facets),

          totalCount: [
            { $count: "total" }
          ]
        }
      });
    } else {
      // Simple results without faceting
      pipeline.push(
        { $sort: { relevanceScore: -1 } },
        { $skip: searchParams.skip || 0 },
        { $limit: searchParams.limit || 20 }
      );
    }

    // Execute search and track analytics
    const startTime = Date.now();
    const results = await this.db.collection(collection).aggregate(pipeline).toArray();
    const executionTime = Date.now() - startTime;

    // Log search analytics
    await this.logSearchAnalytics(searchParams, results, executionTime);

    return results;
  }

  buildFacetPipeline(facetConfig) {
    const facetPipeline = {};

    if (facetConfig.category) {
      facetPipeline.categories = [
        {
          $group: {
            _id: "$category",
            count: { $sum: 1 },
            avgScore: { $avg: "$searchScore" }
          }
        },
        { $sort: { count: -1 } },
        { $limit: 20 }
      ];
    }

    if (facetConfig.author) {
      facetPipeline.authors = [
        {
          $group: {
            _id: "$author.name",
            count: { $sum: 1 },
            avgScore: { $avg: "$searchScore" },
            expertise: { $first: "$author.expertise" }
          }
        },
        { $sort: { count: -1 } },
        { $limit: 15 }
      ];
    }

    if (facetConfig.tags) {
      facetPipeline.tags = [
        { $unwind: "$tags" },
        {
          $group: {
            _id: "$tags",
            count: { $sum: 1 },
            avgScore: { $avg: "$searchScore" }
          }
        },
        { $sort: { count: -1 } },
        { $limit: 25 }
      ];
    }

    if (facetConfig.dateRanges) {
      facetPipeline.dateRanges = [
        {
          $bucket: {
            groupBy: "$publishedDate",
            boundaries: [
              new Date("2020-01-01"),
              new Date("2022-01-01"),
              new Date("2023-01-01"),
              new Date("2024-01-01"),
              new Date("2025-01-01"),
              new Date("2030-01-01")
            ],
            default: "older",
            output: {
              count: { $sum: 1 },
              avgScore: { $avg: "$searchScore" }
            }
          }
        }
      ];
    }

    if (facetConfig.viewRanges) {
      facetPipeline.viewRanges = [
        {
          $bucket: {
            groupBy: "$viewCount",
            boundaries: [0, 100, 1000, 10000, 100000, 1000000],
            default: "very_popular",
            output: {
              count: { $sum: 1 },
              avgScore: { $avg: "$searchScore" }
            }
          }
        }
      ];
    }

    return facetPipeline;
  }

  async performAutoComplete(collection, query, field, limit = 10) {
    // Auto-completion search
    const pipeline = [
      {
        $search: {
          index: 'autocomplete_index',
          autocomplete: {
            query: query,
            path: field,
            tokenOrder: "sequential",
            fuzzy: {
              maxEdits: 1,
              prefixLength: 1
            }
          }
        }
      },
      {
        $group: {
          _id: `$${field}`,
          score: { $max: { $meta: "searchScore" } },
          count: { $sum: 1 }
        }
      },
      { $sort: { score: -1, count: -1 } },
      { $limit: limit },
      {
        $project: {
          suggestion: "$_id",
          score: 1,
          frequency: "$count",
          _id: 0
        }
      }
    ];

    return await this.db.collection(collection).aggregate(pipeline).toArray();
  }

  async performSemanticSearch(collection, queryVector, filters = {}, limit = 20) {
    // Vector-based semantic search
    const pipeline = [
      {
        $vectorSearch: {
          index: "vector_search_index",
          path: "contentEmbedding",
          queryVector: queryVector,
          numCandidates: limit * 10,
          limit: limit,
          filter: filters
        }
      },
      {
        $addFields: {
          vectorScore: { $meta: "vectorSearchScore" }
        }
      },
      {
        $project: {
          title: 1,
          content: { $substr: ["$content", 0, 200] },
          author: 1,
          category: 1,
          publishedDate: 1,
          vectorScore: 1,
          similarity: { $multiply: ["$vectorScore", 100] }
        }
      }
    ];

    return await this.db.collection(collection).aggregate(pipeline).toArray();
  }

  async createSearchSuggestions(collection, userQuery, suggestionTypes = ['spelling', 'query', 'category']) {
    // Generate search suggestions and corrections
    const suggestions = {
      spelling: [],
      queries: [],
      categories: [],
      authors: []
    };

    // Spelling suggestions using fuzzy search
    if (suggestionTypes.includes('spelling')) {
      const spellingPipeline = [
        {
          $search: {
            index: 'default_search_index',
            text: {
              query: userQuery,
              path: ['title', 'content'],
              fuzzy: {
                maxEdits: 2,
                prefixLength: 0
              }
            }
          }
        },
        { $limit: 5 },
        {
          $project: {
            title: 1,
            score: { $meta: "searchScore" }
          }
        }
      ];

      suggestions.spelling = await this.db.collection(collection).aggregate(spellingPipeline).toArray();
    }

    // Query suggestions from search history
    if (suggestionTypes.includes('query')) {
      suggestions.queries = await this.searchAnalytics.find({
        query: new RegExp(userQuery, 'i'),
        resultCount: { $gt: 0 }
      })
      .sort({ searchCount: -1 })
      .limit(5)
      .project({ query: 1, resultCount: 1 })
      .toArray();
    }

    // Category suggestions
    if (suggestionTypes.includes('category')) {
      const categoryPipeline = [
        {
          $search: {
            index: 'default_search_index',
            text: {
              query: userQuery,
              path: 'category'
            }
          }
        },
        {
          $group: {
            _id: "$category",
            count: { $sum: 1 },
            score: { $max: { $meta: "searchScore" } }
          }
        },
        { $sort: { score: -1, count: -1 } },
        { $limit: 5 }
      ];

      suggestions.categories = await this.db.collection(collection).aggregate(categoryPipeline).toArray();
    }

    return suggestions;
  }

  async logSearchAnalytics(searchParams, results, executionTime) {
    // Track search analytics for optimization
    const analyticsDoc = {
      query: searchParams.query,
      searchType: this.determineSearchType(searchParams),
      filters: searchParams.filters || {},
      resultCount: Array.isArray(results) ? results.length : 
                   (results[0] && results[0].totalCount ? results[0].totalCount[0]?.total : 0),
      executionTime: executionTime,
      timestamp: new Date(),

      // Search quality metrics
      avgScore: this.calculateAverageScore(results),
      scoreDistribution: this.analyzeScoreDistribution(results),

      // User experience metrics
      hasResults: (results && results.length > 0),
      fastResponse: executionTime < 500,

      // Technical metrics
      index: searchParams.index,
      facetsRequested: !!searchParams.facets,
      highlightRequested: searchParams.highlight !== false
    };

    await this.searchAnalytics.insertOne(analyticsDoc);

    // Update search frequency
    await this.searchAnalytics.updateOne(
      { 
        query: searchParams.query,
        searchType: analyticsDoc.searchType 
      },
      { 
        $inc: { searchCount: 1 },
        $set: { lastSearched: new Date() }
      },
      { upsert: true }
    );
  }

  determineSearchType(searchParams) {
    if (searchParams.vectorQuery) return 'vector';
    if (searchParams.phraseSearch) return 'phrase';
    if (searchParams.fuzzy) return 'fuzzy';
    return 'text';
  }

  calculateAverageScore(results) {
    if (!results || !results.length) return 0;

    const scores = results.map(r => r.searchScore || r.relevanceScore || 0);
    return scores.reduce((sum, score) => sum + score, 0) / scores.length;
  }

  analyzeScoreDistribution(results) {
    if (!results || !results.length) return {};

    const scores = results.map(r => r.searchScore || r.relevanceScore || 0);
    const distribution = {
      excellent: scores.filter(s => s >= 10).length,
      good: scores.filter(s => s >= 5 && s < 10).length,
      fair: scores.filter(s => s >= 2 && s < 5).length,
      poor: scores.filter(s => s < 2).length
    };

    return distribution;
  }

  async getSearchAnalytics(dateRange = {}, groupBy = 'day') {
    // Comprehensive search analytics
    const matchStage = {
      timestamp: {
        $gte: dateRange.start || new Date(Date.now() - 30 * 24 * 60 * 60 * 1000),
        $lte: dateRange.end || new Date()
      }
    };

    const pipeline = [
      { $match: matchStage },

      {
        $group: {
          _id: this.getGroupingExpression(groupBy),
          totalSearches: { $sum: 1 },
          uniqueQueries: { $addToSet: "$query" },
          avgExecutionTime: { $avg: "$executionTime" },
          avgResultCount: { $avg: "$resultCount" },
          successfulSearches: {
            $sum: { $cond: [{ $gt: ["$resultCount", 0] }, 1, 0] }
          },
          fastSearches: {
            $sum: { $cond: [{ $lt: ["$executionTime", 500] }, 1, 0] }
          },
          searchTypes: { $push: "$searchType" },
          popularQueries: { $push: "$query" }
        }
      },

      {
        $addFields: {
          uniqueQueryCount: { $size: "$uniqueQueries" },
          successRate: { $divide: ["$successfulSearches", "$totalSearches"] },
          performanceRate: { $divide: ["$fastSearches", "$totalSearches"] },
          topQueries: {
            $slice: [
              {
                $sortArray: {
                  input: {
                    $reduce: {
                      input: "$popularQueries",
                      initialValue: [],
                      in: {
                        $concatArrays: [
                          "$$value",
                          [{ query: "$$this", count: 1 }]
                        ]
                      }
                    }
                  },
                  sortBy: { count: -1 }
                }
              },
              10
            ]
          }
        }
      },

      { $sort: { _id: -1 } }
    ];

    return await this.searchAnalytics.aggregate(pipeline).toArray();
  }

  getGroupingExpression(groupBy) {
    const dateExpressions = {
      hour: {
        year: { $year: "$timestamp" },
        month: { $month: "$timestamp" },
        day: { $dayOfMonth: "$timestamp" },
        hour: { $hour: "$timestamp" }
      },
      day: {
        year: { $year: "$timestamp" },
        month: { $month: "$timestamp" },
        day: { $dayOfMonth: "$timestamp" }
      },
      week: {
        year: { $year: "$timestamp" },
        week: { $week: "$timestamp" }
      },
      month: {
        year: { $year: "$timestamp" },
        month: { $month: "$timestamp" }
      }
    };

    return dateExpressions[groupBy] || dateExpressions.day;
  }

  async optimizeSearchPerformance(collection, analysisRange = 30) {
    // Analyze and optimize search performance
    const analysisDate = new Date(Date.now() - analysisRange * 24 * 60 * 60 * 1000);

    const performanceAnalysis = await this.searchAnalytics.aggregate([
      { $match: { timestamp: { $gte: analysisDate } } },

      {
        $group: {
          _id: null,
          totalSearches: { $sum: 1 },
          avgExecutionTime: { $avg: "$executionTime" },
          slowSearches: {
            $sum: { $cond: [{ $gt: ["$executionTime", 2000] }, 1, 0] }
          },
          emptyResults: {
            $sum: { $cond: [{ $eq: ["$resultCount", 0] }, 1, 0] }
          },
          commonQueries: { $push: "$query" },
          slowQueries: {
            $push: {
              $cond: [
                { $gt: ["$executionTime", 1000] },
                { query: "$query", executionTime: "$executionTime" },
                null
              ]
            }
          }
        }
      }
    ]).toArray();

    const analysis = performanceAnalysis[0];
    const recommendations = [];

    // No search activity in the analysis window
    if (!analysis || analysis.totalSearches === 0) {
      return { analysis: null, recommendations, generatedAt: new Date() };
    }

    // Performance recommendations
    if (analysis.avgExecutionTime > 1000) {
      recommendations.push({
        type: 'performance',
        issue: 'High average execution time',
        recommendation: 'Consider index optimization or query refinement',
        priority: 'high'
      });
    }

    if (analysis.slowSearches / analysis.totalSearches > 0.1) {
      recommendations.push({
        type: 'performance',
        issue: 'High percentage of slow searches',
        recommendation: 'Review index configuration and query complexity',
        priority: 'high'
      });
    }

    if (analysis.emptyResults / analysis.totalSearches > 0.3) {
      recommendations.push({
        type: 'relevance',
        issue: 'High percentage of searches with no results',
        recommendation: 'Improve fuzzy matching and synonyms configuration',
        priority: 'medium'
      });
    }

    return {
      analysis: analysis,
      recommendations: recommendations,
      generatedAt: new Date()
    };
  }
}

SQL-Style Search Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB Atlas Search operations:

-- QueryLeaf Atlas Search operations with SQL-familiar syntax

-- Create full-text search index
CREATE SEARCH INDEX articles_search_idx ON articles (
  -- Text fields with different analyzers
  title WITH (analyzer='lucene.english', boost=3.0),
  content WITH (analyzer='content_analyzer', store=true),

  -- Faceted fields
  category AS FACET,
  "author.name" AS FACET,
  tags AS FACET,

  -- Numeric and date fields
  publishedDate AS DATE,
  viewCount AS NUMBER,
  likeCount AS NUMBER,

  -- Auto-completion fields
  title AS AUTOCOMPLETE WITH (maxGrams=15, minGrams=2),

  -- Vector field for semantic search
  contentEmbedding AS VECTOR WITH (dimensions=1536, similarity='cosine')
);

-- Advanced text search with ranking
SELECT 
  article_id,
  title,
  author,
  category,
  published_date,
  view_count,

  -- Search relevance scoring
  SEARCH_SCORE() as search_score,
  SEARCH_HIGHLIGHTS('title', 'content') as highlights,

  -- Custom relevance calculation
  (SEARCH_SCORE() + 
   LOG10(GREATEST(1, view_count)) * 0.2 +
   CASE 
     WHEN published_date >= CURRENT_DATE - INTERVAL '30 days' THEN 2.0
     WHEN published_date >= CURRENT_DATE - INTERVAL '90 days' THEN 1.0
     ELSE 0
   END) as final_score

FROM articles
WHERE SEARCH_TEXT('machine learning algorithms', 
  fields => ARRAY['title', 'content'],
  fuzzy => JSON_BUILD_OBJECT('maxEdits', 2, 'prefixLength', 1),
  boost => JSON_BUILD_OBJECT('title', 3.0, 'content', 1.0)
)
AND category IN ('technology', 'science', 'ai')
AND published_date >= '2023-01-01'
AND status != 'draft'

ORDER BY final_score DESC
LIMIT 20;

-- Faceted search with aggregations
WITH search_results AS (
  SELECT *,
    SEARCH_SCORE() as search_score,
    SEARCH_HIGHLIGHTS('title', 'content') as highlights
  FROM articles
  WHERE SEARCH_TEXT('artificial intelligence',
    fields => ARRAY['title', 'content'],
    synonyms => 'tech_synonyms'
  )
)
SELECT 
  json_build_object(
    -- Main results (top 20 by relevance)
    'results', (
      SELECT json_agg(
        json_build_object(
          'article_id', article_id,
          'title', title,
          'author', author,
          'category', category,
          'search_score', search_score,
          'highlights', highlights
        ) ORDER BY search_score DESC
      )
      FROM (
        SELECT *
        FROM search_results
        ORDER BY search_score DESC
        LIMIT 20
      ) top_results
    ),

    -- Category facets
    'categoryFacets', (
      SELECT json_agg(
        json_build_object(
          'category', category,
          'count', doc_count,
          'avgScore', avg_score
        ) ORDER BY doc_count DESC
      )
      FROM (
        SELECT category, COUNT(*) AS doc_count, AVG(search_score) AS avg_score
        FROM search_results
        GROUP BY category
      ) cat_data
    ),

    -- Author facets
    'authorFacets', (
      SELECT json_agg(
        json_build_object(
          'author', author_name,
          'count', doc_count,
          'expertise', expertise
        ) ORDER BY doc_count DESC
      )
      FROM (
        SELECT 
          author->>'name' AS author_name,
          author->>'expertise' AS expertise,
          COUNT(*) AS doc_count
        FROM search_results
        GROUP BY author->>'name', author->>'expertise'
        ORDER BY COUNT(*) DESC
        LIMIT 10
      ) author_data
    ),

    -- Search analytics
    'analytics', json_build_object(
      'totalResults', COUNT(*),
      'avgScore', AVG(search_score),
      'maxScore', MAX(search_score),
      'scoreDistribution', json_build_object(
        'excellent', COUNT(*) FILTER (WHERE search_score >= 10),
        'good', COUNT(*) FILTER (WHERE search_score >= 5 AND search_score < 10),
        'fair', COUNT(*) FILTER (WHERE search_score >= 2 AND search_score < 5),
        'poor', COUNT(*) FILTER (WHERE search_score < 2)
      )
    )
  ) AS search_response
FROM search_results;

-- Auto-completion search
SELECT 
  suggestion,
  score,
  frequency
FROM AUTOCOMPLETE_SEARCH('machine lear', 
  field => 'title',
  limit => 10,
  fuzzy => JSON_BUILD_OBJECT('maxEdits', 1)
)
ORDER BY score DESC, frequency DESC;

-- Semantic vector search
SELECT 
  article_id,
  title,
  author,
  category,
  published_date,
  VECTOR_SCORE() as similarity_score,
  ROUND(VECTOR_SCORE() * 100, 2) as similarity_percentage
FROM articles
WHERE VECTOR_SEARCH(@query_embedding,
  field => 'contentEmbedding',
  k => 20,
  filter => JSON_BUILD_OBJECT('category', ARRAY['technology', 'ai'])
)
ORDER BY similarity_score DESC;

-- Combined text and vector search (hybrid search)
WITH text_search AS (
  SELECT article_id, title, author, category, published_date,
    SEARCH_SCORE() as text_score,
    1 as search_type
  FROM articles
  WHERE SEARCH_TEXT('neural networks deep learning')
  ORDER BY SEARCH_SCORE() DESC
  LIMIT 50
),
vector_search AS (
  SELECT article_id, title, author, category, published_date,
    VECTOR_SCORE() as vector_score,
    2 as search_type
  FROM articles
  WHERE VECTOR_SEARCH(@neural_networks_embedding, field => 'contentEmbedding', k => 50)
),
combined_results AS (
  -- Combine and re-rank results
  SELECT 
    COALESCE(t.article_id, v.article_id) as article_id,
    COALESCE(t.title, v.title) as title,
    COALESCE(t.author, v.author) as author,
    COALESCE(t.category, v.category) as category,
    COALESCE(t.published_date, v.published_date) as published_date,

    -- Hybrid scoring
    COALESCE(t.text_score, 0) * 0.6 + COALESCE(v.vector_score, 0) * 0.4 as hybrid_score,

    CASE 
      WHEN t.article_id IS NOT NULL AND v.article_id IS NOT NULL THEN 'both'
      WHEN t.article_id IS NOT NULL THEN 'text_only'
      ELSE 'vector_only'
    END as match_type
  FROM text_search t
  FULL OUTER JOIN vector_search v ON t.article_id = v.article_id
)
SELECT * FROM combined_results
ORDER BY hybrid_score DESC, match_type = 'both' DESC
LIMIT 20;

-- Search with custom scoring and boosting
SELECT 
  article_id,
  title,
  author,
  category,
  published_date,
  view_count,
  like_count,

  -- Multi-factor scoring
  (
    SEARCH_SCORE() * 1.0 +                                    -- Base search relevance
    LOG10(GREATEST(1, view_count)) * 0.3 +                   -- Popularity boost
    LOG10(GREATEST(1, like_count)) * 0.2 +                   -- Engagement boost
    CASE 
      WHEN published_date >= CURRENT_DATE - INTERVAL '7 days' THEN 3.0
      WHEN published_date >= CURRENT_DATE - INTERVAL '30 days' THEN 2.0  
      WHEN published_date >= CURRENT_DATE - INTERVAL '90 days' THEN 1.0
      ELSE 0
    END +                                                     -- Recency boost
    CASE 
      WHEN LENGTH(content) >= 2000 THEN 1.5
      WHEN LENGTH(content) >= 1000 THEN 1.0
      ELSE 0.5
    END                                                       -- Content quality boost
  ) as comprehensive_score

FROM articles
WHERE SEARCH_COMPOUND(
  must => ARRAY[
    SEARCH_TEXT('blockchain cryptocurrency', fields => ARRAY['title', 'content'])
  ],
  should => ARRAY[
    SEARCH_TEXT('blockchain', field => 'title', boost => 3.0),
    SEARCH_PHRASE('blockchain technology', fields => ARRAY['title', 'content'], slop => 2)
  ],
  filter => ARRAY[
    SEARCH_RANGE('published_date', gte => '2022-01-01'),
    SEARCH_TERMS('category', values => ARRAY['technology', 'finance'])
  ],
  must_not => ARRAY[
    SEARCH_TERM('status', value => 'draft')
  ]
)
ORDER BY comprehensive_score DESC;

-- Search analytics and performance monitoring  
SELECT 
  DATE_TRUNC('day', search_timestamp) as search_date,
  search_query,
  COUNT(*) as search_count,
  AVG(execution_time_ms) as avg_execution_time,
  AVG(result_count) as avg_results,

  -- Performance metrics
  COUNT(*) FILTER (WHERE execution_time_ms < 500) as fast_searches,
  COUNT(*) FILTER (WHERE result_count > 0) as successful_searches,
  COUNT(*) FILTER (WHERE result_count = 0) as empty_searches,

  -- Search quality metrics
  AVG(CASE WHEN result_count > 0 THEN avg_search_score END) as avg_relevance,

  -- User behavior indicators
  COUNT(DISTINCT user_id) as unique_searchers,
  AVG(click_through_rate) as avg_ctr

FROM search_analytics
WHERE search_timestamp >= CURRENT_DATE - INTERVAL '30 days'
  AND search_query IS NOT NULL
GROUP BY DATE_TRUNC('day', search_timestamp), search_query
HAVING COUNT(*) >= 10  -- Only frequent searches
ORDER BY search_count DESC, avg_execution_time ASC;

-- Search optimization recommendations
WITH search_performance AS (
  SELECT 
    search_query,
    COUNT(*) as frequency,
    AVG(execution_time_ms) as avg_time,
    AVG(result_count) as avg_results,
    STDDEV(execution_time_ms) as time_variance
  FROM search_analytics
  WHERE search_timestamp >= CURRENT_DATE - INTERVAL '7 days'
  GROUP BY search_query
  HAVING COUNT(*) >= 5
),
optimization_analysis AS (
  SELECT *,
    CASE 
      WHEN avg_time > 2000 THEN 'slow_query'
      WHEN avg_results = 0 THEN 'no_results'
      WHEN avg_results < 5 THEN 'few_results'
      WHEN time_variance > avg_time THEN 'inconsistent_performance'
      ELSE 'optimal'
    END as performance_category,

    CASE 
      WHEN avg_time > 2000 THEN 'Add more specific indexes or optimize query complexity'
      WHEN avg_results = 0 THEN 'Improve fuzzy matching and synonym configuration'
      WHEN avg_results < 5 THEN 'Review relevance scoring and boost popular content'
      WHEN time_variance > avg_time THEN 'Investigate index fragmentation or resource contention'
      ELSE 'Query performing well'
    END as recommendation
  FROM search_performance
)
SELECT 
  search_query,
  frequency,
  ROUND(avg_time, 2) as avg_execution_time_ms,
  ROUND(avg_results, 1) as avg_result_count,
  performance_category,
  recommendation,

  -- Priority scoring
  CASE 
    WHEN performance_category = 'slow_query' AND frequency > 100 THEN 1
    WHEN performance_category = 'no_results' AND frequency > 50 THEN 2
    WHEN performance_category = 'inconsistent_performance' AND frequency > 75 THEN 3
    ELSE 4
  END as optimization_priority

FROM optimization_analysis
WHERE performance_category != 'optimal'
ORDER BY optimization_priority, frequency DESC;

-- QueryLeaf provides comprehensive Atlas Search capabilities:
-- 1. SQL-familiar search index creation and management
-- 2. Advanced text search with custom scoring and boosting
-- 3. Faceted search with aggregations and analytics
-- 4. Auto-completion and suggestion generation
-- 5. Vector search for semantic similarity
-- 6. Hybrid search combining text and vector approaches
-- 7. Search analytics and performance monitoring
-- 8. Automated optimization recommendations
-- 9. Real-time search index synchronization
-- 10. Integration with MongoDB's native Atlas Search features

Best Practices for Atlas Search Implementation

Search Index Optimization

Essential practices for optimal search performance:

  1. Index Design Strategy: Design indexes specifically for your search patterns and query types (see the index sketch after this list)
  2. Field Analysis: Use appropriate analyzers for different content types and languages
  3. Relevance Tuning: Implement custom scoring with business logic and user behavior
  4. Performance Monitoring: Track search analytics and optimize based on real usage patterns
  5. Faceting Strategy: Design facets to support filtering and discovery workflows
  6. Auto-completion Design: Implement sophisticated suggestion systems for user experience
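
The sketch below illustrates the first two practices: a deliberately small index that maps only the fields an application actually searches, with a language-aware analyzer for prose fields and keyword or facet handling for filter fields. It assumes a Node.js driver recent enough to expose createSearchIndex() and a hypothetical products collection; the field names and analyzer choices are illustrative rather than prescriptive.

// Minimal search index sketch (assumed 'catalog.products' collection)
const { MongoClient } = require('mongodb');

async function createMinimalSearchIndex(uri) {
  const client = new MongoClient(uri);
  try {
    await client.connect();
    const products = client.db('catalog').collection('products');

    // Index only the fields the application actually searches, and choose an
    // analyzer per field: language-aware stemming for prose, keyword for exact
    // filters, and a facet type for aggregation-style filtering.
    await products.createSearchIndex({
      name: 'products_minimal_idx',
      definition: {
        mappings: {
          dynamic: false, // only the declared fields are indexed
          fields: {
            name: { type: 'string', analyzer: 'lucene.english' },
            description: { type: 'string', analyzer: 'lucene.english' },
            brand: { type: 'string', analyzer: 'lucene.keyword' },
            category: { type: 'stringFacet' },
            price: { type: 'number' }
          }
        }
      }
    });
  } finally {
    await client.close();
  }
}

// Example usage with a placeholder connection string:
// createMinimalSearchIndex('mongodb+srv://user:pass@cluster.example.mongodb.net');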

Search Quality and Relevance

Optimize search quality through comprehensive relevance engineering:

  1. Multi-factor Scoring: Combine text relevance with business metrics and user behavior
  2. Semantic Enhancement: Use synonyms and vector search for better understanding (see the synonym sketch after this list)
  3. Query Understanding: Implement fuzzy matching and error correction
  4. Content Quality: Factor content quality metrics into relevance scoring
  5. Personalization: Incorporate user preferences and search history
  6. A/B Testing: Continuously test and optimize search relevance algorithms
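
As a minimal sketch of the semantic enhancement practice, the example below assumes the 'tech_synonyms' mapping defined earlier in this post (backed by a 'synonyms' source collection) and the same articles collection; the specific synonym entries are illustrative.

// Synonym-backed search sketch (assumes the 'tech_synonyms' mapping from the
// index definition shown earlier)
const { MongoClient } = require('mongodb');

async function searchWithSynonyms(uri, userQuery) {
  const client = new MongoClient(uri);
  try {
    await client.connect();
    const db = client.db('content_platform');

    // Atlas Search synonym source documents: an "equivalent" mapping expands
    // every listed term to all of the others at query time.
    await db.collection('synonyms').insertOne({
      mappingType: 'equivalent',
      synonyms: ['ml', 'machine learning', 'statistical learning']
    });

    // Searching for "ml" now also matches articles that only say
    // "machine learning". Note that a text clause using synonyms cannot
    // also use fuzzy matching.
    return db.collection('articles').aggregate([
      {
        $search: {
          index: 'articles_search_index',
          text: {
            query: userQuery,
            path: ['title', 'content'],
            synonyms: 'tech_synonyms'
          }
        }
      },
      { $limit: 10 },
      { $project: { title: 1, category: 1, score: { $meta: 'searchScore' } } }
    ]).toArray();
  } finally {
    await client.close();
  }
}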

Conclusion

MongoDB Atlas Search provides enterprise-grade search capabilities that eliminate the complexity of external search engines while delivering sophisticated full-text search, semantic understanding, and search analytics. The integration of advanced search features with familiar SQL syntax makes implementing modern search applications both powerful and accessible.

Key Atlas Search benefits include:

  • Native Integration: Built-in search without external dependencies or synchronization
  • Advanced Relevance: Sophisticated scoring with custom business logic
  • Real-time Updates: Automatic search index synchronization with data changes
  • Comprehensive Analytics: Built-in search performance and user behavior tracking
  • Scalable Architecture: Enterprise-grade performance with horizontal scaling
  • Developer Friendly: Familiar query syntax with powerful search capabilities

Whether you're building e-commerce search, content discovery platforms, knowledge bases, or applications requiring sophisticated text analysis, MongoDB Atlas Search with QueryLeaf's familiar SQL interface provides the foundation for modern search experiences. This combination enables you to implement advanced search capabilities while preserving familiar database interaction patterns.

QueryLeaf Integration: QueryLeaf automatically manages MongoDB Atlas Search operations while providing SQL-familiar search index creation, query syntax, and analytics. Advanced search features, relevance tuning, and performance optimization are seamlessly handled through familiar SQL patterns, making enterprise-grade search both powerful and accessible.

The integration of native search capabilities with SQL-style operations makes MongoDB an ideal platform for applications requiring both sophisticated search functionality and familiar database interaction patterns, ensuring your search solutions remain both effective and maintainable as they scale and evolve.

MongoDB Embedded Documents vs References: Data Modeling Patterns and Performance Optimization for Enterprise Applications

Modern applications require sophisticated data modeling strategies that balance query performance, data consistency, and schema flexibility across complex relationships and evolving business requirements. Traditional relational databases force all relationships through normalized foreign key structures that often create performance bottlenecks, complex joins, and rigid schemas that resist change as applications evolve and business requirements shift.

MongoDB's document-oriented architecture provides powerful flexibility in how relationships are modeled, offering both embedded document patterns that co-locate related data within single documents and reference patterns that maintain relationships through document identifiers. Understanding when to embed versus when to reference is crucial for designing scalable, performant applications that can adapt to changing requirements while maintaining optimal query performance and data consistency.
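
As a small, hypothetical illustration of the two patterns before the relational comparison that follows, the snippet below embeds data that is read together with the user (addresses, preferences) while referencing high-volume, independently accessed data (orders) by identifier; the collection and field names are assumptions for this example only.

// Embedding vs. referencing - a minimal sketch with assumed collections

// Embedded pattern: addresses and preferences travel with the user document,
// so a single findOne() returns the complete profile with no joins.
const userDoc = {
  _id: 'user_123',
  email: 'ada@example.com',
  firstName: 'Ada',
  addresses: [
    { type: 'home', city: 'Austin', postalCode: '78701', isPrimary: true }
  ],
  preferences: { notifications: { email: true, sms: false }, theme: 'dark' }
};

// Reference pattern: orders grow without bound and are queried on their own,
// so each order points back to the user by _id instead of being embedded.
const orderDoc = {
  _id: 'order_789',
  userId: 'user_123', // reference to users._id
  total: 42.5,
  placedAt: new Date()
};

// When both sides are needed in one result, $lookup resolves the reference.
const profileWithOrders = [
  { $match: { _id: 'user_123' } },
  {
    $lookup: {
      from: 'orders',
      localField: '_id',
      foreignField: 'userId',
      as: 'orders'
    }
  }
];
// db.collection('users').aggregate(profileWithOrders)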

The Traditional Relational Normalization Challenge

Conventional relational database modeling relies heavily on normalization principles that create complex join-heavy queries and performance challenges:

-- Traditional PostgreSQL normalized schema with complex relationship management overhead

-- User profile management with multiple related entities requiring joins
CREATE TABLE users (
    user_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) UNIQUE NOT NULL,
    username VARCHAR(100) UNIQUE NOT NULL,
    first_name VARCHAR(100) NOT NULL,
    last_name VARCHAR(100) NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,

    -- Basic user metadata
    date_of_birth DATE,
    phone_number VARCHAR(20),
    status VARCHAR(20) DEFAULT 'active',

    CONSTRAINT valid_status CHECK (status IN ('active', 'inactive', 'suspended', 'deleted'))
);

-- User addresses requiring separate table and joins for access
CREATE TABLE user_addresses (
    address_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(user_id) ON DELETE CASCADE,
    address_type VARCHAR(20) NOT NULL DEFAULT 'home',

    -- Address components
    street_address VARCHAR(500) NOT NULL,
    apartment_unit VARCHAR(100),
    city VARCHAR(100) NOT NULL,
    state_province VARCHAR(100),
    postal_code VARCHAR(20) NOT NULL,
    country VARCHAR(3) NOT NULL DEFAULT 'USA',

    -- Address metadata
    is_primary BOOLEAN DEFAULT FALSE,
    is_billing BOOLEAN DEFAULT FALSE,
    is_shipping BOOLEAN DEFAULT FALSE,

    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,

    CONSTRAINT valid_address_type CHECK (address_type IN ('home', 'work', 'billing', 'shipping', 'other'))
);

-- User preferences requiring separate storage and complex queries
CREATE TABLE user_preferences (
    preference_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(user_id) ON DELETE CASCADE,
    preference_category VARCHAR(50) NOT NULL,
    preference_key VARCHAR(100) NOT NULL,
    preference_value JSONB NOT NULL,

    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,

    UNIQUE (user_id, preference_category, preference_key),
    CONSTRAINT valid_category CHECK (preference_category IN (
        'notifications', 'display', 'privacy', 'content', 'accessibility'
    ))
);

-- User social connections with bidirectional relationship complexity
CREATE TABLE user_connections (
    connection_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    requester_user_id UUID NOT NULL REFERENCES users(user_id) ON DELETE CASCADE,
    requested_user_id UUID NOT NULL REFERENCES users(user_id) ON DELETE CASCADE,
    connection_type VARCHAR(30) NOT NULL DEFAULT 'friend',
    connection_status VARCHAR(20) NOT NULL DEFAULT 'pending',

    -- Connection metadata
    requested_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    responded_at TIMESTAMP WITH TIME ZONE,
    last_interaction_at TIMESTAMP WITH TIME ZONE,

    -- Connection details
    connection_strength INTEGER DEFAULT 1 CHECK (connection_strength BETWEEN 1 AND 10),
    mutual_connections INTEGER DEFAULT 0,
    shared_interests TEXT[],

    CONSTRAINT no_self_connection CHECK (requester_user_id != requested_user_id),
    CONSTRAINT valid_connection_type CHECK (connection_type IN (
        'friend', 'family', 'colleague', 'acquaintance', 'blocked'
    )),
    CONSTRAINT valid_status CHECK (connection_status IN (
        'pending', 'accepted', 'declined', 'blocked', 'removed'
    )),

    UNIQUE (requester_user_id, requested_user_id, connection_type)
);

-- User activity tracking requiring separate table with heavy join overhead
CREATE TABLE user_activities (
    activity_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(user_id) ON DELETE CASCADE,
    activity_type VARCHAR(50) NOT NULL,
    activity_timestamp TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,

    -- Activity details
    activity_data JSONB NOT NULL DEFAULT '{}',
    activity_source VARCHAR(50) DEFAULT 'web',
    ip_address INET,
    user_agent TEXT,

    -- Context information
    session_id VARCHAR(100),
    page_url TEXT,
    referrer_url TEXT,

    -- Performance tracking
    response_time_ms INTEGER,
    error_occurred BOOLEAN DEFAULT FALSE,
    error_details JSONB,

    CONSTRAINT valid_activity_type CHECK (activity_type IN (
        'login', 'logout', 'page_view', 'action_performed', 'data_modified', 'error_occurred'
    )),
    CONSTRAINT valid_source CHECK (activity_source IN ('web', 'mobile', 'api', 'system'))
);

-- Complex query requiring multiple joins for complete user profile
CREATE OR REPLACE VIEW complete_user_profiles AS
SELECT 
    u.user_id,
    u.email,
    u.username,
    u.first_name,
    u.last_name,
    u.date_of_birth,
    u.phone_number,
    u.status,
    u.created_at,

    -- Primary address information (requires join)
    primary_addr.street_address as primary_street,
    primary_addr.city as primary_city,
    primary_addr.state_province as primary_state,
    primary_addr.postal_code as primary_postal,
    primary_addr.country as primary_country,

    -- Aggregated address count
    COALESCE(addr_counts.total_addresses, 0) as total_addresses,

    -- Connection statistics (expensive aggregation)
    COALESCE(conn_stats.total_connections, 0) as total_connections,
    COALESCE(conn_stats.pending_requests, 0) as pending_requests,
    COALESCE(conn_stats.accepted_connections, 0) as accepted_connections,

    -- Recent activity summary (expensive aggregation with time windows)
    COALESCE(activity_stats.total_activities_7d, 0) as activities_last_7_days,
    COALESCE(activity_stats.last_login, null) as last_login_time,
    COALESCE(activity_stats.last_activity, null) as last_activity_time,

    -- Preference counts (requires additional join)
    COALESCE(pref_counts.total_preferences, 0) as total_preferences

FROM users u

-- Left join for primary address (performance impact)
LEFT JOIN user_addresses primary_addr ON u.user_id = primary_addr.user_id 
    AND primary_addr.is_primary = TRUE

-- Subquery for address counts (additional performance overhead)
LEFT JOIN (
    SELECT user_id, COUNT(*) as total_addresses
    FROM user_addresses
    GROUP BY user_id
) addr_counts ON u.user_id = addr_counts.user_id

-- Complex subquery for connection statistics (both columns are NOT NULL, so
-- each direction of a connection must be unpivoted before counting per user)
LEFT JOIN (
    SELECT 
        user_id,
        COUNT(*) as total_connections,
        COUNT(*) FILTER (WHERE connection_status = 'pending') as pending_requests,
        COUNT(*) FILTER (WHERE connection_status = 'accepted') as accepted_connections
    FROM (
        SELECT requester_user_id AS user_id, connection_status FROM user_connections
        UNION ALL
        SELECT requested_user_id AS user_id, connection_status FROM user_connections
    ) connection_sides
    GROUP BY user_id
) conn_stats ON u.user_id = conn_stats.user_id

-- Time-based activity aggregation (expensive computation)
LEFT JOIN (
    SELECT 
        user_id,
        COUNT(*) FILTER (WHERE activity_timestamp >= CURRENT_TIMESTAMP - INTERVAL '7 days') as total_activities_7d,
        MAX(activity_timestamp) FILTER (WHERE activity_type = 'login') as last_login,
        MAX(activity_timestamp) as last_activity
    FROM user_activities
    GROUP BY user_id
) activity_stats ON u.user_id = activity_stats.user_id

-- Preference aggregation
LEFT JOIN (
    SELECT user_id, COUNT(*) as total_preferences
    FROM user_preferences
    GROUP BY user_id
) pref_counts ON u.user_id = pref_counts.user_id;

-- Performance analysis of complex join-heavy queries
EXPLAIN (ANALYZE, BUFFERS) 
SELECT * FROM complete_user_profiles 
WHERE status = 'active' 
AND total_connections > 10 
ORDER BY last_activity_time DESC NULLS LAST
LIMIT 20;

-- Complex friend recommendation query with multiple joins and aggregations
WITH friend_recommendations AS (
    SELECT DISTINCT
        u1.user_id as target_user_id,
        u2.user_id as recommended_user_id,
        u2.first_name,
        u2.last_name,
        u2.username,

        -- Mutual connections calculation (expensive)
        mutual_stats.mutual_count,

        -- Shared interests analysis
        CASE 
            WHEN EXISTS (
                SELECT 1 FROM user_connections uc1
                JOIN user_connections uc2 ON uc1.requested_user_id = uc2.requester_user_id
                WHERE uc1.requester_user_id = u1.user_id 
                AND uc2.requested_user_id = u2.user_id
                AND uc1.shared_interests && uc2.shared_interests
            ) THEN TRUE ELSE FALSE
        END as has_shared_interests,

        -- Activity similarity
        CASE 
            WHEN EXISTS (
                SELECT 1 FROM user_activities ua1
                JOIN user_activities ua2 ON ua1.activity_type = ua2.activity_type
                WHERE ua1.user_id = u1.user_id 
                AND ua2.user_id = u2.user_id
                AND ua1.activity_timestamp >= CURRENT_TIMESTAMP - INTERVAL '30 days'
                AND ua2.activity_timestamp >= CURRENT_TIMESTAMP - INTERVAL '30 days'
                GROUP BY ua1.activity_type 
                HAVING COUNT(*) > 5
            ) THEN TRUE ELSE FALSE
        END as similar_activity_patterns,

        -- Geographic proximity (if addresses available)
        CASE 
            WHEN addr1.city = addr2.city AND addr1.state_province = addr2.state_province 
            THEN TRUE ELSE FALSE
        END as same_geographic_area

    FROM users u1
    CROSS JOIN users u2

    -- Ensure not already connected
    LEFT JOIN user_connections existing_conn ON (
        (existing_conn.requester_user_id = u1.user_id AND existing_conn.requested_user_id = u2.user_id) OR
        (existing_conn.requester_user_id = u2.user_id AND existing_conn.requested_user_id = u1.user_id)
    )

    -- Mutual connections calculation (very expensive subquery)
    LEFT JOIN (
        SELECT 
            uc1.requester_user_id as user1_id,
            uc2.requester_user_id as user2_id,
            COUNT(*) as mutual_count
        FROM user_connections uc1
        JOIN user_connections uc2 ON uc1.requested_user_id = uc2.requested_user_id
        WHERE uc1.connection_status = 'accepted' 
        AND uc2.connection_status = 'accepted'
        AND uc1.requester_user_id != uc2.requester_user_id
        GROUP BY uc1.requester_user_id, uc2.requester_user_id
    ) mutual_stats ON mutual_stats.user1_id = u1.user_id AND mutual_stats.user2_id = u2.user_id

    -- Address proximity joins
    LEFT JOIN user_addresses addr1 ON u1.user_id = addr1.user_id AND addr1.is_primary = TRUE
    LEFT JOIN user_addresses addr2 ON u2.user_id = addr2.user_id AND addr2.is_primary = TRUE

    WHERE u1.user_id != u2.user_id
    AND u1.status = 'active'
    AND u2.status = 'active'
    AND existing_conn.connection_id IS NULL -- Not already connected
),

-- Score computed in a second CTE so the boolean flags above can be referenced
-- by name (column aliases cannot be reused within the same SELECT list)
scored_recommendations AS (
    SELECT 
        fr.*,
        (
            COALESCE(fr.mutual_count, 0) * 3 +
            CASE WHEN fr.has_shared_interests THEN 2 ELSE 0 END +
            CASE WHEN fr.similar_activity_patterns THEN 2 ELSE 0 END +
            CASE WHEN fr.same_geographic_area THEN 1 ELSE 0 END
        ) as recommendation_score
    FROM friend_recommendations fr
)

SELECT 
    target_user_id,
    recommended_user_id,
    first_name,
    last_name,
    username,
    mutual_count,
    recommendation_score,
    has_shared_interests,
    similar_activity_patterns,
    same_geographic_area,

    -- Ranking within recommendations for this user
    ROW_NUMBER() OVER (
        PARTITION BY target_user_id 
        ORDER BY recommendation_score DESC, mutual_count DESC
    ) as recommendation_rank

FROM scored_recommendations
WHERE recommendation_score > 0
ORDER BY target_user_id, recommendation_score DESC;

-- Problems with traditional normalized relational modeling:
-- 1. Complex multi-table joins required for basic user profile queries affecting performance
-- 2. Expensive aggregation queries across multiple related tables with poor scalability  
-- 3. Rigid schema structure requiring ALTER TABLE operations for new fields
-- 4. Foreign key constraint management overhead affecting insert/update performance
-- 5. Complex query optimization challenges with multiple join paths and aggregations
-- 6. Difficulty modeling variable or optional relationship structures
-- 7. Performance degradation as related data volume increases due to join complexity
-- 8. Complex application code required to reconstruct related objects from multiple tables
-- 9. Limited ability to co-locate frequently accessed related data for optimal performance
-- 10. Expensive view materialization and maintenance for denormalized query patterns

MongoDB provides flexible document modeling patterns - embedding, referencing, and hybrid approaches - that can be shaped around each application's query and data access patterns:
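
As a quick illustration before the full implementation, here is a minimal sketch of what that buys you, assuming a user_profiles collection shaped like the embedded pattern shown below (collection and field names here are illustrative): the join-heavy complete_user_profiles view collapses into a single findOne with a projection - no triggers, no view maintenance, no multi-table joins.

// Minimal sketch (illustrative names): one document read replaces the
// join-heavy complete_user_profiles view from the relational schema above.
const { MongoClient, ObjectId } = require('mongodb');

async function getCompleteProfile(userId) {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  try {
    return await client
      .db('enterprise_application')
      .collection('user_profiles')
      .findOne(
        { _id: new ObjectId(userId) },
        {
          projection: {
            email: 1,
            firstName: 1,
            lastName: 1,
            status: 1,
            // Return only the primary address from the embedded array
            addresses: { $elemMatch: { isPrimary: true } },
            'activitySummary.lastLoginAt': 1,
            'activitySummary.lastActivityAt': 1
          }
        }
      );
  } finally {
    await client.close();
  }
}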

// MongoDB Document Modeling - Flexible embedded and reference patterns for optimal performance
const { MongoClient, ObjectId } = require('mongodb');

// Advanced MongoDB Document Modeling Manager for Enterprise Data Relationship Optimization
class AdvancedDocumentModelingManager {
  constructor(client, config = {}) {
    this.client = client;
    this.db = client.db(config.database || 'enterprise_application');

    this.config = {
      // Modeling configuration
      enableEmbeddedOptimization: config.enableEmbeddedOptimization !== false,
      enableReferenceOptimization: config.enableReferenceOptimization !== false,
      enableHybridModeling: config.enableHybridModeling !== false,

      // Performance optimization
      enableQueryOptimization: config.enableQueryOptimization !== false,
      enableIndexOptimization: config.enableIndexOptimization !== false,
      enableAggregationOptimization: config.enableAggregationOptimization !== false,

      // Data consistency
      enableConsistencyValidation: config.enableConsistencyValidation !== false,
      enableReferentialIntegrity: config.enableReferentialIntegrity !== false,
      enableDataSynchronization: config.enableDataSynchronization !== false,

      // Monitoring and analytics
      enablePerformanceMonitoring: config.enablePerformanceMonitoring !== false,
      enableQueryAnalytics: config.enableQueryAnalytics !== false,
      enableDocumentSizeMonitoring: config.enableDocumentSizeMonitoring !== false
    };

    // Modeling strategy tracking
    this.modelingStrategies = new Map();
    this.performanceMetrics = new Map();
    this.queryPatterns = new Map();

    this.initializeModelingManager();
  }

  async initializeModelingManager() {
    console.log('Initializing Advanced Document Modeling Manager...');

    try {
      // Setup embedded document patterns
      await this.setupEmbeddedDocumentPatterns();

      // Setup reference patterns
      await this.setupReferencePatterns();

      // Setup hybrid modeling patterns
      await this.setupHybridModelingPatterns();

      // Initialize performance monitoring
      if (this.config.enablePerformanceMonitoring) {
        await this.initializePerformanceMonitoring();
      }

      console.log('Document Modeling Manager initialized successfully');

    } catch (error) {
      console.error('Error initializing document modeling manager:', error);
      throw error;
    }
  }

  async setupEmbeddedDocumentPatterns() {
    console.log('Setting up embedded document modeling patterns...');

    try {
      // User profile with embedded addresses and preferences - optimal for frequent co-access
      const userProfilesCollection = this.db.collection('user_profiles_embedded');

      // Create optimized indexes for embedded document queries
      await userProfilesCollection.createIndexes([
        { key: { email: 1 }, unique: true, background: true },
        { key: { username: 1 }, unique: true, background: true },
        { key: { 'addresses.type': 1, 'addresses.isPrimary': 1 }, background: true },
        { key: { 'preferences.category': 1, 'preferences.key': 1 }, background: true },
        { key: { status: 1, lastActivityAt: -1 }, background: true }
      ]);

      this.modelingStrategies.set('user_profiles_embedded', {
        collection: userProfilesCollection,
        pattern: 'embedded_documents',
        useCase: 'frequently_accessed_related_data',
        benefits: [
          'Single query for complete user profile',
          'Atomic updates for user and related data',
          'No joins required for common queries',
          'Optimal performance for read-heavy workloads'
        ],
        considerations: [
          'Document size growth with related data',
          'Potential for data duplication',
          'Complex update operations for nested data'
        ],
        queryOptimization: {
          primaryQueries: ['find_by_user_id', 'find_by_email', 'find_with_addresses'],
          indexStrategy: 'compound_indexes_for_embedded_fields',
          projectionStrategy: 'selective_field_projection'
        }
      });

      // Order documents with embedded line items - transactional consistency
      const ordersCollection = this.db.collection('orders_embedded');

      await ordersCollection.createIndexes([
        { key: { customerId: 1, orderDate: -1 }, background: true },
        { key: { orderStatus: 1, orderDate: -1 }, background: true },
        { key: { 'items.productId': 1 }, background: true },
        { key: { 'items.category': 1, orderDate: -1 }, background: true },
        { key: { totalAmount: 1 }, background: true }
      ]);

      this.modelingStrategies.set('orders_embedded', {
        collection: ordersCollection,
        pattern: 'embedded_array_documents',
        useCase: 'transactional_consistency_required',
        benefits: [
          'ACID guarantees for order and line items',
          'Single document queries for complete orders',
          'Efficient aggregation across order items',
          'Simplified application logic'
        ],
        considerations: [
          'Document size with many line items',
          'Array index performance for large arrays',
          'Memory usage for large embedded arrays'
        ]
      });

      console.log('Embedded document patterns configured successfully');

    } catch (error) {
      console.error('Error setting up embedded document patterns:', error);
      throw error;
    }
  }

  async setupReferencePatterns() {
    console.log('Setting up reference modeling patterns...');

    try {
      // User collection with references to separate related collections
      const usersCollection = this.db.collection('users_referenced');
      const addressesCollection = this.db.collection('user_addresses_referenced');
      const activitiesCollection = this.db.collection('user_activities_referenced');

      // User collection indexes
      await usersCollection.createIndexes([
        { key: { email: 1 }, unique: true, background: true },
        { key: { username: 1 }, unique: true, background: true },
        { key: { status: 1, createdAt: -1 }, background: true }
      ]);

      // Address collection with user references
      await addressesCollection.createIndexes([
        { key: { userId: 1, type: 1 }, background: true },
        { key: { userId: 1, isPrimary: 1 }, background: true },
        { key: { city: 1, stateProvince: 1 }, background: true }
      ]);

      // Activity collection with user references and time-based queries
      await activitiesCollection.createIndexes([
        { key: { userId: 1, timestamp: -1 }, background: true },
        { key: { activityType: 1, timestamp: -1 }, background: true },
        { key: { timestamp: -1 }, background: true }
      ]);

      this.modelingStrategies.set('users_referenced', {
        collections: {
          users: usersCollection,
          addresses: addressesCollection,  
          activities: activitiesCollection
        },
        pattern: 'normalized_references',
        useCase: 'independent_entity_management',
        benefits: [
          'Normalized data structure reduces duplication',
          'Independent scaling of related collections',
          'Flexible querying of individual entity types',
          'Efficient updates to specific data types'
        ],
        considerations: [
          'Multiple queries required for complete data',
          'Application-level join complexity',
          'Potential consistency challenges',
          'Network round-trips for related data'
        ],
        queryOptimization: {
          primaryQueries: ['find_user_with_addresses', 'find_user_activities', 'aggregate_user_data'],
          joinStrategy: 'application_level_population',
          indexStrategy: 'reference_field_optimization'
        }
      });

      console.log('Reference patterns configured successfully');

    } catch (error) {
      console.error('Error setting up reference patterns:', error);
      throw error;
    }
  }

  async setupHybridModelingPatterns() {
    console.log('Setting up hybrid modeling patterns...');

    try {
      // Blog posts with embedded metadata and referenced comments
      const blogPostsCollection = this.db.collection('blog_posts_hybrid');
      const commentsCollection = this.db.collection('blog_comments_hybrid');

      await blogPostsCollection.createIndexes([
        { key: { authorId: 1, publishedAt: -1 }, background: true },
        { key: { 'tags.name': 1, publishedAt: -1 }, background: true },
        { key: { status: 1, publishedAt: -1 }, background: true },
        { key: { 'metadata.category': 1 }, background: true }
      ]);

      await commentsCollection.createIndexes([
        { key: { postId: 1, createdAt: -1 }, background: true },
        { key: { authorId: 1, createdAt: -1 }, background: true },
        { key: { status: 1, createdAt: -1 }, background: true }
      ]);

      this.modelingStrategies.set('blog_posts_hybrid', {
        collections: {
          posts: blogPostsCollection,
          comments: commentsCollection
        },
        pattern: 'hybrid_embedded_and_referenced',
        useCase: 'mixed_access_patterns',
        benefits: [
          'Optimized for different query patterns',
          'Embedded data for frequent access',
          'Referenced data for independent management',
          'Balanced performance and flexibility'
        ],
        considerations: [
          'Complex modeling decisions',
          'Mixed query strategies required',
          'Potential data consistency complexity'
        ]
      });

      console.log('Hybrid modeling patterns configured successfully');

    } catch (error) {
      console.error('Error setting up hybrid patterns:', error);
      throw error;
    }
  }

  async createEmbeddedUserProfile(userData) {
    console.log('Creating user profile with embedded document pattern...');

    try {
      const userProfilesCollection = this.modelingStrategies.get('user_profiles_embedded').collection;

      const embeddedProfile = {
        _id: new ObjectId(),
        email: userData.email,
        username: userData.username,
        firstName: userData.firstName,
        lastName: userData.lastName,
        phoneNumber: userData.phoneNumber,
        dateOfBirth: userData.dateOfBirth,
        status: 'active',

        // Embedded addresses for optimal co-access
        addresses: userData.addresses?.map(addr => ({
          _id: new ObjectId(),
          type: addr.type,
          streetAddress: addr.streetAddress,
          apartmentUnit: addr.apartmentUnit,
          city: addr.city,
          stateProvince: addr.stateProvince,
          postalCode: addr.postalCode,
          country: addr.country,
          isPrimary: addr.isPrimary || false,
          isBilling: addr.isBilling || false,
          isShipping: addr.isShipping || false,
          createdAt: new Date(),
          updatedAt: new Date()
        })) || [],

        // Embedded preferences for atomic updates
        preferences: userData.preferences?.map(pref => ({
          _id: new ObjectId(),
          category: pref.category,
          key: pref.key,
          value: pref.value,
          dataType: pref.dataType || 'string',
          createdAt: new Date(),
          updatedAt: new Date()
        })) || [],

        // Embedded profile metadata
        profileMetadata: {
          theme: userData.theme || 'light',
          language: userData.language || 'en',
          timezone: userData.timezone || 'UTC',
          notificationSettings: {
            email: userData.emailNotifications !== false,
            push: userData.pushNotifications !== false,
            sms: userData.smsNotifications || false
          },
          privacySettings: {
            profileVisibility: userData.profileVisibility || 'public',
            allowDirectMessages: userData.allowDirectMessages !== false,
            shareActivityStatus: userData.shareActivityStatus !== false
          }
        },

        // Activity summary (embedded for performance)
        activitySummary: {
          totalLogins: 0,
          lastLoginAt: null,
          lastActivityAt: new Date(),
          accountCreatedAt: new Date(),
          profileCompletionScore: this.calculateProfileCompleteness(userData)
        },

        // Audit information
        createdAt: new Date(),
        updatedAt: new Date(),
        version: 1
      };

      const result = await userProfilesCollection.insertOne(embeddedProfile);

      // Update performance metrics
      await this.updateModelingMetrics('user_profiles_embedded', 'create', embeddedProfile);

      console.log(`Embedded user profile created: ${result.insertedId}`);

      return {
        userId: result.insertedId,
        modelingPattern: 'embedded_documents',
        documentsCreated: 1,
        queryOptimized: true,
        atomicUpdates: true
      };

    } catch (error) {
      console.error('Error creating embedded user profile:', error);
      throw error;
    }
  }

  async createReferencedUserProfile(userData) {
    console.log('Creating user profile with reference pattern...');

    try {
      const usersCollection = this.modelingStrategies.get('users_referenced').collections.users;
      const addressesCollection = this.modelingStrategies.get('users_referenced').collections.addresses;

      // Create main user document
      const userDocument = {
        _id: new ObjectId(),
        email: userData.email,
        username: userData.username,
        firstName: userData.firstName,
        lastName: userData.lastName,
        phoneNumber: userData.phoneNumber,
        dateOfBirth: userData.dateOfBirth,
        status: 'active',

        // Basic profile information
        profileMetadata: {
          theme: userData.theme || 'light',
          language: userData.language || 'en',
          timezone: userData.timezone || 'UTC'
        },

        createdAt: new Date(),
        updatedAt: new Date(),
        version: 1
      };

      const userResult = await usersCollection.insertOne(userDocument);
      const userId = userResult.insertedId;

      // Create referenced address documents
      const addressDocuments = userData.addresses?.map(addr => ({
        _id: new ObjectId(),
        userId: userId,
        type: addr.type,
        streetAddress: addr.streetAddress,
        apartmentUnit: addr.apartmentUnit,
        city: addr.city,
        stateProvince: addr.stateProvince,
        postalCode: addr.postalCode,
        country: addr.country,
        isPrimary: addr.isPrimary || false,
        isBilling: addr.isBilling || false,
        isShipping: addr.isShipping || false,
        createdAt: new Date(),
        updatedAt: new Date()
      })) || [];

      let addressResults = null;
      if (addressDocuments.length > 0) {
        addressResults = await addressesCollection.insertMany(addressDocuments);
      }

      // Update performance metrics
      await this.updateModelingMetrics('users_referenced', 'create', {
        mainDocument: userDocument,
        referencedDocuments: addressDocuments
      });

      console.log(`Referenced user profile created: ${userId} with ${addressDocuments.length} addresses`);

      return {
        userId: userId,
        modelingPattern: 'normalized_references',
        documentsCreated: 1 + addressDocuments.length,
        addressIds: addressResults ? Object.values(addressResults.insertedIds) : [],
        queryOptimized: false, // Requires joins
        normalizedStructure: true
      };

    } catch (error) {
      console.error('Error creating referenced user profile:', error);
      throw error;
    }
  }

  async getUserProfileEmbedded(userId, options = {}) {
    console.log(`Retrieving embedded user profile: ${userId}`);

    try {
      const userProfilesCollection = this.modelingStrategies.get('user_profiles_embedded').collection;

      // Single query for complete profile - optimal performance
      const startTime = Date.now();
      const projection = options.fields ? this.buildProjection(options.fields) : {};

      const profile = await userProfilesCollection.findOne(
        { _id: new ObjectId(userId) },
        { projection }
      );

      if (!profile) {
        throw new Error(`User profile not found: ${userId}`);
      }

      // Update query metrics
      await this.updateQueryMetrics('user_profiles_embedded', 'single_document_query', {
        documentsReturned: 1,
        queryTime: Date.now() - startTime,
        projectionUsed: Object.keys(projection).length > 0
      });

      console.log(`Embedded profile retrieved: ${userId} (single query)`);

      return {
        profile: profile,
        modelingPattern: 'embedded_documents',
        queriesExecuted: 1,
        performanceOptimized: true,
        dataConsistency: 'guaranteed'
      };

    } catch (error) {
      console.error(`Error retrieving embedded profile ${userId}:`, error);
      throw error;
    }
  }

  async getUserProfileReferenced(userId, options = {}) {
    console.log(`Retrieving referenced user profile: ${userId}`);

    try {
      const collections = this.modelingStrategies.get('users_referenced').collections;

      // Multiple queries required for complete profile
      const startTime = Date.now();
      const queries = [];

      // Main user query
      queries.push(
        collections.users.findOne({ _id: new ObjectId(userId) })
      );

      // Related data queries
      if (!options.userOnly) {
        queries.push(
          collections.addresses.find({ userId: new ObjectId(userId) }).toArray()
        );
      }

      const [userDoc, addressDocs] = await Promise.all(queries);

      if (!userDoc) {
        throw new Error(`User not found: ${userId}`);
      }

      // Construct complete profile from multiple documents
      const completeProfile = {
        ...userDoc,
        addresses: addressDocs || [],

        // Derived fields
        primaryAddress: addressDocs?.find(addr => addr.isPrimary),
        addressCount: addressDocs?.length || 0
      };

      // Update query metrics
      await this.updateQueryMetrics('users_referenced', 'multi_document_query', {
        documentsReturned: 1 + (addressDocs?.length || 0),
        queriesExecuted: queries.length,
        queryTime: Date.now() - startTime
      });

      console.log(`Referenced profile retrieved: ${userId} (${queries.length} queries)`);

      return {
        profile: completeProfile,
        modelingPattern: 'normalized_references', 
        queriesExecuted: queries.length,
        performanceOptimized: false,
        dataConsistency: 'eventual'
      };

    } catch (error) {
      console.error(`Error retrieving referenced profile ${userId}:`, error);
      throw error;
    }
  }

  async updateEmbeddedUserAddress(userId, addressId, updateData) {
    console.log(`Updating embedded user address: ${userId}, ${addressId}`);

    try {
      const userProfilesCollection = this.modelingStrategies.get('user_profiles_embedded').collection;

      // Atomic update of embedded address document
      const updateFields = {};
      Object.keys(updateData).forEach(key => {
        updateFields[`addresses.$.${key}`] = updateData[key];
      });
      updateFields['addresses.$.updatedAt'] = new Date();
      updateFields['updatedAt'] = new Date();

      const result = await userProfilesCollection.updateOne(
        { 
          _id: new ObjectId(userId), 
          'addresses._id': new ObjectId(addressId) 
        },
        { 
          $set: updateFields,
          $inc: { version: 1 }
        }
      );

      if (result.matchedCount === 0) {
        throw new Error(`Address not found: ${addressId} for user ${userId}`);
      }

      console.log(`Embedded address updated: ${addressId} (atomic operation)`);

      return {
        addressId: addressId,
        modelingPattern: 'embedded_documents',
        atomicUpdate: true,
        documentsModified: result.modifiedCount,
        consistencyGuaranteed: true
      };

    } catch (error) {
      console.error(`Error updating embedded address:`, error);
      throw error;
    }
  }

  async updateReferencedUserAddress(userId, addressId, updateData) {
    console.log(`Updating referenced user address: ${userId}, ${addressId}`);

    try {
      const addressesCollection = this.modelingStrategies.get('users_referenced').collections.addresses;

      // Update referenced address document
      const result = await addressesCollection.updateOne(
        { 
          _id: new ObjectId(addressId),
          userId: new ObjectId(userId) 
        },
        { 
          $set: {
            ...updateData,
            updatedAt: new Date()
          }
        }
      );

      if (result.matchedCount === 0) {
        throw new Error(`Address not found: ${addressId} for user ${userId}`);
      }

      // Potentially update user document timestamp (separate operation)
      const usersCollection = this.modelingStrategies.get('users_referenced').collections.users;
      await usersCollection.updateOne(
        { _id: new ObjectId(userId) },
        { $set: { updatedAt: new Date() } }
      );

      console.log(`Referenced address updated: ${addressId} (separate operations)`);

      return {
        addressId: addressId,
        modelingPattern: 'normalized_references',
        atomicUpdate: false,
        documentsModified: result.modifiedCount,
        consistencyGuaranteed: false
      };

    } catch (error) {
      console.error(`Error updating referenced address:`, error);
      throw error;
    }
  }

  async performComplexAggregation(pattern, aggregationQuery) {
    console.log(`Performing complex aggregation with ${pattern} pattern`);

    try {
      let result;
      const startTime = Date.now();

      if (pattern === 'embedded') {
        const collection = this.modelingStrategies.get('user_profiles_embedded').collection;

        // Single collection aggregation pipeline
        const pipeline = [
          { $match: aggregationQuery.match || {} },

          // Unwind embedded arrays for aggregation
          ...(aggregationQuery.unwindAddresses ? [{ $unwind: '$addresses' }] : []),
          ...(aggregationQuery.unwindPreferences ? [{ $unwind: '$preferences' }] : []),

          // Group and aggregate
          {
            $group: {
              _id: aggregationQuery.groupBy || null,
              userCount: { $sum: 1 },
              avgProfileScore: { $avg: '$activitySummary.profileCompletionScore' },
              totalAddresses: { $sum: { $size: '$addresses' } },
              activeUsers: { 
                $sum: { $cond: [{ $eq: ['$status', 'active'] }, 1, 0] } 
              }
            }
          },

          { $sort: { userCount: -1 } },
          { $limit: aggregationQuery.limit || 100 }
        ];

        result = await collection.aggregate(pipeline).toArray();

      } else if (pattern === 'referenced') {
        // Multi-collection aggregation with $lookup
        const usersCollection = this.modelingStrategies.get('users_referenced').collections.users;

        const pipeline = [
          { $match: aggregationQuery.match || {} },

          // Lookup addresses
          {
            $lookup: {
              from: 'user_addresses_referenced',
              localField: '_id',
              foreignField: 'userId',
              as: 'addresses'
            }
          },

          // Lookup activities
          {
            $lookup: {
              from: 'user_activities_referenced', 
              localField: '_id',
              foreignField: 'userId',
              as: 'activities'
            }
          },

          // Group and aggregate
          {
            $group: {
              _id: aggregationQuery.groupBy || null,
              userCount: { $sum: 1 },
              totalAddresses: { $sum: { $size: '$addresses' } },
              totalActivities: { $sum: { $size: '$activities' } },
              activeUsers: { 
                $sum: { $cond: [{ $eq: ['$status', 'active'] }, 1, 0] } 
              }
            }
          },

          { $sort: { userCount: -1 } },
          { $limit: aggregationQuery.limit || 100 }
        ];

        result = await usersCollection.aggregate(pipeline).toArray();
      }

      const executionTime = Date.now() - startTime;

      // Update aggregation metrics
      await this.updateQueryMetrics(`${pattern}_aggregation`, 'complex_aggregation', {
        executionTime: executionTime,
        documentsProcessed: result.length,
        pipelineStages: aggregationQuery.pipelineStages || 0
      });

      console.log(`${pattern} aggregation completed in ${executionTime}ms`);

      return {
        results: result,
        modelingPattern: pattern,
        executionTime: executionTime,
        performanceProfile: executionTime < 100 ? 'optimal' : executionTime < 500 ? 'acceptable' : 'needs_optimization'
      };

    } catch (error) {
      console.error(`Error performing ${pattern} aggregation:`, error);
      throw error;
    }
  }

  // Utility methods for document modeling optimization

  calculateProfileCompleteness(userData) {
    let score = 0;

    // Basic information (50 points)
    if (userData.firstName) score += 10;
    if (userData.lastName) score += 10;
    if (userData.email) score += 10;
    if (userData.phoneNumber) score += 10;
    if (userData.dateOfBirth) score += 10;

    // Addresses (25 points)
    if (userData.addresses?.length > 0) score += 25;

    // Preferences (25 points)
    if (userData.preferences?.length > 0) score += 25;

    return Math.min(score, 100);
  }

  buildProjection(fields) {
    const projection = {};
    fields.forEach(field => {
      projection[field] = 1;
    });
    return projection;
  }

  async updateModelingMetrics(strategy, operation, metadata) {
    if (!this.config.enablePerformanceMonitoring) return;

    const metrics = this.performanceMetrics.get(strategy) || {
      totalOperations: 0,
      operationTypes: {},
      averageDocumentSize: 0,
      performanceProfile: 'unknown'
    };

    metrics.totalOperations++;
    metrics.operationTypes[operation] = (metrics.operationTypes[operation] || 0) + 1;
    metrics.lastOperation = new Date();

    if (metadata.documentsCreated) {
      metrics.documentsCreated = (metrics.documentsCreated || 0) + metadata.documentsCreated;
    }

    this.performanceMetrics.set(strategy, metrics);
  }

  async updateQueryMetrics(strategy, queryType, metadata) {
    if (!this.config.enableQueryAnalytics) return;

    const queryMetrics = this.queryPatterns.get(strategy) || {
      totalQueries: 0,
      queryTypes: {},
      averageQueryTime: 0,
      performanceProfile: {}
    };

    queryMetrics.totalQueries++;
    queryMetrics.queryTypes[queryType] = (queryMetrics.queryTypes[queryType] || 0) + 1;

    if (metadata.queryTime) {
      const currentAvg = queryMetrics.averageQueryTime || 0;
      queryMetrics.averageQueryTime = (currentAvg + metadata.queryTime) / 2;
    }

    if (metadata.executionTime) {
      queryMetrics.performanceProfile[queryType] = metadata.executionTime;
    }

    this.queryPatterns.set(strategy, queryMetrics);
  }

  async getModelingRecommendations(collectionName, queryPatterns) {
    console.log(`Generating modeling recommendations for: ${collectionName}`);

    const recommendations = {
      currentPattern: 'unknown',
      recommendedPattern: 'unknown',
      reasoning: [],
      tradeoffs: {},
      migrationComplexity: 'unknown'
    };

    // Analyze query patterns
    const embeddedQueries = queryPatterns.filter(q => q.type === 'find_complete_document').length;
    const partialQueries = queryPatterns.filter(q => q.type === 'find_partial_data').length;
    const updateFrequency = queryPatterns.filter(q => q.type === 'update_operation').length;
    const aggregationComplexity = queryPatterns.filter(q => q.type === 'aggregation').length;

    // Analyze data characteristics
    const avgDocumentSize = queryPatterns.reduce((sum, q) => sum + (q.documentSize || 0), 0) / queryPatterns.length;
    const dataGrowthRate = queryPatterns.reduce((sum, q) => sum + (q.growthRate || 0), 0) / queryPatterns.length;

    // Generate recommendations based on patterns
    if (embeddedQueries > partialQueries * 2 && avgDocumentSize < 16 * 1024 * 1024) {
      recommendations.recommendedPattern = 'embedded_documents';
      recommendations.reasoning.push('High frequency of complete document queries');
      recommendations.reasoning.push('Document size within MongoDB limits');

      if (updateFrequency > embeddedQueries * 0.3) {
        recommendations.reasoning.push('Consider hybrid pattern due to high update frequency');
      }

    } else if (partialQueries > embeddedQueries && dataGrowthRate > 0.1) {
      recommendations.recommendedPattern = 'normalized_references';
      recommendations.reasoning.push('High frequency of partial data queries');
      recommendations.reasoning.push('High data growth rate favors normalization');

    } else if (aggregationComplexity > queryPatterns.length * 0.2) {
      recommendations.recommendedPattern = 'hybrid_pattern';
      recommendations.reasoning.push('Complex aggregation requirements');
      recommendations.reasoning.push('Mixed access patterns detected');
    }

    // Define tradeoffs
    recommendations.tradeoffs = {
      embedded_documents: {
        benefits: ['Single query performance', 'Atomic updates', 'Data locality'],
        drawbacks: ['Document size growth', 'Potential duplication', 'Complex nested updates']
      },
      normalized_references: {
        benefits: ['Data normalization', 'Independent scaling', 'Flexible querying'],
        drawbacks: ['Multiple queries required', 'Application complexity', 'Consistency challenges']
      },
      hybrid_pattern: {
        benefits: ['Optimized for mixed patterns', 'Balanced performance'],
        drawbacks: ['Increased complexity', 'Mixed consistency models']
      }
    };

    return recommendations;
  }

  async getPerformanceAnalysis() {
    console.log('Generating performance analysis for modeling patterns...');

    const analysis = {
      embeddedPatterns: {},
      referencedPatterns: {},
      hybridPatterns: {},
      recommendations: []
    };

    // Analyze embedded pattern performance
    for (const [strategy, metrics] of this.performanceMetrics) {
      if (strategy.includes('embedded')) {
        analysis.embeddedPatterns[strategy] = {
          totalOperations: metrics.totalOperations,
          operationBreakdown: metrics.operationTypes,
          averagePerformance: metrics.averageQueryTime || 0,
          performanceRating: this.ratePerformance(metrics.averageQueryTime || 0)
        };
      } else if (strategy.includes('referenced')) {
        analysis.referencedPatterns[strategy] = {
          totalOperations: metrics.totalOperations,
          operationBreakdown: metrics.operationTypes,
          averagePerformance: metrics.averageQueryTime || 0,
          performanceRating: this.ratePerformance(metrics.averageQueryTime || 0)
        };
      }
    }

    // Generate global recommendations
    analysis.recommendations = [
      'Use embedded documents for frequently co-accessed data',
      'Use references for large or independently managed entities',
      'Consider hybrid patterns for complex applications',
      'Monitor document sizes to avoid 16MB limit',
      'Optimize indexes based on query patterns'
    ];

    return analysis;
  }

  ratePerformance(avgTime) {
    if (avgTime < 10) return 'excellent';
    if (avgTime < 50) return 'good';
    if (avgTime < 200) return 'acceptable';
    return 'needs_optimization';
  }

  async cleanup() {
    console.log('Cleaning up Document Modeling Manager...');

    this.modelingStrategies.clear();
    this.performanceMetrics.clear();
    this.queryPatterns.clear();

    console.log('Document Modeling Manager cleanup completed');
  }
}

// Example usage demonstrating embedded vs referenced patterns
async function demonstrateDocumentModelingPatterns() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();

  const modelingManager = new AdvancedDocumentModelingManager(client, {
    database: 'document_modeling_demo',
    enablePerformanceMonitoring: true,
    enableQueryAnalytics: true
  });

  try {
    // Sample user data for demonstration
    const sampleUserData = {
      email: 'john.doe@example.com',
      username: 'johndoe123',
      firstName: 'John',
      lastName: 'Doe',
      phoneNumber: '+1-555-0123',
      dateOfBirth: new Date('1990-05-15'),

      addresses: [
        {
          type: 'home',
          streetAddress: '123 Main Street',
          apartmentUnit: 'Apt 4B',
          city: 'New York',
          stateProvince: 'NY',
          postalCode: '10001',
          country: 'USA',
          isPrimary: true,
          isShipping: true
        },
        {
          type: 'work',
          streetAddress: '456 Corporate Blvd',
          city: 'New York',
          stateProvince: 'NY',
          postalCode: '10002',
          country: 'USA',
          isBilling: true
        }
      ],

      preferences: [
        {
          category: 'notifications',
          key: 'email_frequency',
          value: 'daily',
          dataType: 'string'
        },
        {
          category: 'display',
          key: 'theme',
          value: 'dark',
          dataType: 'string'
        }
      ]
    };

    // Demonstrate embedded document pattern
    console.log('Creating embedded user profile...');
    const embeddedResult = await modelingManager.createEmbeddedUserProfile(sampleUserData);
    console.log('Embedded Result:', embeddedResult);

    // Demonstrate referenced pattern
    console.log('Creating referenced user profile...');
    const referencedResult = await modelingManager.createReferencedUserProfile(sampleUserData);
    console.log('Referenced Result:', referencedResult);

    // Demonstrate query performance differences
    console.log('Comparing query performance...');

    const embeddedQuery = await modelingManager.getUserProfileEmbedded(embeddedResult.userId);
    console.log('Embedded Query Result:', {
      pattern: embeddedQuery.modelingPattern,
      queries: embeddedQuery.queriesExecuted,
      optimized: embeddedQuery.performanceOptimized
    });

    const referencedQuery = await modelingManager.getUserProfileReferenced(referencedResult.userId);
    console.log('Referenced Query Result:', {
      pattern: referencedQuery.modelingPattern,
      queries: referencedQuery.queriesExecuted,
      optimized: referencedQuery.performanceOptimized
    });

    // Demonstrate update operations
    console.log('Comparing update operations...');

    const addressId = embeddedQuery.profile.addresses[0]._id;
    const referencedAddressId = referencedResult.addressIds[0];

    const embeddedUpdate = await modelingManager.updateEmbeddedUserAddress(
      embeddedResult.userId,
      addressId,
      { streetAddress: '789 Updated Street' }
    );
    console.log('Embedded Update:', embeddedUpdate);

    const referencedUpdate = await modelingManager.updateReferencedUserAddress(
      referencedResult.userId,
      referencedAddressId,
      { streetAddress: '789 Updated Street' }
    );
    console.log('Referenced Update:', referencedUpdate);

    // Demonstrate aggregation performance
    console.log('Comparing aggregation performance...');

    const embeddedAggregation = await modelingManager.performComplexAggregation('embedded', {
      match: { status: 'active' },
      groupBy: '$profileMetadata.theme',
      limit: 10
    });

    const referencedAggregation = await modelingManager.performComplexAggregation('referenced', {
      match: { status: 'active' },
      groupBy: '$profileMetadata.theme',
      limit: 10
    });

    console.log('Aggregation Comparison:', {
      embedded: {
        time: embeddedAggregation.executionTime,
        profile: embeddedAggregation.performanceProfile
      },
      referenced: {
        time: referencedAggregation.executionTime,
        profile: referencedAggregation.performanceProfile
      }
    });

    // Get performance analysis
    const performanceAnalysis = await modelingManager.getPerformanceAnalysis();
    console.log('Performance Analysis:', performanceAnalysis);

    return {
      embeddedResult,
      referencedResult,
      queryComparison: {
        embedded: embeddedQuery,
        referenced: referencedQuery
      },
      updateComparison: {
        embedded: embeddedUpdate,
        referenced: referencedUpdate
      },
      aggregationComparison: {
        embedded: embeddedAggregation,
        referenced: referencedAggregation
      },
      performanceAnalysis
    };

  } catch (error) {
    console.error('Error demonstrating document modeling patterns:', error);
    throw error;
  } finally {
    await modelingManager.cleanup();
    await client.close();
  }
}

// Benefits of MongoDB Flexible Document Modeling:
// - Embedded documents provide optimal query performance for frequently co-accessed data
// - Reference patterns enable normalized data structures and independent entity management
// - Hybrid patterns optimize for mixed access patterns and complex application requirements
// - Flexible schema evolution accommodates changing business requirements without migrations
// - Query optimization strategies can be tailored to specific data access patterns
// - Atomic operations available for embedded documents ensure data consistency
// - Application-level joins provide flexibility while maintaining performance where needed
// - Document size management enables balanced approaches between embedding and referencing

module.exports = {
  AdvancedDocumentModelingManager,
  demonstrateDocumentModelingPatterns
};
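
To make the hybrid pattern configured in setupHybridModelingPatterns more concrete, the sketch below embeds only a bounded set of the newest comments in each post while the full history stays in the referenced blog_comments_hybrid collection. The mostRecentComments field and the cap of five are illustrative assumptions, not part of the manager above.

// Hybrid (subset) pattern sketch: newest comments are embedded in the post for
// single-query rendering; the complete history lives in a referenced collection.
// The mostRecentComments field and the cap of 5 are illustrative assumptions.
async function addComment(db, postId, comment) {
  const commentDoc = { ...comment, postId, createdAt: new Date() };

  // Full history goes to the referenced collection (safe for unbounded growth)
  await db.collection('blog_comments_hybrid').insertOne(commentDoc);

  // Keep only the five newest comments embedded in the post document
  await db.collection('blog_posts_hybrid').updateOne(
    { _id: postId },
    {
      $push: {
        mostRecentComments: {
          $each: [commentDoc],
          $sort: { createdAt: -1 },
          $slice: 5
        }
      },
      $inc: { commentCount: 1 }
    }
  );
}

// Older comments are paged from the referenced collection only when requested
function getOlderComments(db, postId, skip = 5, limit = 20) {
  return db.collection('blog_comments_hybrid')
    .find({ postId })
    .sort({ createdAt: -1 })
    .skip(skip)
    .limit(limit)
    .toArray();
}

Because $slice bounds the embedded array, post documents stay far below the 16MB limit while the common "post plus latest comments" read remains a single query.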

SQL-Style Document Modeling with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB document relationship management and modeling pattern optimization:

-- QueryLeaf document modeling with SQL-familiar embedded and reference pattern syntax

-- Configure document modeling optimization settings
SET enable_embedded_optimization = true;
SET enable_reference_optimization = true;
SET enable_hybrid_modeling = true;
SET document_size_monitoring = true;
SET query_pattern_analysis = true;
SET performance_monitoring = true;

-- Create embedded document pattern for frequently co-accessed data
WITH embedded_user_profiles AS (
  INSERT INTO user_profiles_embedded
  SELECT 
    GENERATE_UUID() as user_id,
    'user' || generate_series(1, 1000) || '@example.com' as email,
    'user' || generate_series(1, 1000) as username,
    (ARRAY['John', 'Jane', 'Mike', 'Sarah', 'David'])[1 + floor(random() * 5)] as first_name,
    (ARRAY['Smith', 'Johnson', 'Williams', 'Brown', 'Jones'])[1 + floor(random() * 5)] as last_name,
    '+1-555-' || LPAD(floor(random() * 10000)::text, 4, '0') as phone_number,
    CURRENT_DATE - (random() * 365 * 30 + 18 * 365)::int as date_of_birth,
    'active' as status,

    -- Embedded addresses array for optimal co-access
    JSON_BUILD_ARRAY(
      JSON_BUILD_OBJECT(
        '_id', GENERATE_UUID(),
        'type', 'home',
        'streetAddress', floor(random() * 9999 + 1) || ' ' || 
          (ARRAY['Main St', 'Oak Ave', 'First St', 'Second Ave', 'Third St'])[1 + floor(random() * 5)],
        'apartmentUnit', CASE WHEN random() > 0.6 THEN 'Apt ' || (1 + floor(random() * 50))::text ELSE NULL END,
        'city', (ARRAY['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'])[1 + floor(random() * 5)],
        'stateProvince', (ARRAY['NY', 'CA', 'IL', 'TX', 'AZ'])[1 + floor(random() * 5)],
        'postalCode', LPAD(floor(random() * 100000)::text, 5, '0'),
        'country', 'USA',
        'isPrimary', true,
        'isBilling', random() > 0.5,
        'isShipping', random() > 0.3,
        'createdAt', CURRENT_TIMESTAMP,
        'updatedAt', CURRENT_TIMESTAMP
      ),
      -- Additional address if random condition met
      CASE WHEN random() > 0.7 THEN
        JSON_BUILD_OBJECT(
          '_id', GENERATE_UUID(),
          'type', 'work',
          'streetAddress', floor(random() * 999 + 100) || ' Business Blvd',
          'city', (ARRAY['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'])[1 + floor(random() * 5)],
          'stateProvince', (ARRAY['NY', 'CA', 'IL', 'TX', 'AZ'])[1 + floor(random() * 5)],
          'postalCode', LPAD(floor(random() * 100000)::text, 5, '0'),
          'country', 'USA',
          'isPrimary', false,
          'isBilling', true,
          'isShipping', false,
          'createdAt', CURRENT_TIMESTAMP,
          'updatedAt', CURRENT_TIMESTAMP
        )
      ELSE NULL END
    ) as addresses,  -- second (work) entry is NULL for ~70% of users and removed before insert

    -- Embedded preferences for atomic updates
    JSON_BUILD_ARRAY(
      JSON_BUILD_OBJECT(
        '_id', GENERATE_UUID(),
        'category', 'notifications',
        'key', 'email_frequency', 
        'value', (ARRAY['immediate', 'daily', 'weekly', 'never'])[1 + floor(random() * 4)],
        'dataType', 'string',
        'createdAt', CURRENT_TIMESTAMP,
        'updatedAt', CURRENT_TIMESTAMP
      ),
      JSON_BUILD_OBJECT(
        '_id', GENERATE_UUID(),
        'category', 'display',
        'key', 'theme',
        'value', (ARRAY['light', 'dark', 'auto'])[1 + floor(random() * 3)],
        'dataType', 'string',
        'createdAt', CURRENT_TIMESTAMP,
        'updatedAt', CURRENT_TIMESTAMP
      ),
      JSON_BUILD_OBJECT(
        '_id', GENERATE_UUID(),
        'category', 'privacy',
        'key', 'profile_visibility',
        'value', (ARRAY['public', 'friends', 'private'])[1 + floor(random() * 3)],
        'dataType', 'string',
        'createdAt', CURRENT_TIMESTAMP,
        'updatedAt', CURRENT_TIMESTAMP
      )
    ) as preferences,

    -- Embedded profile metadata for single-query access
    JSON_BUILD_OBJECT(
      'theme', (ARRAY['light', 'dark', 'auto'])[1 + floor(random() * 3)],
      'language', (ARRAY['en', 'es', 'fr', 'de'])[1 + floor(random() * 4)],
      'timezone', (ARRAY['UTC', 'EST', 'PST', 'CST', 'MST'])[1 + floor(random() * 5)],
      'notificationSettings', JSON_BUILD_OBJECT(
        'email', random() > 0.2,
        'push', random() > 0.3,
        'sms', random() > 0.8
      ),
      'privacySettings', JSON_BUILD_OBJECT(
        'profileVisibility', (ARRAY['public', 'friends', 'private'])[1 + floor(random() * 3)],
        'allowDirectMessages', random() > 0.1,
        'shareActivityStatus', random() > 0.4
      )
    ) as profile_metadata,

    -- Embedded activity summary for performance
    JSON_BUILD_OBJECT(
      'totalLogins', floor(random() * 100),
      'lastLoginAt', CURRENT_TIMESTAMP - (random() * INTERVAL '30 days'),
      'lastActivityAt', CURRENT_TIMESTAMP - (random() * INTERVAL '7 days'),
      'accountCreatedAt', CURRENT_TIMESTAMP - (random() * 365 + 30) * INTERVAL '1 day',
      'profileCompletionScore', 70 + floor(random() * 30) -- 70-100%
    ) as activity_summary,

    CURRENT_TIMESTAMP as created_at,
    CURRENT_TIMESTAMP as updated_at,
    1 as version
  RETURNING user_id, email, username
),

-- Create normalized reference pattern for independent entity management  
users_referenced AS (
  INSERT INTO users_referenced
  SELECT 
    GENERATE_UUID() as user_id,
    'ref_user' || generate_series(1, 1000) || '@example.com' as email,
    'ref_user' || generate_series(1, 1000) as username,
    (ARRAY['Alice', 'Bob', 'Carol', 'David', 'Eve'])[1 + floor(random() * 5)] as first_name,
    (ARRAY['Wilson', 'Davis', 'Miller', 'Moore', 'Taylor'])[1 + floor(random() * 5)] as last_name,
    '+1-555-' || LPAD(floor(random() * 10000)::text, 4, '0') as phone_number,
    CURRENT_DATE - (random() * 365 * 30 + 18 * 365)::int as date_of_birth,
    'active' as status,

    -- Basic profile metadata only (normalized approach)
    JSON_BUILD_OBJECT(
      'theme', (ARRAY['light', 'dark', 'auto'])[1 + floor(random() * 3)],
      'language', (ARRAY['en', 'es', 'fr', 'de'])[1 + floor(random() * 4)],
      'timezone', (ARRAY['UTC', 'EST', 'PST', 'CST', 'MST'])[1 + floor(random() * 5)]
    ) as profile_metadata,

    CURRENT_TIMESTAMP as created_at,
    CURRENT_TIMESTAMP as updated_at,
    1 as version
  RETURNING user_id, email, username
),

-- Create separate referenced address documents
user_addresses_referenced AS (
  INSERT INTO user_addresses_referenced
  SELECT 
    GENERATE_UUID() as address_id,
    ur.user_id,

    -- Address type and details
    (ARRAY['home', 'work', 'billing', 'shipping'])[1 + floor(random() * 4)] as type,
    floor(random() * 9999 + 1) || ' ' || 
      (ARRAY['Broadway', 'Park Ave', 'Wall St', 'Madison Ave', 'Fifth Ave'])[1 + floor(random() * 5)] as street_address,
    CASE WHEN random() > 0.7 THEN 'Unit ' || (1 + floor(random() * 100))::text ELSE NULL END as apartment_unit,
    (ARRAY['Boston', 'Philadelphia', 'San Antonio', 'San Diego', 'Dallas'])[1 + floor(random() * 5)] as city,
    (ARRAY['MA', 'PA', 'TX', 'CA', 'TX'])[1 + floor(random() * 5)] as state_province,
    LPAD(floor(random() * 100000)::text, 5, '0') as postal_code,
    'USA' as country,

    -- Address flags
    row_number() OVER (PARTITION BY ur.user_id) = 1 as is_primary, -- First address is primary
    random() > 0.6 as is_billing,
    random() > 0.4 as is_shipping,

    CURRENT_TIMESTAMP as created_at,
    CURRENT_TIMESTAMP as updated_at

  FROM users_referenced ur
  CROSS JOIN generate_series(1, 1 + floor(random() * 2)::int) -- 1-2 addresses per user
  RETURNING address_id, user_id, type
),

-- Create referenced user activities for independent tracking
user_activities_referenced AS (
  INSERT INTO user_activities_referenced  
  SELECT 
    GENERATE_UUID() as activity_id,
    ur.user_id,

    -- Activity classification
    (ARRAY['login', 'logout', 'page_view', 'action_performed', 'data_modified', 'error_occurred'])
      [1 + floor(random() * 6)] as activity_type,
    CURRENT_TIMESTAMP - (random() * INTERVAL '90 days') as activity_timestamp,

    -- Activity details
    JSON_BUILD_OBJECT(
      'page', (ARRAY['/dashboard', '/profile', '/settings', '/reports', '/help'])[1 + floor(random() * 5)],
      'action', (ARRAY['click', 'view', 'edit', 'save', 'delete'])[1 + floor(random() * 5)],
      'duration', floor(random() * 300 + 5), -- 5-305 seconds
      'userAgent', 'Mozilla/5.0 (Enterprise Browser)',
      'ipAddress', '192.168.' || (1 + floor(random() * 254)) || '.' || (1 + floor(random() * 254))
    ) as activity_data,

    (ARRAY['web', 'mobile', 'api', 'system'])[1 + floor(random() * 4)] as activity_source,
    ('192.168.' || (1 + floor(random() * 254)) || '.' || (1 + floor(random() * 254)))::inet as ip_address,
    'Mozilla/5.0 (compatible; Enterprise App)' as user_agent,

    -- Session and tracking
    'session_' || floor(random() * 10000) as session_id,
    'https://app.example.com' || (ARRAY['/dashboard', '/profile', '/settings'])[1 + floor(random() * 3)] as page_url,

    -- Performance tracking  
    floor(random() * 500 + 50) as response_time_ms,
    random() > 0.95 as error_occurred, -- 5% error rate
    CASE WHEN random() > 0.95 THEN
      JSON_BUILD_OBJECT('error', 'timeout', 'code', '500', 'message', 'Request timeout')
    ELSE NULL END as error_details

  FROM users_referenced ur
  CROSS JOIN generate_series(1, floor(random() * 50 + 10)::int) -- 10-60 activities per user
  RETURNING activity_id, user_id, activity_type, activity_timestamp
)

-- Query performance comparison between embedded and referenced patterns
SELECT 
  'EMBEDDED_PATTERN' as modeling_approach,
  'Single document query for complete profile' as query_description,
  1 as queries_required,
  'Optimal - all data co-located' as performance_profile,
  'Guaranteed - single document ACID' as consistency_model,
  'Atomic updates possible' as update_characteristics,
  'Potential 16MB limit concern' as scalability_considerations

UNION ALL

SELECT 
  'REFERENCED_PATTERN' as modeling_approach,
  'Multiple queries required for complete profile' as query_description,
  3 as queries_required,
  'Moderate - requires joins/lookups' as performance_profile,
  'Eventual - across multiple documents' as consistency_model,
  'Independent entity updates' as update_characteristics,
  'Unlimited growth potential' as scalability_considerations;

-- Demonstrate embedded document queries (single collection access)
WITH embedded_query_patterns AS (
  -- Single query retrieves complete user profile with all related data
  SELECT 
    user_id,
    email,
    first_name,
    last_name,

    -- Extract embedded address information
    JSON_ARRAY_LENGTH(addresses) as total_addresses,
    JSON_EXTRACT_PATH_TEXT(addresses, '0', 'city') as primary_city,
    JSON_EXTRACT_PATH_TEXT(addresses, '0', 'stateProvince') as primary_state,

    -- Extract embedded preferences
    JSON_ARRAY_LENGTH(preferences) as total_preferences,

    -- Extract activity summary (embedded for performance)
    CAST(JSON_EXTRACT_PATH_TEXT(activity_summary, 'totalLogins') AS INTEGER) as total_logins,
    TO_TIMESTAMP(JSON_EXTRACT_PATH_TEXT(activity_summary, 'lastLoginAt'), 'YYYY-MM-DD"T"HH24:MI:SS.MS"Z"') as last_login,
    CAST(JSON_EXTRACT_PATH_TEXT(activity_summary, 'profileCompletionScore') AS INTEGER) as completion_score,

    -- Performance metrics
    1 as documents_accessed,
    0 as join_operations_required,
    'immediate' as consistency_guarantee,

    -- Query classification
    'embedded_single_document' as query_pattern,
    'optimal_performance' as performance_classification

  FROM user_profiles_embedded
  WHERE status = 'active'
  AND JSON_EXTRACT_PATH_TEXT(profile_metadata, 'theme') = 'dark'
  LIMIT 100
),

-- Demonstrate referenced pattern queries (multiple collection access required)
referenced_query_patterns AS (
  -- Multiple queries required to reconstruct complete user profile
  SELECT 
    u.user_id,
    u.email,
    u.first_name,
    u.last_name,

    -- Address information requires separate query/join
    COUNT(DISTINCT addr.address_id) as total_addresses,
    addr_primary.city as primary_city,
    addr_primary.state_province as primary_state,

    -- Activity summary requires aggregation from separate collection
    -- (DISTINCT counts avoid fan-out from joining addresses and activities together)
    COUNT(DISTINCT act.activity_id) as total_activities,
    MAX(act.activity_timestamp) FILTER (WHERE act.activity_type = 'login') as last_login,
    COUNT(DISTINCT act.activity_id) FILTER (WHERE act.activity_timestamp >= CURRENT_TIMESTAMP - INTERVAL '7 days') as recent_activities,

    -- Performance metrics
    3 as documents_accessed, -- Users + Addresses + Activities
    2 as join_operations_required,
    'eventual' as consistency_guarantee,

    -- Query classification
    'referenced_multi_document' as query_pattern,
    'moderate_performance' as performance_classification

  FROM users_referenced u
  LEFT JOIN user_addresses_referenced addr ON u.user_id = addr.user_id
  LEFT JOIN user_addresses_referenced addr_primary ON u.user_id = addr_primary.user_id AND addr_primary.is_primary = true
  LEFT JOIN user_activities_referenced act ON u.user_id = act.user_id

  WHERE u.status = 'active'
  AND JSON_EXTRACT_PATH_TEXT(u.profile_metadata, 'theme') = 'dark'

  GROUP BY u.user_id, u.email, u.first_name, u.last_name, addr_primary.city, addr_primary.state_province
  LIMIT 100
),

-- Performance analysis and comparison
modeling_performance_analysis AS (
  SELECT 
    query_pattern,
    performance_classification,
    AVG(documents_accessed) as avg_documents_per_query,
    AVG(join_operations_required) as avg_joins_per_query,
    COUNT(*) as total_queries_analyzed,

    -- Performance scoring
    CASE 
      WHEN AVG(documents_accessed) = 1 AND AVG(join_operations_required) = 0 THEN 'excellent'
      WHEN AVG(documents_accessed) <= 3 AND AVG(join_operations_required) <= 2 THEN 'good'
      WHEN AVG(documents_accessed) <= 5 AND AVG(join_operations_required) <= 4 THEN 'acceptable'
      ELSE 'needs_optimization'
    END as overall_performance_rating,

    -- Consistency analysis
    MODE() WITHIN GROUP (ORDER BY consistency_guarantee) as primary_consistency_model,

    -- Scalability assessment
    CASE 
      WHEN query_pattern = 'embedded_single_document' THEN 'Limited by 16MB document size'
      WHEN query_pattern = 'referenced_multi_document' THEN 'Unlimited horizontal scaling'
      ELSE 'Hybrid scaling characteristics'
    END as scalability_profile

  FROM (
    -- Select only the shared comparison columns so the UNION is well-formed
    SELECT query_pattern, performance_classification, documents_accessed,
           join_operations_required, consistency_guarantee
    FROM embedded_query_patterns
    UNION ALL
    SELECT query_pattern, performance_classification, documents_accessed,
           join_operations_required, consistency_guarantee
    FROM referenced_query_patterns
  ) combined_patterns
  GROUP BY query_pattern, performance_classification
),

-- Document modeling recommendations based on query patterns
modeling_recommendations AS (
  SELECT 
    mpa.query_pattern,
    mpa.overall_performance_rating,
    mpa.scalability_profile,
    mpa.primary_consistency_model,

    -- Use case recommendations
    CASE 
      WHEN mpa.query_pattern = 'embedded_single_document' THEN
        JSON_BUILD_ARRAY(
          'Optimal for frequently co-accessed related data',
          'Best for read-heavy workloads with complete document queries',
          'Ideal for maintaining ACID guarantees across related entities',
          'Suitable for moderate data growth with stable relationships'
        )
      WHEN mpa.query_pattern = 'referenced_multi_document' THEN
        JSON_BUILD_ARRAY(
          'Best for large datasets with independent entity management',
          'Optimal for write-heavy workloads with frequent partial updates',
          'Ideal for applications requiring flexible schema evolution',
          'Suitable for unlimited horizontal scaling requirements'
        )
      ELSE
        JSON_BUILD_ARRAY(
          'Consider hybrid approach for mixed access patterns',
          'Evaluate specific query requirements for optimization',
          'Balance performance and scalability based on use case'
        )
    END as use_case_recommendations,

    -- Performance optimization strategies
    CASE mpa.overall_performance_rating
      WHEN 'excellent' THEN 'Continue current approach with monitoring'
      WHEN 'good' THEN 'Minor optimizations possible through indexing'
      WHEN 'acceptable' THEN 'Consider query pattern optimization or hybrid approach'
      ELSE 'Significant architectural changes recommended'
    END as optimization_strategy,

    -- Specific implementation guidance
    JSON_BUILD_OBJECT(
      'indexing_strategy', 
        CASE 
          WHEN mpa.query_pattern = 'embedded_single_document' THEN 'Compound indexes on embedded fields'
          ELSE 'Reference field optimization with lookup performance'
        END,
      'consistency_approach',
        CASE mpa.primary_consistency_model
          WHEN 'immediate' THEN 'Single document transactions available'
          ELSE 'Application-level consistency management required'
        END,
      'scaling_considerations',
        CASE 
          WHEN mpa.scalability_profile LIKE '%16MB%' THEN 'Monitor document sizes and consider archival strategies'
          ELSE 'Plan for horizontal scaling and sharding strategies'
        END
    ) as implementation_guidance

  FROM modeling_performance_analysis mpa
)

-- Comprehensive document modeling strategy dashboard
SELECT 
  mr.query_pattern,
  mr.overall_performance_rating,
  mr.primary_consistency_model,
  mr.optimization_strategy,

  -- Performance characteristics
  mpa.avg_documents_per_query as avg_docs_per_query,
  mpa.avg_joins_per_query as avg_joins_required,
  mpa.total_queries_analyzed,

  -- Architectural guidance
  mr.use_case_recommendations,
  mr.implementation_guidance,

  -- Decision matrix
  CASE 
    WHEN mr.query_pattern = 'embedded_single_document' AND mr.overall_performance_rating = 'excellent' THEN
      'RECOMMENDED: Use embedded documents for this use case'
    WHEN mr.query_pattern = 'referenced_multi_document' AND mr.scalability_profile LIKE '%Unlimited%' THEN
      'RECOMMENDED: Use referenced pattern for scalability requirements'
    ELSE
      'EVALUATE: Consider hybrid approach or further analysis'
  END as architectural_recommendation,

  -- Implementation priorities
  JSON_BUILD_OBJECT(
    'immediate_actions', 
      CASE mr.overall_performance_rating
        WHEN 'needs_optimization' THEN JSON_BUILD_ARRAY('Review query patterns', 'Optimize indexing', 'Consider architectural changes')
        WHEN 'acceptable' THEN JSON_BUILD_ARRAY('Monitor performance trends', 'Optimize critical queries')
        ELSE JSON_BUILD_ARRAY('Continue monitoring', 'Plan for growth')
      END,
    'monitoring_focus',
      CASE 
        WHEN mr.query_pattern = 'embedded_single_document' THEN 'Document size growth and query performance'
        ELSE 'Join performance and data consistency'
      END,
    'success_metrics',
      JSON_BUILD_OBJECT(
        'performance_target', CASE mr.overall_performance_rating WHEN 'excellent' THEN 'maintain' ELSE 'improve' END,
        'consistency_requirement', mr.primary_consistency_model,
        'scalability_readiness', 
          CASE WHEN mr.scalability_profile LIKE '%Unlimited%' THEN 'high' ELSE 'moderate' END
      )
  ) as implementation_roadmap

FROM modeling_recommendations mr
JOIN modeling_performance_analysis mpa ON mr.query_pattern = mpa.query_pattern
ORDER BY 
  CASE mr.overall_performance_rating
    WHEN 'excellent' THEN 1
    WHEN 'good' THEN 2
    WHEN 'acceptable' THEN 3
    ELSE 4
  END,
  mpa.avg_documents_per_query ASC;

-- QueryLeaf provides comprehensive MongoDB document modeling capabilities:
-- 1. Embedded document patterns for optimal query performance and data locality
-- 2. Referenced patterns for normalized structures and independent entity scaling
-- 3. Hybrid modeling strategies combining embedding and referencing for complex requirements
-- 4. Performance analysis and optimization recommendations based on query patterns
-- 5. SQL-familiar syntax for document relationship management and pattern selection
-- 6. Comprehensive modeling analytics with performance profiling and scalability assessment
-- 7. Automated recommendations for optimal modeling patterns based on access requirements
-- 8. Enterprise-grade consistency and performance monitoring for production deployments
-- 9. Flexible schema evolution support with minimal application impact
-- 10. Advanced query optimization techniques tailored to document modeling patterns
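
To ground the SQL comparison above in native MongoDB operations, the following sketch shows the same two access patterns with the Node.js driver - a single findOne() for the embedded model versus an aggregation with $lookup stages for the referenced model. Collection and field names mirror the illustrative examples above rather than a prescribed schema.

// Minimal sketch: embedded vs. referenced access with the Node.js driver
const { MongoClient } = require('mongodb');

async function compareAccessPatterns() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const db = client.db('modeling_demo');

  // Embedded pattern: addresses, preferences, and the activity summary live
  // inside the user document, so one query returns the complete profile
  const embeddedProfile = await db.collection('user_profiles_embedded').findOne(
    { status: 'active', 'profileMetadata.theme': 'dark' },
    { projection: { email: 1, addresses: 1, preferences: 1, activitySummary: 1 } }
  );

  // Referenced pattern: related data lives in separate collections, so
  // reconstructing the profile requires $lookup joins
  const referencedProfiles = await db.collection('users_referenced').aggregate([
    { $match: { status: 'active', 'profileMetadata.theme': 'dark' } },
    { $lookup: { from: 'user_addresses_referenced', localField: '_id',
                 foreignField: 'userId', as: 'addresses' } },
    { $lookup: { from: 'user_activities_referenced', localField: '_id',
                 foreignField: 'userId', as: 'activities' } },
    { $limit: 100 }
  ]).toArray();

  await client.close();
  return { embeddedProfile, referencedProfiles };
}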

Best Practices for MongoDB Document Modeling Implementation

Strategic Modeling Decisions

Essential practices for making optimal embedded vs referenced modeling decisions:

  1. Query Pattern Analysis: Design document structure based on actual application query patterns and data access requirements
  2. Data Growth Assessment: Evaluate data growth patterns to prevent document size issues with embedded patterns
  3. Update Frequency Analysis: Consider update patterns when deciding between atomic embedded updates and independent referenced updates
  4. Consistency Requirements: Choose modeling patterns based on consistency requirements and transaction scope needs
  5. Performance Baseline Establishment: Establish performance baselines for different modeling approaches with realistic data volumes (see the sketch after this list)
  6. Scalability Planning: Design modeling strategies that accommodate expected growth in data volume and query complexity
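
Practice 5 can be made concrete with explain('executionStats'). The sketch below is a minimal baseline capture, assuming a connected db handle and the illustrative collections used earlier in this article; record the numbers for each candidate model before committing to a structure.

// Minimal baseline sketch using explain('executionStats')
async function captureModelingBaselines(db) {
  // Embedded model: single-collection query plan and timing
  const embeddedPlan = await db.collection('user_profiles_embedded')
    .find({ status: 'active' })
    .limit(100)
    .explain('executionStats');

  // Referenced model: aggregation with a $lookup join
  const referencedPlan = await db.collection('users_referenced').aggregate([
    { $match: { status: 'active' } },
    { $lookup: { from: 'user_addresses_referenced', localField: '_id',
                 foreignField: 'userId', as: 'addresses' } },
    { $limit: 100 }
  ]).explain('executionStats');

  return {
    embedded: {
      totalDocsExamined: embeddedPlan.executionStats.totalDocsExamined,
      executionTimeMillis: embeddedPlan.executionStats.executionTimeMillis
    },
    // Aggregation explain output varies by server version, so keep the raw plan
    referenced: referencedPlan
  };
}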

Production Optimization and Management

Optimize document modeling for enterprise-scale applications:

  1. Index Strategy Optimization: Design indexes that support both embedded field queries and reference lookups efficiently
  2. Document Size Monitoring: Implement monitoring for document sizes to prevent 16MB limit issues with embedded patterns (see the sketch after this list)
  3. Query Performance Analysis: Continuously analyze query performance across different modeling patterns for optimization opportunities
  4. Migration Planning: Plan for potential modeling pattern changes as application requirements evolve
  5. Consistency Management: Implement appropriate consistency management strategies for referenced patterns
  6. Monitoring and Alerting: Establish comprehensive monitoring for performance, consistency, and scalability metrics
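
Practice 2 maps directly onto the $bsonSize aggregation operator (available in MongoDB 4.4+). The sketch below is a minimal monitoring query that flags documents approaching the 16MB limit, assuming a connected db handle; the warning threshold is an illustrative choice.

// Minimal document size monitoring sketch using $bsonSize (MongoDB 4.4+)
async function findOversizedDocuments(db, collectionName, warnBytes = 12 * 1024 * 1024) {
  return db.collection(collectionName).aggregate([
    { $project: { docSizeBytes: { $bsonSize: '$$ROOT' } } },
    { $match: { docSizeBytes: { $gte: warnBytes } } },  // warn at ~12MB, well before the 16MB hard limit
    { $sort: { docSizeBytes: -1 } },
    { $limit: 20 }
  ]).toArray();
}

// Example: alert on embedded profiles that are growing toward the limit
// const oversized = await findOversizedDocuments(db, 'user_profiles_embedded');
// if (oversized.length > 0) console.warn('Documents approaching 16MB:', oversized);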

Conclusion

MongoDB's flexible document modeling provides powerful options for optimizing data relationships through embedded documents, references, or hybrid approaches. The choice between embedding and referencing depends on specific query patterns, consistency requirements, scalability needs, and performance objectives. Understanding these tradeoffs enables architects to design optimal data models that balance performance, scalability, and maintainability.

Key MongoDB Document Modeling benefits include:

  • Performance Optimization: Choose modeling patterns that optimize for specific query patterns and data access requirements
  • Flexible Relationships: Model relationships using the approach that best fits application needs rather than rigid normalization rules
  • ACID Guarantees: Leverage single-document ACID properties for embedded patterns or manage consistency for referenced patterns
  • Scalability Options: Scale using approaches appropriate to data growth patterns and access requirements
  • Schema Evolution: Evolve document structures as requirements change without expensive migration procedures
  • SQL Accessibility: Manage document relationships using familiar SQL-style syntax and optimization techniques

Whether you're building user management systems, content platforms, e-commerce applications, or analytics systems, MongoDB's document modeling flexibility with QueryLeaf's familiar SQL interface provides the foundation for scalable, performant, and maintainable data architectures.

QueryLeaf Integration: QueryLeaf automatically optimizes MongoDB document relationships while providing SQL-familiar syntax for embedded and referenced pattern management. Advanced modeling strategies, performance analysis, and optimization recommendations are seamlessly accessible through familiar SQL constructs, making sophisticated document relationship management both powerful and approachable for SQL-oriented development teams.

The combination of MongoDB's flexible document modeling with SQL-style relationship management makes it an ideal platform for applications requiring both optimal query performance and familiar operational patterns, ensuring your data architecture can adapt to changing requirements while maintaining performance excellence and development productivity.

MongoDB TTL Collections and Automatic Data Lifecycle Management: Intelligent Data Expiration and Cleanup for Scalable Applications

Modern applications generate massive amounts of transient data that requires intelligent lifecycle management to prevent storage bloat, maintain system performance, and comply with data retention policies. Traditional database systems require complex scheduled procedures, manual cleanup scripts, or application-level logic to manage data expiration, leading to inefficient resource utilization, inconsistent cleanup processes, and maintenance overhead that scales poorly with data volume.

MongoDB's TTL (Time-To-Live) collections provide native automatic document expiration capabilities that enable applications to define sophisticated data lifecycle policies at the database level. Unlike traditional approaches that require external orchestration or application logic, MongoDB TTL indexes automatically remove expired documents based on configurable time-based rules, ensuring consistent data management without performance impact or maintenance complexity.
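
Before comparing this with traditional cleanup approaches, it helps to see how small the core mechanism is. The sketch below creates both TTL styles against hypothetical collections; the database and collection names are placeholders rather than part of any framework developed later in this article.

// Minimal TTL sketch: two ways to expire documents automatically
const { MongoClient } = require('mongodb');

async function createBasicTTLIndexes() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const db = client.db('ttl_basics');

  // Style 1: fixed window - every document expires a set number of seconds
  // after the value stored in createdAt
  await db.collection('events').createIndex(
    { createdAt: 1 },
    { expireAfterSeconds: 3600 }  // removed roughly one hour after createdAt
  );

  // Style 2: per-document expiration - expireAfterSeconds: 0 means
  // "expire at the date stored in the indexed field"
  await db.collection('sessions').createIndex(
    { expiresAt: 1 },
    { expireAfterSeconds: 0 }
  );

  // Documents only need the right date fields; the background TTL monitor
  // removes them once they pass their expiration time
  await db.collection('sessions').insertOne({
    sessionId: 'demo',
    expiresAt: new Date(Date.now() + 24 * 60 * 60 * 1000)  // expires in 24 hours
  });

  await client.close();
}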

Traditional Data Cleanup Challenges

Conventional approaches to data lifecycle management face significant operational and performance limitations:

-- Traditional PostgreSQL data cleanup approach (complex and resource-intensive)

-- Example: Managing session data with manual cleanup procedures
CREATE TABLE user_sessions (
  session_id VARCHAR(128) PRIMARY KEY,
  user_id BIGINT NOT NULL,
  session_data JSONB,
  ip_address INET,
  user_agent TEXT,
  created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
  last_accessed_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
  expires_at TIMESTAMP WITH TIME ZONE NOT NULL
);

CREATE INDEX idx_sessions_expires_at ON user_sessions(expires_at);
CREATE INDEX idx_sessions_last_accessed ON user_sessions(last_accessed_at);

-- Manual cleanup procedure (requires scheduled execution)
CREATE OR REPLACE FUNCTION cleanup_expired_sessions()
RETURNS INTEGER AS $$
DECLARE
  deleted_count INTEGER;
BEGIN
  -- Delete expired sessions in batches to avoid long locks
  WITH expired_sessions AS (
    SELECT session_id 
    FROM user_sessions 
    WHERE expires_at < NOW()
    LIMIT 10000  -- Batch processing to prevent lock contention
  )
  DELETE FROM user_sessions 
  WHERE session_id IN (SELECT session_id FROM expired_sessions);

  GET DIAGNOSTICS deleted_count = ROW_COUNT;

  RAISE NOTICE 'Deleted % expired sessions', deleted_count;
  RETURN deleted_count;
END;
$$ LANGUAGE plpgsql;

-- Schedule cleanup job (requires external cron or scheduler)
-- This must be configured outside the database:
-- 0 */6 * * * psql -d myapp -c "SELECT cleanup_expired_sessions();"

-- Problems with manual cleanup approach:
-- 1. Requires external scheduling and monitoring systems
-- 2. Batch processing creates inconsistent cleanup timing
-- 3. Resource-intensive operations during cleanup windows
-- 4. Risk of cleanup failure without proper monitoring
-- 5. Complex coordination across multiple tables and relationships
-- 6. Difficult to optimize cleanup performance vs. application performance

-- Example: Log data cleanup with cascading complexity
CREATE TABLE application_logs (
  log_id BIGSERIAL PRIMARY KEY,
  application_id INTEGER NOT NULL,
  severity_level VARCHAR(10) NOT NULL,
  message TEXT NOT NULL,
  metadata JSONB,
  created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),

  -- Manual retention policy management
  retention_category VARCHAR(20) DEFAULT 'standard', -- 'critical', 'standard', 'debug'
  retention_expires_at TIMESTAMP WITH TIME ZONE
);

-- Complex trigger for setting retention dates
CREATE OR REPLACE FUNCTION set_log_retention_date()
RETURNS TRIGGER AS $$
BEGIN
  NEW.retention_expires_at := CASE NEW.retention_category
    WHEN 'critical' THEN NEW.created_at + INTERVAL '2 years'
    WHEN 'standard' THEN NEW.created_at + INTERVAL '6 months'
    WHEN 'debug' THEN NEW.created_at + INTERVAL '7 days'
    ELSE NEW.created_at + INTERVAL '30 days'
  END;

  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trigger_set_log_retention 
  BEFORE INSERT OR UPDATE ON application_logs
  FOR EACH ROW 
  EXECUTE FUNCTION set_log_retention_date();

-- Complex cleanup with retention policy handling
CREATE OR REPLACE FUNCTION cleanup_application_logs()
RETURNS TABLE(retention_category VARCHAR, deleted_count BIGINT) AS $$
DECLARE
  category VARCHAR;
  del_count BIGINT;
BEGIN
  -- Process each retention category separately
  FOR category IN SELECT DISTINCT l.retention_category FROM application_logs l LOOP
    WITH expired_logs AS (
      SELECT log_id
      FROM application_logs
      WHERE retention_category = category 
        AND retention_expires_at < NOW()
      LIMIT 50000  -- Large batch size for logs
    )
    DELETE FROM application_logs 
    WHERE log_id IN (SELECT log_id FROM expired_logs);

    GET DIAGNOSTICS del_count = ROW_COUNT;

    RETURN QUERY SELECT category, del_count;
  END LOOP;
END;
$$ LANGUAGE plpgsql;

-- Traditional approach limitations:
-- 1. Complex stored procedure logic for different retention policies
-- 2. Risk of cleanup procedures failing and accumulating stale data
-- 3. Performance impact during cleanup operations
-- 4. Difficult to test and validate cleanup logic
-- 5. Manual coordination required for related table cleanup
-- 6. No atomic cleanup guarantees across related documents
-- 7. Resource contention between cleanup and application operations

-- Example: MySQL cleanup with limited capabilities
CREATE TABLE mysql_cache_entries (
  cache_key VARCHAR(255) PRIMARY KEY,
  cache_value LONGTEXT,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  expires_at TIMESTAMP NOT NULL,

  INDEX idx_expires_at (expires_at)
);

-- MySQL cleanup event (requires event scheduler enabled)
DELIMITER ;;
CREATE EVENT cleanup_cache_entries
ON SCHEDULE EVERY 1 HOUR
DO
BEGIN
  -- Simple cleanup with limited error handling
  DELETE FROM mysql_cache_entries 
  WHERE expires_at < NOW();
END;;
DELIMITER ;

-- MySQL limitations:
-- 1. Basic event scheduler with limited scheduling options
-- 2. No sophisticated batch processing or resource management
-- 3. Limited error handling and monitoring capabilities
-- 4. Event scheduler can be disabled accidentally
-- 5. No built-in support for complex retention policies
-- 6. Cleanup operations can block other database operations
-- 7. No automatic optimization for cleanup performance

-- Oracle approach with job scheduling
CREATE TABLE oracle_temp_data (
  temp_id NUMBER GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
  data_content CLOB,
  created_date TIMESTAMP DEFAULT SYSTIMESTAMP,
  expiry_date TIMESTAMP NOT NULL
);

-- Oracle job for cleanup (complex setup required)
BEGIN
  DBMS_SCHEDULER.create_job (
    job_name        => 'CLEANUP_TEMP_DATA_JOB',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN
                          DELETE FROM oracle_temp_data 
                          WHERE expiry_date < SYSTIMESTAMP;
                          COMMIT;
                        END;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=HOURLY; INTERVAL=2',
    enabled         => TRUE
  );
END;

-- Oracle complexity issues:
-- 1. Requires DBMS_SCHEDULER privileges and configuration
-- 2. Job management complexity and monitoring requirements  
-- 3. Manual transaction management in cleanup procedures
-- 4. Complex scheduling syntax and limited flexibility
-- 5. Jobs can fail silently without proper monitoring
-- 6. Resource management and performance tuning required
-- 7. Expensive licensing for advanced job scheduling features

MongoDB TTL collections provide effortless automatic data lifecycle management:

// MongoDB TTL Collections - native automatic document expiration and lifecycle management

const { MongoClient, ObjectId } = require('mongodb');

// Comprehensive MongoDB TTL and Data Lifecycle Management System
class MongoDBTTLManager {
  constructor(db) {
    this.db = db;
    this.ttlCollections = new Map();
    this.lifecyclePolicies = new Map();
    this.expirationStats = {
      documentsExpired: 0,
      storageReclaimed: 0,
      lastCleanupRun: null
    };
    this.ttlIndexSpecs = new Map();
  }

  // Create collection with automatic TTL expiration
  async createTTLCollection(collectionName, ttlConfig) {
    console.log(`Creating TTL collection: ${collectionName}`);

    const {
      ttlField = 'expiresAt',
      expireAfterSeconds = null,
      indexOnCreatedAt = false,
      additionalIndexes = [],
      validationSchema = null
    } = ttlConfig;

    try {
      // Create collection with optional validation
      const collectionOptions = {};
      if (validationSchema) {
        collectionOptions.validator = validationSchema;
        collectionOptions.validationLevel = 'strict';
      }

      await this.db.createCollection(collectionName, collectionOptions);
      const collection = this.db.collection(collectionName);

      // Create TTL index for automatic expiration
      if (expireAfterSeconds !== null) {
        // TTL index with expireAfterSeconds for automatic cleanup
        await collection.createIndex(
          { [ttlField]: 1 },
          { 
            expireAfterSeconds: expireAfterSeconds,
            background: true,
            name: `ttl_${ttlField}_${expireAfterSeconds}`
          }
        );

        console.log(`Created TTL index on ${ttlField} with expiration: ${expireAfterSeconds} seconds`);
      } else {
        // TTL index on Date field for document-specific expiration
        await collection.createIndex(
          { [ttlField]: 1 },
          {
            expireAfterSeconds: 0, // Documents expire based on date value
            background: true,
            name: `ttl_${ttlField}_document_specific`
          }
        );

        console.log(`Created document-specific TTL index on ${ttlField}`);
      }

      // Optional index on created timestamp for queries
      if (indexOnCreatedAt) {
        await collection.createIndex(
          { createdAt: 1 },
          { background: true, name: 'idx_created_at' }
        );
      }

      // Create additional indexes as specified
      for (const indexSpec of additionalIndexes) {
        await collection.createIndex(
          indexSpec.fields,
          { background: true, ...indexSpec.options }
        );
      }

      // Store TTL configuration for reference
      this.ttlCollections.set(collectionName, {
        ttlField: ttlField,
        expireAfterSeconds: expireAfterSeconds,
        collection: collection,
        createdAt: new Date(),
        config: ttlConfig
      });

      this.ttlIndexSpecs.set(collectionName, {
        field: ttlField,
        expireAfterSeconds: expireAfterSeconds,
        indexName: expireAfterSeconds !== null ? 
          `ttl_${ttlField}_${expireAfterSeconds}` : 
          `ttl_${ttlField}_document_specific`
      });

      console.log(`TTL collection ${collectionName} created successfully`);
      return collection;

    } catch (error) {
      console.error(`Failed to create TTL collection ${collectionName}:`, error);
      throw error;
    }
  }

  // Session management with automatic cleanup
  async createSessionsCollection() {
    console.log('Creating user sessions collection with TTL');

    const sessionValidation = {
      $jsonSchema: {
        bsonType: 'object',
        required: ['sessionId', 'userId', 'createdAt', 'expiresAt'],
        properties: {
          sessionId: {
            bsonType: 'string',
            minLength: 32,
            maxLength: 128,
            description: 'Unique session identifier'
          },
          userId: {
            bsonType: 'objectId',
            description: 'Reference to user document'
          },
          sessionData: {
            bsonType: ['object', 'null'],
            description: 'Session-specific data'
          },
          ipAddress: {
            bsonType: ['string', 'null'],
            pattern: '^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$|^(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$',
            description: 'Client IP address'
          },
          userAgent: {
            bsonType: ['string', 'null'],
            maxLength: 500,
            description: 'Client user agent'
          },
          createdAt: {
            bsonType: 'date',
            description: 'Session creation timestamp'
          },
          lastAccessedAt: {
            bsonType: 'date', 
            description: 'Last session access timestamp'
          },
          expiresAt: {
            bsonType: 'date',
            description: 'Session expiration timestamp for TTL'
          },
          isActive: {
            bsonType: 'bool',
            description: 'Session active status'
          }
        }
      }
    };

    await this.createTTLCollection('userSessions', {
      ttlField: 'expiresAt',
      expireAfterSeconds: 0, // Document-specific expiration
      indexOnCreatedAt: true,
      validationSchema: sessionValidation,
      additionalIndexes: [
        {
          fields: { sessionId: 1 },
          options: { unique: true, name: 'idx_session_id' }
        },
        {
          fields: { userId: 1, isActive: 1 },
          options: { name: 'idx_user_active_sessions' }
        },
        {
          fields: { lastAccessedAt: -1 },
          options: { name: 'idx_last_accessed' }
        }
      ]
    });

    return this.db.collection('userSessions');
  }

  // Create optimized logging collection with retention policies
  async createLoggingCollection() {
    console.log('Creating application logs collection with TTL');

    const logValidation = {
      $jsonSchema: {
        bsonType: 'object',
        required: ['level', 'message', 'timestamp', 'source'],
        properties: {
          level: {
            enum: ['debug', 'info', 'warn', 'error', 'fatal'],
            description: 'Log severity level'
          },
          message: {
            bsonType: 'string',
            maxLength: 10000,
            description: 'Log message content'
          },
          source: {
            bsonType: 'string',
            maxLength: 100,
            description: 'Log source component'
          },
          timestamp: {
            bsonType: 'date',
            description: 'Log entry timestamp'
          },
          metadata: {
            bsonType: ['object', 'null'],
            description: 'Additional log metadata'
          },
          userId: {
            bsonType: ['objectId', 'null'],
            description: 'Associated user ID if applicable'
          },
          requestId: {
            bsonType: ['string', 'null'],
            description: 'Request correlation ID'
          },
          retentionCategory: {
            enum: ['debug', 'standard', 'audit', 'critical'],
            description: 'Data retention classification'
          },
          tags: {
            bsonType: ['array', 'null'],
            items: { bsonType: 'string' },
            description: 'Searchable tags'
          }
        }
      }
    };

    // Create multiple collections for different retention periods
    const retentionPolicies = [
      { category: 'debug', days: 7 },
      { category: 'standard', days: 90 },
      { category: 'audit', days: 365 },
      { category: 'critical', days: 2555 } // 7 years
    ];

    for (const policy of retentionPolicies) {
      const collectionName = `applicationLogs_${policy.category}`;

      await this.createTTLCollection(collectionName, {
        ttlField: 'timestamp',
        expireAfterSeconds: policy.days * 24 * 60 * 60, // Convert days to seconds
        indexOnCreatedAt: false,
        validationSchema: logValidation,
        additionalIndexes: [
          {
            fields: { level: 1, timestamp: -1 },
            options: { name: 'idx_level_timestamp' }
          },
          {
            fields: { source: 1, timestamp: -1 },
            options: { name: 'idx_source_timestamp' }
          },
          {
            fields: { userId: 1, timestamp: -1 },
            options: { name: 'idx_user_timestamp', sparse: true }
          },
          {
            fields: { requestId: 1 },
            options: { name: 'idx_request_id', sparse: true }
          },
          {
            fields: { tags: 1 },
            options: { name: 'idx_tags', sparse: true }
          }
        ]
      });
    }

    this.lifecyclePolicies.set('applicationLogs', retentionPolicies);
    return retentionPolicies.map(p => ({ 
      category: p.category, 
      collection: this.db.collection(`applicationLogs_${p.category}`) 
    }));
  }

  // Cache collection with flexible TTL
  async createCacheCollection() {
    console.log('Creating cache collection with TTL');

    const cacheValidation = {
      $jsonSchema: {
        bsonType: 'object',
        required: ['key', 'value', 'createdAt'],
        properties: {
          key: {
            bsonType: 'string',
            maxLength: 500,
            description: 'Cache key identifier'
          },
          value: {
            description: 'Cached data value (any type)'
          },
          createdAt: {
            bsonType: 'date',
            description: 'Cache entry creation time'
          },
          expiresAt: {
            bsonType: ['date', 'null'],
            description: 'Optional specific expiration time'
          },
          namespace: {
            bsonType: ['string', 'null'],
            maxLength: 100,
            description: 'Cache namespace for organization'
          },
          tags: {
            bsonType: ['array', 'null'],
            items: { bsonType: 'string' },
            description: 'Cache entry tags'
          },
          size: {
            bsonType: ['int', 'null'],
            minimum: 0,
            description: 'Cached data size in bytes'
          },
          hitCount: {
            bsonType: 'int',
            minimum: 0,
            description: 'Number of times cache entry was accessed'
          },
          lastAccessedAt: {
            bsonType: 'date',
            description: 'Last access timestamp'
          }
        }
      }
    };

    await this.createTTLCollection('cache', {
      ttlField: 'createdAt',
      expireAfterSeconds: 3600, // Default 1 hour expiration
      indexOnCreatedAt: false,
      validationSchema: cacheValidation,
      additionalIndexes: [
        {
          fields: { key: 1 },
          options: { unique: true, name: 'idx_cache_key' }
        },
        {
          fields: { namespace: 1, createdAt: -1 },
          options: { name: 'idx_namespace_created', sparse: true }
        },
        {
          fields: { tags: 1 },
          options: { name: 'idx_cache_tags', sparse: true }
        },
        {
          fields: { lastAccessedAt: -1 },
          options: { name: 'idx_last_accessed' }
        },
        {
          fields: { expiresAt: 1 },
          options: { 
            name: 'ttl_expires_at_custom',
            expireAfterSeconds: 0, // Custom expiration times
            sparse: true 
          }
        }
      ]
    });

    return this.db.collection('cache');
  }

  // Temporary data collection for processing workflows
  async createTempDataCollection() {
    console.log('Creating temporary data collection with short TTL');

    const tempDataValidation = {
      $jsonSchema: {
        bsonType: 'object',
        required: ['workflowId', 'data', 'createdAt', 'status'],
        properties: {
          workflowId: {
            bsonType: 'string',
            maxLength: 100,
            description: 'Workflow identifier'
          },
          stepId: {
            bsonType: ['string', 'null'],
            maxLength: 100,
            description: 'Workflow step identifier'
          },
          data: {
            description: 'Temporary processing data'
          },
          status: {
            enum: ['pending', 'processing', 'completed', 'failed'],
            description: 'Processing status'
          },
          createdAt: {
            bsonType: 'date',
            description: 'Creation timestamp'
          },
          processedAt: {
            bsonType: ['date', 'null'],
            description: 'Processing completion timestamp'
          },
          priority: {
            bsonType: 'int',
            minimum: 1,
            maximum: 10,
            description: 'Processing priority level'
          },
          retryCount: {
            bsonType: 'int',
            minimum: 0,
            description: 'Number of retry attempts'
          },
          errorMessage: {
            bsonType: ['string', 'null'],
            maxLength: 1000,
            description: 'Error message if processing failed'
          }
        }
      }
    };

    await this.createTTLCollection('tempProcessingData', {
      ttlField: 'createdAt',
      expireAfterSeconds: 86400, // 24 hours
      indexOnCreatedAt: false,
      validationSchema: tempDataValidation,
      additionalIndexes: [
        {
          fields: { workflowId: 1, status: 1 },
          options: { name: 'idx_workflow_status' }
        },
        {
          fields: { status: 1, priority: -1, createdAt: 1 },
          options: { name: 'idx_processing_queue' }
        },
        {
          fields: { stepId: 1 },
          options: { name: 'idx_step_id', sparse: true }
        }
      ]
    });

    return this.db.collection('tempProcessingData');
  }

  // Insert documents with intelligent expiration management
  async insertWithTTL(collectionName, documents, ttlOptions = {}) {
    console.log(`Inserting ${Array.isArray(documents) ? documents.length : 1} documents with TTL into ${collectionName}`);

    const collection = this.db.collection(collectionName);
    const ttlConfig = this.ttlCollections.get(collectionName);

    if (!ttlConfig) {
      throw new Error(`Collection ${collectionName} is not configured for TTL`);
    }

    const documentsToInsert = Array.isArray(documents) ? documents : [documents];
    const currentTime = new Date();

    // Process each document to set appropriate expiration
    const processedDocuments = documentsToInsert.map(doc => {
      const processedDoc = { ...doc };

      // Set creation timestamp if not present
      if (!processedDoc.createdAt) {
        processedDoc.createdAt = currentTime;
      }

      // Handle TTL field based on collection configuration
      if (ttlConfig.expireAfterSeconds === 0) {
        // Document-specific expiration - use provided or calculate
        if (!processedDoc[ttlConfig.ttlField]) {
          const customTTL = ttlOptions.customExpireAfterSeconds || 
                           ttlOptions.expireAfterSeconds ||
                           3600; // Default 1 hour

          processedDoc[ttlConfig.ttlField] = new Date(
            currentTime.getTime() + (customTTL * 1000)
          );
        }
      } else {
        // Fixed expiration - set TTL field to current time for consistent expiration
        if (!processedDoc[ttlConfig.ttlField]) {
          processedDoc[ttlConfig.ttlField] = currentTime;
        }
      }

      // Add metadata for tracking
      processedDoc._ttl_configured = true;
      processedDoc._ttl_field = ttlConfig.ttlField;
      processedDoc._ttl_policy = ttlConfig.expireAfterSeconds === 0 ? 'document_specific' : 'collection_fixed';

      return processedDoc;
    });

    try {
      const result = Array.isArray(documents) ? 
        await collection.insertMany(processedDocuments) :
        await collection.insertOne(processedDocuments[0]);

      console.log(`Successfully inserted documents with TTL configuration`);
      return result;

    } catch (error) {
      console.error(`Failed to insert documents with TTL:`, error);
      throw error;
    }
  }

  // Update TTL configuration for existing collections
  async modifyTTLExpiration(collectionName, newExpireAfterSeconds) {
    console.log(`Modifying TTL expiration for ${collectionName} to ${newExpireAfterSeconds} seconds`);

    const ttlConfig = this.ttlCollections.get(collectionName);
    if (!ttlConfig) {
      throw new Error(`Collection ${collectionName} is not configured for TTL`);
    }

    const indexSpec = this.ttlIndexSpecs.get(collectionName);

    try {
      // Use collMod command to change TTL expiration
      await this.db.runCommand({
        collMod: collectionName,
        index: {
          keyPattern: { [ttlConfig.ttlField]: 1 },
          expireAfterSeconds: newExpireAfterSeconds
        }
      });

      // Update our tracking
      ttlConfig.expireAfterSeconds = newExpireAfterSeconds;
      indexSpec.expireAfterSeconds = newExpireAfterSeconds;

      console.log(`TTL expiration updated successfully for ${collectionName}`);
      return { success: true, newExpiration: newExpireAfterSeconds };

    } catch (error) {
      console.error(`Failed to modify TTL expiration:`, error);
      throw error;
    }
  }

  // Monitor TTL collection statistics and performance
  async getTTLStatistics() {
    console.log('Gathering TTL collection statistics...');

    const statistics = {
      collections: new Map(),
      summary: {
        totalCollections: this.ttlCollections.size,
        totalDocuments: 0,
        estimatedStorageSize: 0,
        oldestDocument: null,
        newestDocument: null
      }
    };

    for (const [collectionName, config] of this.ttlCollections) {
      try {
        const collection = config.collection;
        const stats = await collection.stats();

        // Get sample documents for age analysis
        const oldestDoc = await collection.findOne(
          {},
          { sort: { [config.ttlField]: 1 } }
        );

        const newestDoc = await collection.findOne(
          {},
          { sort: { [config.ttlField]: -1 } }
        );

        // Calculate expiration statistics
        const now = new Date();
        let documentsExpiringSoon = 0;

        if (config.expireAfterSeconds === 0) {
          // Document-specific expiration
          documentsExpiringSoon = await collection.countDocuments({
            [config.ttlField]: {
              $lte: new Date(now.getTime() + (3600 * 1000)) // Next hour
            }
          });
        } else {
          // Fixed expiration
          const cutoffTime = new Date(now.getTime() - (config.expireAfterSeconds - 3600) * 1000);
          documentsExpiringSoon = await collection.countDocuments({
            [config.ttlField]: { $lte: cutoffTime }
          });
        }

        const collectionStats = {
          name: collectionName,
          documentCount: stats.count,
          storageSize: stats.storageSize,
          indexSize: stats.totalIndexSize,
          averageDocumentSize: stats.avgObjSize,
          ttlField: config.ttlField,
          expireAfterSeconds: config.expireAfterSeconds,
          expirationPolicy: config.expireAfterSeconds === 0 ? 'document_specific' : 'collection_fixed',
          oldestDocument: oldestDoc ? oldestDoc[config.ttlField] : null,
          newestDocument: newestDoc ? newestDoc[config.ttlField] : null,
          documentsExpiringSoon: documentsExpiringSoon,
          createdAt: config.createdAt
        };

        statistics.collections.set(collectionName, collectionStats);
        statistics.summary.totalDocuments += stats.count;
        statistics.summary.estimatedStorageSize += stats.storageSize;

        if (!statistics.summary.oldestDocument || 
            (oldestDoc && oldestDoc[config.ttlField] < statistics.summary.oldestDocument)) {
          statistics.summary.oldestDocument = oldestDoc ? oldestDoc[config.ttlField] : null;
        }

        if (!statistics.summary.newestDocument || 
            (newestDoc && newestDoc[config.ttlField] > statistics.summary.newestDocument)) {
          statistics.summary.newestDocument = newestDoc ? newestDoc[config.ttlField] : null;
        }

      } catch (error) {
        console.warn(`Could not gather statistics for ${collectionName}:`, error.message);
      }
    }

    return statistics;
  }

  // Advanced TTL management and monitoring
  async performTTLHealthCheck() {
    console.log('Performing comprehensive TTL health check...');

    const healthCheck = {
      status: 'healthy',
      issues: [],
      recommendations: [],
      collections: new Map(),
      summary: {
        totalCollections: this.ttlCollections.size,
        healthyCollections: 0,
        collectionsWithIssues: 0,
        totalDocuments: 0
      }
    };

    for (const [collectionName, config] of this.ttlCollections) {
      const collectionHealth = {
        name: collectionName,
        status: 'healthy',
        issues: [],
        recommendations: [],
        metrics: {}
      };

      try {
        const collection = config.collection;
        const stats = await collection.stats();

        // Check for orphaned documents (shouldn't exist with proper TTL)
        const now = new Date();
        let expiredDocuments = 0;

        if (config.expireAfterSeconds === 0) {
          expiredDocuments = await collection.countDocuments({
            [config.ttlField]: { $lte: new Date(now.getTime() - 300000) } // 5 minutes ago
          });
        } else {
          const expiredCutoff = new Date(now.getTime() - config.expireAfterSeconds * 1000 - 300000);
          expiredDocuments = await collection.countDocuments({
            [config.ttlField]: { $lte: expiredCutoff }
          });
        }

        collectionHealth.metrics = {
          documentCount: stats.count,
          storageSize: stats.storageSize,
          indexCount: stats.nindexes,
          expiredDocuments: expiredDocuments
        };

        // Analyze potential issues
        if (expiredDocuments > 0) {
          collectionHealth.status = 'warning';
          collectionHealth.issues.push(`Found ${expiredDocuments} documents that should have expired`);
          collectionHealth.recommendations.push('Monitor TTL background task performance');
        }

        if (stats.count > 1000000) {
          collectionHealth.recommendations.push('Consider partitioning strategy for large collections');
        }

        if (stats.storageSize > 1073741824) { // 1GB
          collectionHealth.recommendations.push('Monitor storage usage and consider shorter retention periods');
        }

        // Check index health
        const indexes = await collection.indexes();
        const ttlIndex = indexes.find(idx => 
          idx.expireAfterSeconds !== undefined && 
          Object.keys(idx.key).includes(config.ttlField)
        );

        if (!ttlIndex) {
          collectionHealth.status = 'error';
          collectionHealth.issues.push('TTL index missing or misconfigured');
        }

        healthCheck.collections.set(collectionName, collectionHealth);
        healthCheck.summary.totalDocuments += stats.count;

        if (collectionHealth.status === 'healthy') {
          healthCheck.summary.healthyCollections++;
        } else {
          healthCheck.summary.collectionsWithIssues++;
          if (collectionHealth.status === 'error') {
            healthCheck.status = 'error';
          } else if (healthCheck.status === 'healthy') {
            healthCheck.status = 'warning';
          }
        }

      } catch (error) {
        collectionHealth.status = 'error';
        collectionHealth.issues.push(`Health check failed: ${error.message}`);
        healthCheck.status = 'error';
        healthCheck.summary.collectionsWithIssues++;
      }
    }

    // Generate overall recommendations
    if (healthCheck.summary.collectionsWithIssues > 0) {
      healthCheck.recommendations.push('Review collections with issues and optimize TTL configurations');
    }

    if (healthCheck.summary.totalDocuments > 10000000) {
      healthCheck.recommendations.push('Consider implementing data archiving strategy for historical data');
    }

    console.log(`TTL health check completed: ${healthCheck.status}`);
    return healthCheck;
  }

  // Get comprehensive TTL management report
  async generateTTLReport() {
    console.log('Generating comprehensive TTL management report...');

    const [statistics, healthCheck] = await Promise.all([
      this.getTTLStatistics(),
      this.performTTLHealthCheck()
    ]);

    const report = {
      generatedAt: new Date(),
      overview: {
        totalCollections: statistics.summary.totalCollections,
        totalDocuments: statistics.summary.totalDocuments,
        totalStorageSize: statistics.summary.estimatedStorageSize,
        healthStatus: healthCheck.status
      },
      collections: [],
      recommendations: healthCheck.recommendations,
      issues: healthCheck.issues
    };

    // Combine statistics and health data
    for (const [collectionName, stats] of statistics.collections) {
      const health = healthCheck.collections.get(collectionName);

      report.collections.push({
        name: collectionName,
        documentCount: stats.documentCount,
        storageSize: stats.storageSize,
        ttlConfiguration: {
          field: stats.ttlField,
          expireAfterSeconds: stats.expireAfterSeconds,
          policy: stats.expirationPolicy
        },
        dataAge: {
          oldest: stats.oldestDocument,
          newest: stats.newestDocument
        },
        expiration: {
          documentsExpiringSoon: stats.documentsExpiringSoon
        },
        health: {
          status: health?.status || 'unknown',
          issues: health?.issues || [],
          recommendations: health?.recommendations || []
        }
      });
    }

    console.log('TTL management report generated successfully');
    return report;
  }
}

// Example usage demonstrating comprehensive TTL management
async function demonstrateTTLOperations() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const db = client.db('ttl_management_demo');

  const ttlManager = new MongoDBTTLManager(db);

  // Create various TTL collections
  await ttlManager.createSessionsCollection();
  await ttlManager.createLoggingCollection();
  await ttlManager.createCacheCollection();
  await ttlManager.createTempDataCollection();

  // Insert sample data with TTL
  await ttlManager.insertWithTTL('userSessions', [
    {
      // Session IDs must satisfy the schema's 32-128 character length requirement
      sessionId: 'sess_' + require('crypto').randomBytes(24).toString('hex'),
      userId: new ObjectId(),
      sessionData: { preferences: { theme: 'dark' } },
      ipAddress: '192.168.1.100',
      userAgent: 'Mozilla/5.0...',
      lastAccessedAt: new Date(),
      isActive: true
    }
  ]);

  // Insert cache entries with custom expiration
  await ttlManager.insertWithTTL('cache', [
    {
      key: 'user_profile_12345',
      value: { name: 'John Doe', email: 'john.doe@example.com' },
      namespace: 'user_profiles',
      tags: ['profile', 'active'],
      size: 256,
      hitCount: 0,
      lastAccessedAt: new Date()
    }
  ], { customExpireAfterSeconds: 7200 }); // only honored for document-specific TTL; the cache collection's fixed 1-hour TTL on createdAt still applies

  // Generate comprehensive report
  const report = await ttlManager.generateTTLReport();
  console.log('TTL Management Report:', JSON.stringify(report, null, 2));

  await client.close();
}
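
Because the TTL monitor is a background task that runs roughly once a minute by default, deletions are not instantaneous. One way to observe its activity, sketched below, is to read the cumulative ttl counters exposed by serverStatus; treat the exact field names as version-dependent rather than a guaranteed interface.

// Sketch: observing the TTL background monitor via serverStatus
async function reportTTLMonitorActivity(db) {
  const status = await db.admin().command({ serverStatus: 1 });
  const ttlMetrics = status.metrics && status.metrics.ttl;

  if (!ttlMetrics) {
    console.warn('metrics.ttl not available on this server version');
    return null;
  }

  // Counters are cumulative since the mongod process started
  console.log(`TTL passes completed: ${ttlMetrics.passes}`);
  console.log(`Documents deleted by the TTL monitor: ${ttlMetrics.deletedDocuments}`);
  return ttlMetrics;
}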

Advanced TTL Patterns and Enterprise Management

Sophisticated TTL Strategies for Production Systems

Implement enterprise-grade TTL management with advanced patterns and monitoring:

// Enterprise MongoDB TTL Management with Advanced Patterns and Monitoring
class EnterpriseTTLManager extends MongoDBTTLManager {
  constructor(db, enterpriseConfig = {}) {
    super(db);

    this.enterpriseConfig = {
      enableMetrics: enterpriseConfig.enableMetrics !== false,   // default true, but respect an explicit false
      enableAlerting: enterpriseConfig.enableAlerting !== false, // '|| true' would silently ignore opt-outs
      metricsCollection: enterpriseConfig.metricsCollection || 'ttl_metrics',
      alertThresholds: {
        expiredDocumentThreshold: 1000,
        storageSizeThreshold: 5368709120, // 5GB
        healthCheckFailureThreshold: 3
      },
      ...enterpriseConfig
    };

    this.metricsHistory = [];
    this.alertHistory = [];
    this.setupEnterpriseFeatures();
  }

  async setupEnterpriseFeatures() {
    if (this.enterpriseConfig.enableMetrics) {
      await this.createMetricsCollection();
      this.startMetricsCollection();
    }
  }

  async createMetricsCollection() {
    try {
      await this.db.createCollection(this.enterpriseConfig.metricsCollection, {
        validator: {
          $jsonSchema: {
            bsonType: 'object',
            required: ['timestamp', 'collectionName', 'metrics'],
            properties: {
              timestamp: { bsonType: 'date' },
              collectionName: { bsonType: 'string' },
              metrics: {
                bsonType: 'object',
                properties: {
                  documentCount: { bsonType: 'int' },
                  storageSize: { bsonType: 'long' },
                  expiredDocuments: { bsonType: 'int' },
                  documentsExpiringSoon: { bsonType: 'int' }
                }
              }
            }
          }
        }
      });

      // TTL for metrics (keep for 30 days)
      await this.db.collection(this.enterpriseConfig.metricsCollection).createIndex(
        { timestamp: 1 },
        { expireAfterSeconds: 2592000 } // 30 days
      );

      console.log('Enterprise TTL metrics collection created');
    } catch (error) {
      console.warn('Could not create metrics collection:', error.message);
    }
  }

  async createHierarchicalTTLCollection(collectionName, ttlHierarchy) {
    console.log(`Creating hierarchical TTL collection: ${collectionName}`);

    // TTL hierarchy example: { debug: 7*24*3600, info: 30*24*3600, error: 365*24*3600 }
    const baseValidator = {
      $jsonSchema: {
        bsonType: 'object',
        required: ['level', 'data', 'timestamp'],
        properties: {
          level: {
            enum: Object.keys(ttlHierarchy),
            description: 'Data classification level'
          },
          data: { description: 'Document data' },
          timestamp: { bsonType: 'date' },
          customExpiration: {
            bsonType: ['date', 'null'],
            description: 'Override expiration time'
          }
        }
      }
    };

    await this.db.createCollection(collectionName, { validator: baseValidator });
    const collection = this.db.collection(collectionName);

    // Create per-level TTL indexes. TTL indexes must be single-field
    // (compound indexes ignore expireAfterSeconds), so each level gets a
    // partial TTL index on the timestamp field. Multiple indexes sharing a
    // key pattern with different partialFilterExpressions require a recent
    // MongoDB version; older servers reject the duplicate key pattern.
    for (const [level, expireSeconds] of Object.entries(ttlHierarchy)) {
      await collection.createIndex(
        { timestamp: 1 },
        {
          expireAfterSeconds: expireSeconds,
          partialFilterExpression: { level: level },
          name: `ttl_${level}_${expireSeconds}`,
          background: true
        }
      );
    }

    // Additional TTL index for custom expiration
    await collection.createIndex(
      { customExpiration: 1 },
      {
        expireAfterSeconds: 0,
        sparse: true,
        name: 'ttl_custom_expiration'
      }
    );

    return collection;
  }

  async createConditionalTTLCollection(collectionName, conditionalRules) {
    console.log(`Creating conditional TTL collection: ${collectionName}`);

    // Conditional rules example:
    // [
    //   { condition: { status: 'completed' }, expireAfterSeconds: 86400 },
    //   { condition: { status: 'failed' }, expireAfterSeconds: 604800 },
    //   { condition: { priority: 'high' }, expireAfterSeconds: 2592000 }
    // ]

    await this.db.createCollection(collectionName);
    const collection = this.db.collection(collectionName);

    // Create conditional TTL indexes. As with the hierarchical pattern, the
    // TTL key must be a single field; the condition belongs in
    // partialFilterExpression rather than in the index key.
    for (const [index, rule] of conditionalRules.entries()) {
      await collection.createIndex(
        { createdAt: 1 },
        {
          expireAfterSeconds: rule.expireAfterSeconds,
          partialFilterExpression: rule.condition,
          name: `ttl_conditional_${index}`,
          background: true
        }
      );
    }

    return collection;
  }

  startMetricsCollection() {
    if (!this.enterpriseConfig.enableMetrics) return;

    // Collect metrics every 5 minutes
    setInterval(async () => {
      try {
        await this.collectAndStoreMetrics();
      } catch (error) {
        console.error('Failed to collect TTL metrics:', error);
      }
    }, 300000); // 5 minutes

    console.log('TTL metrics collection started');
  }

  async collectAndStoreMetrics() {
    const metricsCollection = this.db.collection(this.enterpriseConfig.metricsCollection);
    const timestamp = new Date();

    for (const [collectionName, config] of this.ttlCollections) {
      try {
        const collection = config.collection;
        const stats = await collection.stats();

        // Calculate expired documents
        const now = new Date();
        let expiredDocuments = 0;
        let documentsExpiringSoon = 0;

        if (config.expireAfterSeconds === 0) {
          expiredDocuments = await collection.countDocuments({
            [config.ttlField]: { $lte: new Date(now.getTime() - 300000) }
          });
          documentsExpiringSoon = await collection.countDocuments({
            [config.ttlField]: {
              $lte: new Date(now.getTime() + 3600000),
              $gt: now
            }
          });
        } else {
          const expiredCutoff = new Date(now.getTime() - config.expireAfterSeconds * 1000 - 300000);
          expiredDocuments = await collection.countDocuments({
            [config.ttlField]: { $lte: expiredCutoff }
          });
        }

        const metrics = {
          timestamp: timestamp,
          collectionName: collectionName,
          metrics: {
            documentCount: stats.count,
            storageSize: stats.storageSize,
            indexSize: stats.totalIndexSize,
            expiredDocuments: expiredDocuments,
            documentsExpiringSoon: documentsExpiringSoon
          }
        };

        await metricsCollection.insertOne(metrics);

        // Check for alert conditions
        await this.checkAlertConditions(collectionName, metrics.metrics);

      } catch (error) {
        console.error(`Failed to collect metrics for ${collectionName}:`, error);
      }
    }
  }

  async checkAlertConditions(collectionName, metrics) {
    const alerts = [];
    const thresholds = this.enterpriseConfig.alertThresholds;

    if (metrics.expiredDocuments > thresholds.expiredDocumentThreshold) {
      alerts.push({
        severity: 'warning',
        message: `Collection ${collectionName} has ${metrics.expiredDocuments} expired documents`,
        metric: 'expired_documents',
        value: metrics.expiredDocuments,
        threshold: thresholds.expiredDocumentThreshold
      });
    }

    if (metrics.storageSize > thresholds.storageSizeThreshold) {
      alerts.push({
        severity: 'warning',
        message: `Collection ${collectionName} storage size ${metrics.storageSize} exceeds threshold`,
        metric: 'storage_size',
        value: metrics.storageSize,
        threshold: thresholds.storageSizeThreshold
      });
    }

    if (alerts.length > 0 && this.enterpriseConfig.enableAlerting) {
      await this.processAlerts(collectionName, alerts);
    }
  }

  async processAlerts(collectionName, alerts) {
    for (const alert of alerts) {
      console.warn(`TTL Alert - ${alert.severity.toUpperCase()}: ${alert.message}`);

      this.alertHistory.push({
        timestamp: new Date(),
        collectionName: collectionName,
        alert: alert
      });
    }
  }
}
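
A minimal usage sketch for the enterprise manager above, assuming a connected db handle; the collection names, levels, and retention windows are illustrative.

// Usage sketch for EnterpriseTTLManager with illustrative retention windows
async function demonstrateEnterpriseTTL(db) {
  const enterpriseTTL = new EnterpriseTTLManager(db, {
    enableMetrics: true,
    enableAlerting: true
  });

  // Hierarchical retention: per-level expiration windows on one collection
  await enterpriseTTL.createHierarchicalTTLCollection('serviceLogs', {
    debug: 7 * 24 * 3600,     // 7 days
    info: 30 * 24 * 3600,     // 30 days
    error: 365 * 24 * 3600    // 1 year
  });

  // Conditional retention: expiration driven by document status
  await enterpriseTTL.createConditionalTTLCollection('jobResults', [
    { condition: { status: 'completed' }, expireAfterSeconds: 86400 },
    { condition: { status: 'failed' }, expireAfterSeconds: 604800 }
  ]);

  // Reporting covers collections registered through createTTLCollection
  const report = await enterpriseTTL.generateTTLReport();
  console.log(`TTL health: ${report.overview.healthStatus}`);
}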

QueryLeaf TTL Integration

QueryLeaf provides familiar SQL syntax for MongoDB TTL collections and automatic data lifecycle management:

-- QueryLeaf TTL collections with SQL-familiar syntax for automatic data expiration

-- Create table with automatic expiration (QueryLeaf converts to TTL collection)
CREATE TABLE user_sessions (
  session_id VARCHAR(128) PRIMARY KEY,
  user_id ObjectId NOT NULL,
  session_data JSONB,
  ip_address INET,
  user_agent TEXT,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  last_accessed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  expires_at TIMESTAMP NOT NULL,
  is_active BOOLEAN DEFAULT true
) WITH (
  ttl_field = 'expires_at',
  expire_after_seconds = 0  -- Document-specific expiration
);

-- QueryLeaf automatically creates TTL index:
-- db.user_sessions.createIndex({ expires_at: 1 }, { expireAfterSeconds: 0 })

-- Create cache table with fixed expiration
CREATE TABLE cache_entries (
  cache_key VARCHAR(500) UNIQUE NOT NULL,
  cache_value JSONB NOT NULL,
  namespace VARCHAR(100),
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  tags TEXT[],
  hit_count INT DEFAULT 0,
  last_accessed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) WITH (
  ttl_field = 'created_at',
  expire_after_seconds = 3600  -- 1 hour fixed expiration
);
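
-- As with user_sessions, QueryLeaf automatically creates the TTL index:
-- db.cache_entries.createIndex({ created_at: 1 }, { expireAfterSeconds: 3600 })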

-- Application logs with retention categories
CREATE TABLE application_logs_debug (
  level VARCHAR(10) NOT NULL CHECK (level IN ('debug', 'info', 'warn', 'error', 'fatal')),
  message TEXT NOT NULL,
  source VARCHAR(100) NOT NULL,
  timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  metadata JSONB,
  user_id ObjectId,
  request_id VARCHAR(50),
  tags TEXT[]
) WITH (
  ttl_field = 'timestamp',
  expire_after_seconds = 604800  -- 7 days for debug logs
);

CREATE TABLE application_logs_standard (
  level VARCHAR(10) NOT NULL,
  message TEXT NOT NULL,
  source VARCHAR(100) NOT NULL,
  timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  metadata JSONB,
  user_id ObjectId,
  request_id VARCHAR(50),
  tags TEXT[]
) WITH (
  ttl_field = 'timestamp',
  expire_after_seconds = 7776000  -- 90 days for standard logs
);

-- Insert data with automatic expiration handling
INSERT INTO user_sessions (
  session_id, user_id, session_data, ip_address, user_agent, expires_at
) VALUES (
  'sess_abc123def456',
  ObjectId('507f1f77bcf86cd799439011'),
  JSON_OBJECT('theme', 'dark', 'language', 'en-US'),
  '192.168.1.100',
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
  CURRENT_TIMESTAMP + INTERVAL '24 hours'
);

-- Insert cache entries (automatic expiration after 1 hour)
INSERT INTO cache_entries (cache_key, cache_value, namespace, tags)
VALUES 
  ('user_profile_12345', 
   JSON_OBJECT('name', 'John Doe', 'email', '[email protected]'),
   'user_profiles', 
   ARRAY['profile', 'active']),
  ('api_response_weather_nyc',
   JSON_OBJECT('temp', 72, 'humidity', 65, 'forecast', 'sunny'), 
   'api_cache',
   ARRAY['weather', 'external_api']);

-- Insert logs with different retention periods
INSERT INTO application_logs_debug (level, message, source, metadata)
VALUES ('debug', 'Processing user request', 'auth_service', 
        JSON_OBJECT('user_id', '12345', 'endpoint', '/api/login'));

INSERT INTO application_logs_standard (level, message, source, request_id)
VALUES ('error', 'Database connection timeout', 'db_service', 'req_789xyz');

-- Query data with expiration awareness
WITH session_analysis AS (
  SELECT 
    session_id,
    user_id,
    created_at,
    last_accessed_at,
    expires_at,
    is_active,

    -- Calculate session duration and time until expiration
    EXTRACT(EPOCH FROM (last_accessed_at - created_at)) as session_duration_seconds,
    EXTRACT(EPOCH FROM (expires_at - CURRENT_TIMESTAMP)) as seconds_until_expiration,

    -- Categorize sessions by expiration status
    CASE 
      WHEN expires_at <= CURRENT_TIMESTAMP THEN 'expired'
      WHEN expires_at <= CURRENT_TIMESTAMP + INTERVAL '1 hour' THEN 'expiring_soon'
      WHEN expires_at <= CURRENT_TIMESTAMP + INTERVAL '6 hours' THEN 'expiring_later'
      ELSE 'active'
    END as expiration_status,

    -- Session activity assessment
    CASE 
      WHEN last_accessed_at >= CURRENT_TIMESTAMP - INTERVAL '5 minutes' THEN 'very_active'
      WHEN last_accessed_at >= CURRENT_TIMESTAMP - INTERVAL '30 minutes' THEN 'active'
      WHEN last_accessed_at >= CURRENT_TIMESTAMP - INTERVAL '2 hours' THEN 'idle'
      ELSE 'inactive'
    END as activity_level

  FROM user_sessions
  WHERE is_active = true
)

SELECT 
  expiration_status,
  activity_level,
  COUNT(*) as session_count,
  AVG(session_duration_seconds / 60) as avg_duration_minutes,
  AVG(seconds_until_expiration / 3600) as avg_hours_until_expiration,

  -- Sessions by activity and expiration
  COUNT(*) FILTER (WHERE activity_level = 'very_active' AND expiration_status = 'active') as active_engaged_sessions,
  COUNT(*) FILTER (WHERE activity_level IN ('idle', 'inactive') AND expiration_status = 'expiring_soon') as idle_expiring_sessions

FROM session_analysis
GROUP BY expiration_status, activity_level
ORDER BY 
  CASE expiration_status 
    WHEN 'expired' THEN 1
    WHEN 'expiring_soon' THEN 2
    WHEN 'expiring_later' THEN 3 
    ELSE 4
  END,
  CASE activity_level
    WHEN 'very_active' THEN 1
    WHEN 'active' THEN 2
    WHEN 'idle' THEN 3
    ELSE 4
  END;

-- Cache performance analysis with TTL awareness
WITH cache_analysis AS (
  SELECT 
    namespace,
    cache_key,
    created_at,
    last_accessed_at,
    hit_count,

    -- Calculate cache metrics
    EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - created_at)) as cache_age_seconds,
    EXTRACT(EPOCH FROM (last_accessed_at - created_at)) as last_access_age_seconds,

    -- TTL status (for 1-hour expiration)
    CASE 
      WHEN created_at <= CURRENT_TIMESTAMP - INTERVAL '1 hour' THEN 'should_be_expired'
      WHEN created_at <= CURRENT_TIMESTAMP - INTERVAL '50 minutes' THEN 'expiring_very_soon'
      WHEN created_at <= CURRENT_TIMESTAMP - INTERVAL '45 minutes' THEN 'expiring_soon'
      ELSE 'fresh'
    END as ttl_status,

    -- Cache effectiveness
    CASE 
      WHEN hit_count = 0 THEN 'unused'
      WHEN hit_count = 1 THEN 'single_use'
      WHEN hit_count <= 5 THEN 'low_usage'
      WHEN hit_count <= 20 THEN 'moderate_usage'
      ELSE 'high_usage'
    END as usage_category,

    -- Access pattern analysis
    hit_count / GREATEST(EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - created_at)) / 60, 1) as hits_per_minute

  FROM cache_entries
  WHERE created_at >= CURRENT_TIMESTAMP - INTERVAL '2 hours' -- Include recently expired for analysis
)

SELECT 
  namespace,
  ttl_status,
  usage_category,

  COUNT(*) as entry_count,
  AVG(cache_age_seconds / 60) as avg_age_minutes,
  AVG(hit_count) as avg_hit_count,
  AVG(hits_per_minute) as avg_hits_per_minute,

  -- Efficiency metrics
  SUM(hit_count) as total_hits,
  COUNT(*) FILTER (WHERE hit_count = 0) as unused_entries,
  COUNT(*) FILTER (WHERE ttl_status = 'should_be_expired') as potentially_expired_entries,

  -- Cache utilization assessment
  ROUND(
    (COUNT(*) FILTER (WHERE hit_count > 1)::DECIMAL / COUNT(*)) * 100, 2
  ) as utilization_rate_percent,

  -- Performance indicators
  CASE 
    WHEN AVG(hits_per_minute) > 1 AND COUNT(*) FILTER (WHERE hit_count = 0) < COUNT(*) * 0.2 THEN 'excellent'
    WHEN AVG(hits_per_minute) > 0.5 AND COUNT(*) FILTER (WHERE hit_count = 0) < COUNT(*) * 0.4 THEN 'good'
    WHEN AVG(hits_per_minute) > 0.1 THEN 'acceptable'
    ELSE 'poor'
  END as performance_rating

FROM cache_analysis
GROUP BY namespace, ttl_status, usage_category
ORDER BY namespace, 
         CASE ttl_status 
           WHEN 'should_be_expired' THEN 1
           WHEN 'expiring_very_soon' THEN 2
           WHEN 'expiring_soon' THEN 3
           ELSE 4
         END,
         total_hits DESC;

-- Log analysis with retention awareness
WITH log_analysis AS (
  SELECT 
    source,
    level,
    timestamp,

    -- Age calculation
    EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - timestamp)) / 3600 as log_age_hours,

    -- Retention category based on log age relative to the debug (7-day) and standard (90-day) retention windows
    CASE 
      WHEN timestamp >= CURRENT_TIMESTAMP - INTERVAL '7 days' THEN 'debug_retention'
      WHEN timestamp >= CURRENT_TIMESTAMP - INTERVAL '90 days' THEN 'standard_retention'
      ELSE 'expired_or_archived'
    END as retention_category,

    -- Time until expiration
    CASE 
      WHEN timestamp <= CURRENT_TIMESTAMP - INTERVAL '7 days' THEN 0
      ELSE EXTRACT(EPOCH FROM ((timestamp + INTERVAL '7 days') - CURRENT_TIMESTAMP)) / 3600
    END as hours_until_debug_expiration

  FROM (
    SELECT source, level, timestamp, 'debug' as log_type FROM application_logs_debug
    UNION ALL
    SELECT source, level, timestamp, 'standard' as log_type FROM application_logs_standard
  ) combined_logs
)

SELECT 
  source,
  level,
  retention_category,

  COUNT(*) as log_count,
  AVG(log_age_hours) as avg_age_hours,
  MIN(log_age_hours) as newest_log_age_hours,
  MAX(log_age_hours) as oldest_log_age_hours,

  -- Expiration timeline
  COUNT(*) FILTER (WHERE hours_until_debug_expiration <= 24 AND hours_until_debug_expiration > 0) as expiring_within_24h,
  COUNT(*) FILTER (WHERE hours_until_debug_expiration <= 168 AND hours_until_debug_expiration > 24) as expiring_within_week,

  -- Volume analysis by time periods
  COUNT(*) FILTER (WHERE log_age_hours <= 1) as last_hour_count,
  COUNT(*) FILTER (WHERE log_age_hours <= 24) as last_day_count,
  COUNT(*) FILTER (WHERE log_age_hours <= 168) as last_week_count,

  -- Log level distribution
  ROUND(
    (COUNT(*) FILTER (WHERE level IN ('error', 'fatal'))::DECIMAL / COUNT(*)) * 100, 2
  ) as error_percentage,

  -- Data lifecycle assessment
  CASE 
    WHEN retention_category = 'expired_or_archived' THEN 'cleanup_required'
    WHEN COUNT(*) FILTER (WHERE hours_until_debug_expiration <= 24) > COUNT(*) * 0.5 THEN 'high_turnover'
    WHEN COUNT(*) FILTER (WHERE log_age_hours <= 24) > COUNT(*) * 0.8 THEN 'recent_activity'
    ELSE 'normal_lifecycle'
  END as lifecycle_status

FROM log_analysis
GROUP BY source, level, retention_category
ORDER BY source, level, 
         CASE retention_category
           WHEN 'expired_or_archived' THEN 1
           WHEN 'debug_retention' THEN 2
           ELSE 3
         END;

-- TTL collection management and monitoring
-- Query to monitor TTL collection health and performance
WITH ttl_collection_stats AS (
  SELECT 
    'user_sessions' as collection_name,
    COUNT(*) as document_count,
    COUNT(*) FILTER (WHERE expires_at <= CURRENT_TIMESTAMP) as expired_documents,
    COUNT(*) FILTER (WHERE expires_at <= CURRENT_TIMESTAMP + INTERVAL '1 hour') as expiring_soon,
    MIN(created_at) as oldest_document,
    MAX(created_at) as newest_document,
    AVG(EXTRACT(EPOCH FROM (expires_at - created_at))) as avg_ttl_seconds
  FROM user_sessions

  UNION ALL

  SELECT 
    'cache_entries' as collection_name,
    COUNT(*) as document_count,
    -- For fixed TTL, calculate based on creation time + expiration period
    COUNT(*) FILTER (WHERE created_at <= CURRENT_TIMESTAMP - INTERVAL '1 hour') as expired_documents,
    COUNT(*) FILTER (WHERE created_at <= CURRENT_TIMESTAMP - INTERVAL '50 minutes') as expiring_soon,
    MIN(created_at) as oldest_document,
    MAX(created_at) as newest_document,
    3600 as avg_ttl_seconds  -- Fixed 1-hour TTL
  FROM cache_entries
)

SELECT 
  collection_name,
  document_count,
  expired_documents,
  expiring_soon,
  oldest_document,
  newest_document,
  avg_ttl_seconds / 3600 as avg_ttl_hours,

  -- Health indicators
  CASE 
    WHEN expired_documents > document_count * 0.1 THEN 'cleanup_needed'
    WHEN expiring_soon > document_count * 0.3 THEN 'high_turnover'
    WHEN document_count = 0 THEN 'empty'
    ELSE 'healthy'
  END as health_status,

  -- Performance metrics
  ROUND((expired_documents::DECIMAL / GREATEST(document_count, 1)) * 100, 2) as expired_percentage,
  ROUND((expiring_soon::DECIMAL / GREATEST(document_count, 1)) * 100, 2) as expiring_soon_percentage,

  -- Data lifecycle summary
  EXTRACT(EPOCH FROM (newest_document - oldest_document)) / 3600 as data_age_span_hours,

  -- Recommendations
  CASE 
    WHEN expired_documents > 1000 THEN 'Monitor TTL background task performance'
    WHEN expiring_soon > document_count * 0.5 THEN 'Consider adjusting TTL settings'
    WHEN document_count > 1000000 THEN 'Monitor storage usage and performance'
    ELSE 'Collection operating normally'
  END as recommendation

FROM ttl_collection_stats
ORDER BY document_count DESC;

-- QueryLeaf provides comprehensive TTL support:
-- 1. Automatic conversion of TTL table definitions to MongoDB TTL collections
-- 2. Intelligent TTL index creation with optimal expiration strategies
-- 3. Support for both fixed and document-specific expiration patterns
-- 4. Advanced TTL monitoring and performance analysis through familiar SQL queries
-- 5. Integration with MongoDB's native TTL background task optimization
-- 6. Comprehensive data lifecycle management and retention policy enforcement
-- 7. Real-time TTL health monitoring and alerting capabilities
-- 8. Familiar SQL patterns for complex TTL collection management workflows
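
To verify what QueryLeaf has configured on the MongoDB side, the TTL indexes themselves can be inspected through the standard driver API. A minimal sketch, assuming a connected Node.js `db` handle and the collection names from the tables above (the helper name is illustrative):

// List TTL indexes (those carrying an expireAfterSeconds option) for the
// collections backing the tables defined above.
async function listTtlIndexes(db, collectionNames) {
  for (const name of collectionNames) {
    const indexes = await db.collection(name).indexes();
    const ttlIndexes = indexes.filter(ix => ix.expireAfterSeconds !== undefined);
    console.log(name, ttlIndexes.map(ix => ({
      key: ix.key,
      expireAfterSeconds: ix.expireAfterSeconds
    })));
  }
}

// Example:
// await listTtlIndexes(db, ['user_sessions', 'cache_entries', 'application_logs_debug']);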

Best Practices for MongoDB TTL Collections

TTL Strategy Design and Implementation

Essential principles for effective MongoDB TTL implementation:

  1. Expiration Strategy Selection: Choose between document-specific and collection-wide expiration based on use case requirements
  2. Index Optimization: Design TTL indexes to minimize impact on write operations and storage overhead
  3. Background Task Monitoring: Monitor MongoDB's TTL background task performance and adjust configurations as needed (see the sketch after this list)
  4. Data Lifecycle Planning: Implement comprehensive data lifecycle policies that align with business and compliance requirements
  5. Performance Considerations: Balance TTL cleanup frequency with application performance and resource utilization
  6. Monitoring and Alerting: Establish comprehensive monitoring for TTL collection health, expiration effectiveness, and storage optimization
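
For item 3, the TTL monitor's activity can be observed through the serverStatus command, which exposes cumulative counters under metrics.ttl. A minimal monitoring sketch, assuming a connected Node.js `db` handle (the helper name is illustrative):

// Report how often the TTL monitor has run and how many documents it has
// removed since the mongod process started.
async function reportTtlActivity(db) {
  const status = await db.admin().command({ serverStatus: 1 });
  const ttl = (status.metrics && status.metrics.ttl) || {};
  console.log('TTL monitor passes:', ttl.passes);
  console.log('TTL documents deleted:', ttl.deletedDocuments);
  return ttl;
}

// Sampling these counters periodically (e.g. once a minute) and diffing them
// gives a deletion rate that can feed the alerting thresholds described above.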

Production Deployment and Operations

Optimize TTL collections for enterprise production environments:

  1. Capacity Planning: Design TTL policies to prevent storage bloat while maintaining necessary data availability
  2. Disaster Recovery: Consider TTL implications for backup and recovery strategies
  3. Compliance Integration: Align TTL policies with data retention regulations and audit requirements
  4. Performance Monitoring: Implement detailed monitoring for TTL collection performance and resource impact
  5. Operational Procedures: Establish procedures for TTL policy changes, emergency data retention, and cleanup verification (see the collMod sketch after this list)
  6. Integration Testing: Thoroughly test TTL behavior in staging environments before production deployment
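
For TTL policy changes in particular (item 5), the expiration window of an existing TTL index can be adjusted in place with the collMod command, without dropping and rebuilding the index. A brief sketch, where the collection name and key pattern are illustrative and must match an existing TTL index:

// Adjust the TTL window on an existing index; keyPattern must match the
// index that was originally created with expireAfterSeconds.
async function updateTtlWindow(db, collectionName, keyPattern, newExpireAfterSeconds) {
  return await db.command({
    collMod: collectionName,
    index: {
      keyPattern: keyPattern,                  // e.g. { created_at: 1 }
      expireAfterSeconds: newExpireAfterSeconds
    }
  });
}

// Example: extend cache_entries retention from 1 hour to 2 hours
// await updateTtlWindow(db, 'cache_entries', { created_at: 1 }, 7200);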

Conclusion

MongoDB TTL collections provide powerful native capabilities for automatic data lifecycle management that eliminate the complexity and maintenance overhead of traditional manual cleanup approaches. The built-in document expiration system lets applications maintain performance and storage efficiency while enforcing data retention policies consistently, without external dependencies and with minimal impact on foreground workloads.

Key MongoDB TTL Collections benefits include:

  • Native Automation: Built-in document expiration without external scheduling or application logic
  • Flexible Policies: Support for both fixed collection-wide and document-specific expiration strategies
  • Performance Optimization: Efficient background cleanup that minimizes impact on application operations
  • Storage Management: Automatic storage optimization through intelligent document lifecycle management
  • Operational Simplicity: Reduced maintenance overhead compared to manual cleanup procedures
  • SQL Accessibility: Familiar SQL-style TTL management through QueryLeaf for accessible data lifecycle operations

Whether you're building session management systems, caching layers, logging platforms, or temporary data processing workflows, MongoDB TTL collections with QueryLeaf's familiar SQL interface provide the foundation for efficient, automated, and reliable data lifecycle management.

QueryLeaf Integration: QueryLeaf automatically converts SQL table definitions with TTL specifications into optimized MongoDB TTL collections while providing familiar SQL syntax for TTL monitoring, analysis, and management. Advanced TTL patterns, retention policies, and lifecycle management are seamlessly handled through familiar SQL constructs, making sophisticated automatic data expiration both powerful and accessible to SQL-oriented development teams.

The combination of MongoDB's robust TTL capabilities with SQL-style data lifecycle management makes it an ideal platform for applications requiring both automatic data expiration and familiar database operation patterns, ensuring your data management workflows can scale efficiently while maintaining performance and compliance requirements.

MongoDB Document Validation and Schema Enforcement: Data Integrity and Governance for Enterprise Applications

Enterprise applications require robust data integrity mechanisms that ensure consistent data quality, enforce business rules, and maintain compliance standards across complex document structures and evolving application requirements. Traditional relational databases rely heavily on strict schema definitions and constraints, but these rigid approaches often become barriers to agility in modern applications that need to adapt to changing business requirements and diverse data structures.

MongoDB's document validation provides flexible yet powerful schema enforcement capabilities that balance data integrity requirements with the agility benefits of document-oriented storage. Unlike rigid table schemas that require expensive migrations for structural changes, MongoDB validation allows you to define comprehensive validation rules that evolve with your application while maintaining data quality and business rule compliance across your entire dataset.
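
As a quick preview of the mechanism before the comparison below: a validation rule in MongoDB is simply a $jsonSchema document attached to a collection, either at creation time or later via collMod. A minimal sketch, assuming a connected `db` handle and using illustrative collection and field names:

// Reject inserts and updates that are missing required fields or that use a
// status value outside the allowed set.
await db.createCollection('accounts', {
  validator: {
    $jsonSchema: {
      bsonType: 'object',
      required: ['email', 'status'],
      properties: {
        email: { bsonType: 'string', pattern: '^.+@.+$' },
        status: { enum: ['active', 'inactive', 'suspended'] }
      }
    }
  },
  validationLevel: 'strict',   // validate all inserts and updates
  validationAction: 'error'    // reject violations ('warn' logs them instead)
});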

The Traditional Schema Constraint Challenge

Relational databases enforce data integrity through rigid schema definitions that become increasingly problematic as applications evolve:

-- Traditional PostgreSQL schema with rigid constraints that become maintenance burdens

-- User profile management with complex validation requirements
CREATE TABLE user_profiles (
    user_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) UNIQUE NOT NULL,
    username VARCHAR(50) UNIQUE NOT NULL,
    first_name VARCHAR(100) NOT NULL,
    last_name VARCHAR(100) NOT NULL,

    -- Contact information with rigid structure
    phone_number VARCHAR(20),
    address_line1 VARCHAR(255),
    address_line2 VARCHAR(255),
    city VARCHAR(100),
    state_province VARCHAR(100),
    postal_code VARCHAR(20),
    country VARCHAR(3) NOT NULL DEFAULT 'USA',

    -- Profile metadata
    date_of_birth DATE,
    gender VARCHAR(20),
    preferred_language VARCHAR(10) DEFAULT 'en',
    timezone VARCHAR(50) DEFAULT 'UTC',

    -- Account settings
    account_status VARCHAR(20) DEFAULT 'active' CHECK (account_status IN ('active', 'suspended', 'inactive', 'deleted')),
    email_verified BOOLEAN DEFAULT FALSE,
    phone_verified BOOLEAN DEFAULT FALSE,
    two_factor_enabled BOOLEAN DEFAULT FALSE,

    -- Privacy and preferences
    privacy_level VARCHAR(20) DEFAULT 'standard' CHECK (privacy_level IN ('public', 'standard', 'private', 'restricted')),
    marketing_consent BOOLEAN DEFAULT FALSE,
    analytics_consent BOOLEAN DEFAULT TRUE,

    -- Audit fields
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    created_by UUID,
    updated_by UUID,
    version INTEGER DEFAULT 1,

    -- Complex business validation constraints
    CONSTRAINT valid_email CHECK (email ~* '^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+[.][A-Za-z]+$'),
    CONSTRAINT valid_username CHECK (username ~* '^[a-zA-Z0-9_]{3,50}$'),
    CONSTRAINT valid_phone CHECK (phone_number IS NULL OR phone_number ~* '^\+?[1-9]\d{1,14}$'),
    CONSTRAINT valid_postal_code CHECK (postal_code ~* '^[A-Z0-9\s-]{3,12}$'),
    CONSTRAINT valid_gender CHECK (gender IS NULL OR gender IN ('male', 'female', 'non-binary', 'prefer_not_to_say')),
    CONSTRAINT valid_date_of_birth CHECK (date_of_birth IS NULL OR date_of_birth < CURRENT_DATE),
    CONSTRAINT valid_timezone CHECK (timezone ~* '^[A-Za-z_]+/[A-Za-z_]+$' OR timezone = 'UTC'),

    -- Complex interdependent constraints
    CONSTRAINT email_verified_requires_email CHECK (NOT email_verified OR email IS NOT NULL),
    CONSTRAINT phone_verified_requires_phone CHECK (NOT phone_verified OR phone_number IS NOT NULL),
    CONSTRAINT two_factor_requires_verified_contact CHECK (
        NOT two_factor_enabled OR (email_verified = TRUE OR phone_verified = TRUE)
    )
);

-- User preferences with evolving JSON structure that becomes difficult to validate
CREATE TABLE user_preferences (
    preference_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES user_profiles(user_id) ON DELETE CASCADE,
    preference_category VARCHAR(50) NOT NULL,
    preference_data JSONB NOT NULL,

    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,

    -- Basic JSON validation (limited capabilities)
    CONSTRAINT valid_preference_data CHECK (jsonb_typeof(preference_data) = 'object'),
    CONSTRAINT valid_category CHECK (preference_category IN (
        'notification_settings', 'display_preferences', 'privacy_settings', 
        'content_preferences', 'accessibility_options', 'integration_settings'
    ))
);

-- Complex stored procedure for comprehensive user data validation
CREATE OR REPLACE FUNCTION validate_user_profile_data(
    p_user_id UUID,
    p_email VARCHAR(255),
    p_username VARCHAR(50),
    p_profile_data JSONB
) RETURNS TABLE (
    is_valid BOOLEAN,
    validation_errors TEXT[],
    warnings TEXT[]
) AS $$
DECLARE
    errors TEXT[] := ARRAY[]::TEXT[];
    warnings TEXT[] := ARRAY[]::TEXT[];
    existing_email_count INTEGER;
    existing_username_count INTEGER;
    profile_completeness_score DECIMAL;

BEGIN
    -- Email validation beyond basic format checking
    IF p_email IS NOT NULL THEN
        -- Check for disposable email domains
        IF p_email ~* '@(tempmail|guerrillamail|10minutemail|mailinator)' THEN
            errors := array_append(errors, 'Disposable email addresses are not allowed');
        END IF;

        -- Check for duplicate email (excluding current user)
        SELECT COUNT(*) INTO existing_email_count
        FROM user_profiles 
        WHERE email = p_email AND (p_user_id IS NULL OR user_id != p_user_id);

        IF existing_email_count > 0 THEN
            errors := array_append(errors, 'Email address already exists');
        END IF;
    END IF;

    -- Username validation with business rules
    IF p_username IS NOT NULL THEN
        -- Check for inappropriate content (simplified)
        IF p_username ~* '(admin|root|system|test|null|undefined)' THEN
            errors := array_append(errors, 'Username contains reserved words');
        END IF;

        -- Check for duplicate username
        SELECT COUNT(*) INTO existing_username_count
        FROM user_profiles 
        WHERE username = p_username AND (p_user_id IS NULL OR user_id != p_user_id);

        IF existing_username_count > 0 THEN
            errors := array_append(errors, 'Username already exists');
        END IF;
    END IF;

    -- Complex profile data validation
    IF p_profile_data IS NOT NULL THEN
        -- Validate notification preferences structure
        IF p_profile_data ? 'notifications' THEN
            IF NOT (p_profile_data->'notifications' ? 'email_frequency' AND
                   p_profile_data->'notifications' ? 'push_enabled' AND
                   p_profile_data->'notifications' ? 'categories') THEN
                errors := array_append(errors, 'Notification preferences missing required fields');
            END IF;

            -- Validate email frequency options
            IF p_profile_data->'notifications'->>'email_frequency' NOT IN ('immediate', 'daily', 'weekly', 'never') THEN
                errors := array_append(errors, 'Invalid email frequency setting');
            END IF;
        END IF;

        -- Validate privacy settings
        IF p_profile_data ? 'privacy' THEN
            IF NOT (p_profile_data->'privacy' ? 'profile_visibility' AND
                   p_profile_data->'privacy' ? 'contact_permissions') THEN
                warnings := array_append(warnings, 'Privacy settings incomplete');
            END IF;
        END IF;

        -- Calculate profile completeness score
        profile_completeness_score := (
            CASE WHEN p_profile_data ? 'avatar_url' THEN 10 ELSE 0 END +
            CASE WHEN p_profile_data ? 'bio' THEN 15 ELSE 0 END +
            CASE WHEN p_profile_data ? 'location' THEN 10 ELSE 0 END +
            CASE WHEN p_profile_data ? 'website' THEN 10 ELSE 0 END +
            CASE WHEN p_profile_data ? 'social_links' THEN 15 ELSE 0 END +
            CASE WHEN p_profile_data ? 'interests' THEN 20 ELSE 0 END +
            CASE WHEN p_profile_data ? 'skills' THEN 20 ELSE 0 END
        );

        IF profile_completeness_score < 50 THEN
            warnings := array_append(warnings, 'Profile completeness below recommended threshold');
        END IF;
    END IF;

    -- Return validation results
    RETURN QUERY SELECT 
        array_length(errors, 1) IS NULL as is_valid,
        errors as validation_errors,
        warnings;
END;
$$ LANGUAGE plpgsql;

-- User social connections with complex validation
CREATE TABLE user_connections (
    connection_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    requester_user_id UUID NOT NULL REFERENCES user_profiles(user_id),
    requested_user_id UUID NOT NULL REFERENCES user_profiles(user_id),
    connection_type VARCHAR(30) NOT NULL,
    connection_status VARCHAR(20) NOT NULL DEFAULT 'pending',
    connection_metadata JSONB,

    requested_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    responded_at TIMESTAMP WITH TIME ZONE,
    expires_at TIMESTAMP WITH TIME ZONE,

    -- Complex validation constraints
    CONSTRAINT no_self_connection CHECK (requester_user_id != requested_user_id),
    CONSTRAINT valid_connection_type CHECK (connection_type IN (
        'friendship', 'professional', 'family', 'acquaintance', 'blocked', 'follow'
    )),
    CONSTRAINT valid_connection_status CHECK (connection_status IN (
        'pending', 'accepted', 'declined', 'blocked', 'expired', 'cancelled'
    )),
    CONSTRAINT valid_response_timing CHECK (
        (connection_status = 'pending' AND responded_at IS NULL) OR
        (connection_status != 'pending' AND responded_at IS NOT NULL)
    ),
    CONSTRAINT valid_expiration CHECK (
        expires_at IS NULL OR expires_at > requested_at
    ),

    -- Unique constraint to prevent duplicate connections
    UNIQUE (requester_user_id, requested_user_id, connection_type)
);

-- Trigger for complex business rule validation
CREATE OR REPLACE FUNCTION validate_connection_business_rules()
RETURNS TRIGGER AS $$
DECLARE
    requester_profile RECORD;
    requested_profile RECORD;
    existing_connection_count INTEGER;
    blocked_connection_exists BOOLEAN := FALSE;

BEGIN
    -- Get user profiles for validation
    SELECT * INTO requester_profile FROM user_profiles WHERE user_id = NEW.requester_user_id;
    SELECT * INTO requested_profile FROM user_profiles WHERE user_id = NEW.requested_user_id;

    -- Validate account status
    IF requester_profile.account_status != 'active' THEN
        RAISE EXCEPTION 'Cannot create connection from inactive account';
    END IF;

    IF requested_profile.account_status NOT IN ('active', 'inactive') THEN
        RAISE EXCEPTION 'Cannot create connection to suspended or deleted account';
    END IF;

    -- Check for existing blocked connections
    SELECT EXISTS(
        SELECT 1 FROM user_connections
        WHERE ((requester_user_id = NEW.requester_user_id AND requested_user_id = NEW.requested_user_id) OR
               (requester_user_id = NEW.requested_user_id AND requested_user_id = NEW.requester_user_id))
        AND connection_status = 'blocked'
    ) INTO blocked_connection_exists;

    IF blocked_connection_exists AND NEW.connection_type != 'blocked' THEN
        RAISE EXCEPTION 'Cannot create connection with blocked user';
    END IF;

    -- Validate connection limits based on type
    IF NEW.connection_type = 'friendship' THEN
        SELECT COUNT(*) INTO existing_connection_count
        FROM user_connections
        WHERE requester_user_id = NEW.requester_user_id 
        AND connection_type = 'friendship' 
        AND connection_status = 'accepted';

        IF existing_connection_count >= 5000 THEN
            RAISE EXCEPTION 'Maximum friendship connections exceeded';
        END IF;
    END IF;

    -- Set automatic expiration for pending requests
    IF NEW.connection_status = 'pending' AND NEW.expires_at IS NULL THEN
        NEW.expires_at := NEW.requested_at + INTERVAL '30 days';
    END IF;

    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER validate_connection_trigger
    BEFORE INSERT OR UPDATE ON user_connections
    FOR EACH ROW EXECUTE FUNCTION validate_connection_business_rules();

-- Problems with traditional schema validation approaches:
-- 1. Rigid schema changes require expensive ALTER TABLE operations affecting entire datasets
-- 2. Complex CHECK constraints become performance bottlenecks with limited expressiveness
-- 3. Limited JSON validation capabilities that cannot enforce nested structure requirements
-- 4. Difficult schema evolution requiring coordinated application and database changes
-- 5. Poor support for optional fields and polymorphic document structures
-- 6. Complex trigger-based validation logic that's difficult to maintain and debug
-- 7. Limited ability to enforce cross-document validation rules and referential constraints
-- 8. Poor integration with modern application frameworks and validation libraries
-- 9. Inflexible validation rules that cannot adapt to different user roles or contexts
-- 10. Expensive validation operations that impact application performance and scalability

-- Traditional validation limitations with JSON data
WITH user_validation_attempts AS (
    SELECT 
        up.user_id,
        up.email,
        up.username,

        -- Manual JSON structure validation (limited capabilities)
        CASE 
            WHEN uprefs.preference_data IS NULL THEN 'missing_preferences'
            WHEN NOT (uprefs.preference_data ? 'notifications') THEN 'missing_notifications'
            WHEN jsonb_typeof(uprefs.preference_data->'notifications') != 'object' THEN 'invalid_notifications_type'
            WHEN NOT (uprefs.preference_data->'notifications' ? 'email_frequency') THEN 'missing_email_frequency'
            ELSE 'valid'
        END as validation_status,

        -- Complex nested validation queries (poor performance)
        CASE 
            WHEN uprefs.preference_data->'notifications'->>'email_frequency' IN ('immediate', 'daily', 'weekly', 'never') 
            THEN TRUE ELSE FALSE 
        END as valid_email_frequency,

        -- Limited validation of array structures
        CASE 
            WHEN jsonb_typeof(uprefs.preference_data->'interests') = 'array' AND
                 jsonb_array_length(uprefs.preference_data->'interests') BETWEEN 1 AND 10
            THEN TRUE ELSE FALSE 
        END as valid_interests_array,

        -- Difficult cross-field validation
        CASE 
            WHEN up.email_verified = TRUE AND 
                 uprefs.preference_data->'notifications'->>'email_frequency' != 'never'
            THEN TRUE ELSE FALSE 
        END as consistent_email_settings

    FROM user_profiles up
    LEFT JOIN user_preferences uprefs ON up.user_id = uprefs.user_id 
    WHERE uprefs.preference_category = 'notification_settings'
),

validation_summary AS (
    SELECT 
        COUNT(*) as total_users,
        COUNT(*) FILTER (WHERE validation_status = 'valid') as valid_users,
        COUNT(*) FILTER (WHERE validation_status != 'valid') as invalid_users,
        COUNT(*) FILTER (WHERE NOT valid_email_frequency) as invalid_email_frequency,
        COUNT(*) FILTER (WHERE NOT valid_interests_array) as invalid_interests,
        COUNT(*) FILTER (WHERE NOT consistent_email_settings) as inconsistent_settings,

        -- Approximate time spent evaluating these manual validation expressions
        -- (clock_timestamp advances during execution; statement_timestamp is fixed)
        EXTRACT(MILLISECONDS FROM (clock_timestamp() - statement_timestamp())) as validation_time_ms

    FROM user_validation_attempts
)

SELECT 
    vs.total_users,
    vs.valid_users,
    vs.invalid_users,
    ROUND((vs.valid_users::decimal / vs.total_users::decimal) * 100, 2) as validation_success_rate,

    -- Manual validation issues identified
    vs.invalid_email_frequency as email_frequency_violations,
    vs.invalid_interests as interests_structure_violations, 
    vs.inconsistent_settings as cross_field_consistency_violations,

    -- Validation challenges
    'Complex manual validation queries' as primary_challenge,
    'Limited JSON schema enforcement capabilities' as technical_limitation,
    'Poor performance with large datasets' as scalability_concern,
    'Difficult maintenance and evolution' as operational_issue

FROM validation_summary vs;

-- Traditional relational database limitations for document validation:
-- 1. Rigid schema definitions that resist evolution and require expensive migrations
-- 2. Limited JSON validation capabilities with poor performance and expressiveness
-- 3. Complex trigger-based validation logic that's difficult to maintain and debug
-- 4. Poor support for polymorphic document structures and optional field validation
-- 5. Expensive CHECK constraints that impact insert/update performance significantly
-- 6. Limited ability to enforce context-aware validation rules based on user roles
-- 7. Difficult integration with modern application validation frameworks and libraries
-- 8. Poor support for nested document validation and cross-document referential integrity
-- 9. Complex migration procedures required for validation rule changes and schema updates
-- 10. Limited expressiveness for business rule validation requiring extensive stored procedure logic

MongoDB's document validation provides flexible, powerful schema enforcement with JSON Schema integration:

// MongoDB Document Validation - comprehensive schema enforcement with flexible evolution capabilities
const { MongoClient, ObjectId } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('enterprise_user_platform');

// Advanced Document Validation Manager for Enterprise Schema Governance
class AdvancedDocumentValidationManager {
  constructor(db, config = {}) {
    this.db = db;
    this.config = {
      // Validation configuration
      enableStrictValidation: config.enableStrictValidation !== false,
      enableWarningMode: config.enableWarningMode || false,
      enableValidationBypass: config.enableValidationBypass || false,
      enableCustomValidators: config.enableCustomValidators !== false,

      // Schema governance
      enableSchemaVersioning: config.enableSchemaVersioning !== false,
      enableSchemaEvolution: config.enableSchemaEvolution !== false,
      enableValidationAnalytics: config.enableValidationAnalytics !== false,

      // Performance optimization
      enableValidationCaching: config.enableValidationCaching || false,
      enableAsyncValidation: config.enableAsyncValidation || false,
      validationTimeout: config.validationTimeout || 5000,

      // Error handling
      detailedErrorReporting: config.detailedErrorReporting !== false,
      enableValidationLogging: config.enableValidationLogging !== false,
      errorAggregationEnabled: config.errorAggregationEnabled !== false
    };

    this.validationStats = {
      totalValidations: 0,
      successfulValidations: 0,
      failedValidations: 0,
      warningCount: 0,
      averageValidationTime: 0,
      schemaEvolutions: 0
    };

    this.schemaRegistry = new Map();
    this.validationCache = new Map();
    this.customValidators = new Map();

    this.initializeValidationFramework();
  }

  async initializeValidationFramework() {
    console.log('Initializing comprehensive document validation framework...');

    try {
      // Setup user profile validation
      await this.setupUserProfileValidation();

      // Setup user preferences validation
      await this.setupUserPreferencesValidation();

      // Setup user connections validation
      await this.setupUserConnectionsValidation();

      // Setup dynamic content validation
      await this.setupDynamicContentValidation();

      // Initialize custom validators
      await this.setupCustomValidators();

      // Setup validation analytics
      if (this.config.enableValidationAnalytics) {
        await this.initializeValidationAnalytics();
      }

      console.log('Document validation framework initialized successfully');

    } catch (error) {
      console.error('Error initializing validation framework:', error);
      throw error;
    }
  }

  async setupUserProfileValidation() {
    console.log('Setting up user profile validation schema...');

    try {
      const userProfileSchema = {
        $jsonSchema: {
          bsonType: 'object',
          required: ['email', 'username', 'firstName', 'lastName', 'accountStatus'],
          additionalProperties: true, // Allow for schema evolution

          properties: {
            _id: {
              bsonType: 'objectId',
              description: 'Unique identifier for the user profile'
            },

            // Core identity fields with comprehensive validation
            email: {
              bsonType: 'string',
              pattern: '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$',
              maxLength: 255,
              description: 'Valid email address with proper format validation'
            },

            username: {
              bsonType: 'string',
              pattern: '^[a-zA-Z0-9_]{3,50}$',
              minLength: 3,
              maxLength: 50,
              description: 'Alphanumeric username with underscores allowed'
            },

            firstName: {
              bsonType: 'string',
              minLength: 1,
              maxLength: 100,
              pattern: '^[a-zA-ZÀ-ÿ\\s\\-\\.\']{1,100}$',
              description: 'First name with international character support'
            },

            lastName: {
              bsonType: 'string',
              minLength: 1,
              maxLength: 100,
              pattern: '^[a-zA-ZÀ-ÿ\\s\\-\\.\']{1,100}$',
              description: 'Last name with international character support'
            },

            // Contact information with flexible validation
            contactInfo: {
              bsonType: 'object',
              additionalProperties: false,
              properties: {
                phoneNumber: {
                  bsonType: 'string',
                  pattern: '^\\+?[1-9]\\d{1,14}$',
                  description: 'E.164 format phone number'
                },
                address: {
                  bsonType: 'object',
                  properties: {
                    street: { bsonType: 'string', maxLength: 255 },
                    city: { bsonType: 'string', maxLength: 100 },
                    state: { bsonType: 'string', maxLength: 100 },
                    postalCode: { bsonType: 'string', pattern: '^[A-Z0-9\\s-]{3,12}$' },
                    country: { 
                      bsonType: 'string', 
                      enum: ['US', 'CA', 'GB', 'DE', 'FR', 'AU', 'JP', 'BR', 'IN', 'MX'],
                      description: 'ISO country code'
                    }
                  },
                  additionalProperties: false
                }
              }
            },

            // Profile metadata with comprehensive validation
            profileMetadata: {
              bsonType: 'object',
              additionalProperties: false,
              properties: {
                dateOfBirth: {
                  bsonType: 'date',
                  description: 'Date of birth for age verification'
                },
                gender: {
                  bsonType: 'string',
                  enum: ['male', 'female', 'non-binary', 'prefer_not_to_say'],
                  description: 'Gender identity selection'
                },
                preferredLanguage: {
                  bsonType: 'string',
                  pattern: '^[a-z]{2}(-[A-Z]{2})?$',
                  description: 'ISO language code (e.g., en, en-US)'
                },
                timezone: {
                  bsonType: 'string',
                  pattern: '^[A-Za-z_]+/[A-Za-z_]+$|^UTC$',
                  description: 'IANA timezone identifier'
                },
                bio: {
                  bsonType: 'string',
                  maxLength: 2000,
                  description: 'User biography or description'
                },
                avatarUrl: {
                  bsonType: 'string',
                  pattern: '^https?://[\\w\\-._~:/?#[\\]@!$&\'()*+,;=]+$',
                  description: 'Valid URL for profile avatar'
                },
                socialLinks: {
                  bsonType: 'array',
                  maxItems: 10,
                  items: {
                    bsonType: 'object',
                    required: ['platform', 'url'],
                    properties: {
                      platform: {
                        bsonType: 'string',
                        enum: ['twitter', 'linkedin', 'github', 'facebook', 'instagram', 'website'],
                        description: 'Social media platform identifier'
                      },
                      url: {
                        bsonType: 'string',
                        pattern: '^https?://[\\w\\-._~:/?#[\\]@!$&\'()*+,;=]+$',
                        description: 'Valid URL for social profile'
                      },
                      verified: {
                        bsonType: 'bool',
                        description: 'Whether the social link has been verified'
                      }
                    },
                    additionalProperties: false
                  }
                }
              }
            },

            // Account settings with business logic validation
            accountStatus: {
              bsonType: 'string',
              enum: ['active', 'inactive', 'suspended', 'pending_verification', 'deleted'],
              description: 'Current account status'
            },

            verification: {
              bsonType: 'object',
              additionalProperties: false,
              properties: {
                emailVerified: {
                  bsonType: 'bool',
                  description: 'Email verification status'
                },
                phoneVerified: {
                  bsonType: 'bool',
                  description: 'Phone verification status'
                },
                identityVerified: {
                  bsonType: 'bool',
                  description: 'Identity verification status'
                },
                verificationDate: {
                  bsonType: 'date',
                  description: 'Date of last verification'
                },
                verificationLevel: {
                  bsonType: 'string',
                  enum: ['none', 'basic', 'enhanced', 'premium'],
                  description: 'Level of account verification'
                }
              }
            },

            // Privacy and security settings
            privacySettings: {
              bsonType: 'object',
              additionalProperties: false,
              properties: {
                profileVisibility: {
                  bsonType: 'string',
                  enum: ['public', 'friends', 'private'],
                  description: 'Profile visibility setting'
                },
                contactPermissions: {
                  bsonType: 'object',
                  properties: {
                    allowMessages: { bsonType: 'bool' },
                    allowConnections: { bsonType: 'bool' },
                    allowPhoneContact: { bsonType: 'bool' },
                    allowEmailContact: { bsonType: 'bool' }
                  },
                  additionalProperties: false
                },
                dataSharing: {
                  bsonType: 'object',
                  properties: {
                    marketingConsent: { bsonType: 'bool' },
                    analyticsConsent: { bsonType: 'bool' },
                    thirdPartySharing: { bsonType: 'bool' },
                    personalizedAds: { bsonType: 'bool' }
                  },
                  additionalProperties: false
                }
              }
            },

            // Security configuration
            security: {
              bsonType: 'object',
              additionalProperties: false,
              properties: {
                twoFactorEnabled: {
                  bsonType: 'bool',
                  description: 'Two-factor authentication status'
                },
                twoFactorMethod: {
                  bsonType: 'string',
                  enum: ['sms', 'email', 'authenticator', 'hardware'],
                  description: 'Two-factor authentication method'
                },
                passwordLastChanged: {
                  bsonType: 'date',
                  description: 'Date of last password change'
                },
                loginAttempts: {
                  bsonType: 'int',
                  minimum: 0,
                  maximum: 10,
                  description: 'Number of recent failed login attempts'
                },
                accountLocked: {
                  bsonType: 'bool',
                  description: 'Account lock status due to security issues'
                },
                lockoutExpires: {
                  bsonType: 'date',
                  description: 'Account lockout expiration date'
                }
              }
            },

            // Audit and versioning information
            audit: {
              bsonType: 'object',
              required: ['createdAt', 'version'],
              additionalProperties: false,
              properties: {
                createdAt: {
                  bsonType: 'date',
                  description: 'Document creation timestamp'
                },
                updatedAt: {
                  bsonType: 'date',
                  description: 'Last modification timestamp'
                },
                createdBy: {
                  bsonType: 'objectId',
                  description: 'ID of user who created this document'
                },
                updatedBy: {
                  bsonType: 'objectId',
                  description: 'ID of user who last updated this document'
                },
                version: {
                  bsonType: 'int',
                  minimum: 1,
                  description: 'Document version for optimistic locking'
                },
                changeLog: {
                  bsonType: 'array',
                  maxItems: 100,
                  items: {
                    bsonType: 'object',
                    required: ['timestamp', 'action', 'field'],
                    properties: {
                      timestamp: { bsonType: 'date' },
                      action: { 
                        bsonType: 'string', 
                        enum: ['created', 'updated', 'deleted', 'verified', 'suspended'] 
                      },
                      field: { bsonType: 'string' },
                      oldValue: { bsonType: ['string', 'number', 'bool', 'null'] },
                      newValue: { bsonType: ['string', 'number', 'bool', 'null'] },
                      reason: { bsonType: 'string', maxLength: 500 }
                    },
                    additionalProperties: false
                  }
                }
              }
            }
          }
        }
      };

      // Create collection with validation
      await this.createCollectionWithValidation('userProfiles', userProfileSchema, {
        validationLevel: 'strict',
        validationAction: this.config.enableWarningMode ? 'warn' : 'error'
      });
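
      // Note: createCollectionWithValidation is assumed here to be a helper on
      // this class that wraps db.createCollection(name, { validator,
      // validationLevel, validationAction }) and falls back to the collMod
      // command when the collection already exists.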

      // Register schema for versioning
      this.schemaRegistry.set('userProfiles', {
        version: '1.0',
        schema: userProfileSchema,
        createdAt: new Date(),
        description: 'User profile schema with comprehensive validation'
      });

      console.log('User profile validation schema configured successfully');

    } catch (error) {
      console.error('Error setting up user profile validation:', error);
      throw error;
    }
  }

  async setupUserPreferencesValidation() {
    console.log('Setting up user preferences validation schema...');

    try {
      const userPreferencesSchema = {
        $jsonSchema: {
          bsonType: 'object',
          required: ['userId', 'preferenceCategory', 'preferences'],
          additionalProperties: true,

          properties: {
            _id: {
              bsonType: 'objectId'
            },

            userId: {
              bsonType: 'objectId',
              description: 'Reference to user profile'
            },

            preferenceCategory: {
              bsonType: 'string',
              enum: [
                'notification_settings',
                'display_preferences', 
                'privacy_settings',
                'content_preferences',
                'accessibility_options',
                'integration_settings',
                'security_preferences'
              ],
              description: 'Category of user preference'
            },

            preferences: {
              bsonType: 'object',
              description: 'Category-specific preference object; the structure required for each category is enforced by the anyOf branches defined below, since MongoDB $jsonSchema does not support if/then/else conditionals'
            },

            isActive: {
              bsonType: 'bool',
              description: 'Whether preferences are currently active'
            },

            lastSyncedAt: {
              bsonType: 'date',
              description: 'Last synchronization timestamp'
            },

            metadata: {
              bsonType: 'object',
              additionalProperties: false,
              properties: {
                source: {
                  bsonType: 'string',
                  enum: ['user_input', 'system_default', 'import', 'sync'],
                  description: 'Source of preference data'
                },
                deviceType: {
                  bsonType: 'string',
                  enum: ['desktop', 'mobile', 'tablet', 'api'],
                  description: 'Device type where preferences were set'
                },
                appVersion: {
                  bsonType: 'string',
                  pattern: '^\\d+\\.\\d+\\.\\d+$',
                  description: 'Application version when preferences were set'
                },
                migrationVersion: {
                  bsonType: 'int',
                  description: 'Schema migration version'
                }
              }
            },

            createdAt: {
              bsonType: 'date',
              description: 'Creation timestamp'
            },

            updatedAt: {
              bsonType: 'date',
              description: 'Last update timestamp'
            }
          },

          // MongoDB's $jsonSchema implementation does not support the JSON Schema
          // if/then/else or const keywords, so category-specific preference
          // structures are enforced with anyOf: each document must satisfy at
          // least one of the branches below.
          anyOf: [
            {
              // notification_settings documents must follow the notification structure
              properties: {
                preferenceCategory: { enum: ['notification_settings'] },
                preferences: {
                  bsonType: 'object',
                  required: ['emailFrequency', 'pushEnabled', 'categories'],
                  additionalProperties: false,
                  properties: {
                    emailFrequency: {
                      bsonType: 'string',
                      enum: ['immediate', 'hourly', 'daily', 'weekly', 'never'],
                      description: 'Email notification frequency'
                    },
                    pushEnabled: {
                      bsonType: 'bool',
                      description: 'Push notification enabled status'
                    },
                    smsEnabled: {
                      bsonType: 'bool',
                      description: 'SMS notification enabled status'
                    },
                    categories: {
                      bsonType: 'object',
                      additionalProperties: false,
                      properties: {
                        security: { bsonType: 'bool' },
                        social: { bsonType: 'bool' },
                        marketing: { bsonType: 'bool' },
                        system: { bsonType: 'bool' },
                        updates: { bsonType: 'bool' }
                      }
                    },
                    quietHours: {
                      bsonType: 'object',
                      properties: {
                        enabled: { bsonType: 'bool' },
                        startTime: {
                          bsonType: 'string',
                          pattern: '^([01]?[0-9]|2[0-3]):[0-5][0-9]$'
                        },
                        endTime: {
                          bsonType: 'string',
                          pattern: '^([01]?[0-9]|2[0-3]):[0-5][0-9]$'
                        },
                        timezone: { bsonType: 'string' }
                      },
                      additionalProperties: false
                    }
                  }
                }
              }
            },
            {
              // display_preferences documents must follow the display structure
              properties: {
                preferenceCategory: { enum: ['display_preferences'] },
                preferences: {
                  bsonType: 'object',
                  additionalProperties: false,
                  properties: {
                    theme: {
                      bsonType: 'string',
                      enum: ['light', 'dark', 'auto', 'high_contrast'],
                      description: 'UI theme preference'
                    },
                    language: {
                      bsonType: 'string',
                      pattern: '^[a-z]{2}(-[A-Z]{2})?$',
                      description: 'Display language preference'
                    },
                    dateFormat: {
                      bsonType: 'string',
                      enum: ['MM/DD/YYYY', 'DD/MM/YYYY', 'YYYY-MM-DD', 'DD-MMM-YYYY'],
                      description: 'Date display format'
                    },
                    timeFormat: {
                      bsonType: 'string',
                      enum: ['12h', '24h'],
                      description: 'Time display format'
                    },
                    timezone: {
                      bsonType: 'string',
                      pattern: '^[A-Za-z_]+/[A-Za-z_]+$|^UTC$',
                      description: 'Display timezone'
                    },
                    itemsPerPage: {
                      bsonType: 'int',
                      minimum: 10,
                      maximum: 100,
                      description: 'Number of items per page'
                    },
                    fontSize: {
                      bsonType: 'string',
                      enum: ['small', 'medium', 'large', 'extra-large'],
                      description: 'Font size preference'
                    }
                  }
                }
              }
            },
            {
              // Remaining categories accept a free-form preferences object
              properties: {
                preferenceCategory: {
                  enum: [
                    'privacy_settings',
                    'content_preferences',
                    'accessibility_options',
                    'integration_settings',
                    'security_preferences'
                  ]
                }
              }
            }
          ]
        }
      };

      // Create collection with validation
      await this.createCollectionWithValidation('userPreferences', userPreferencesSchema, {
        validationLevel: 'moderate', // Allow some flexibility for preferences
        validationAction: 'warn'     // Don't block for preference inconsistencies
      });
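
      // validationLevel 'moderate' applies the rules to inserts and to updates
      // of documents that already conform, while documents that pre-date the
      // schema remain updatable; validationAction 'warn' logs violations to the
      // mongod log instead of rejecting the write.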

      // Register schema
      this.schemaRegistry.set('userPreferences', {
        version: '1.0',
        schema: userPreferencesSchema,
        createdAt: new Date(),
        description: 'User preferences with conditional validation based on category'
      });

      console.log('User preferences validation schema configured successfully');

    } catch (error) {
      console.error('Error setting up user preferences validation:', error);
      throw error;
    }
  }

  async setupUserConnectionsValidation() {
    console.log('Setting up user connections validation schema...');

    try {
      const userConnectionsSchema = {
        $jsonSchema: {
          bsonType: 'object',
          required: ['requesterUserId', 'requestedUserId', 'connectionType', 'connectionStatus'],
          additionalProperties: true,

          properties: {
            _id: {
              bsonType: 'objectId'
            },

            requesterUserId: {
              bsonType: 'objectId',
              description: 'User who initiated the connection'
            },

            requestedUserId: {
              bsonType: 'objectId',
              description: 'User who received the connection request'
            },

            connectionType: {
              bsonType: 'string',
              enum: ['friendship', 'professional', 'family', 'acquaintance', 'blocked', 'follow'],
              description: 'Type of connection relationship'
            },

            connectionStatus: {
              bsonType: 'string',
              enum: ['pending', 'accepted', 'declined', 'blocked', 'expired', 'cancelled'],
              description: 'Current status of the connection'
            },

            connectionMetadata: {
              bsonType: 'object',
              additionalProperties: false,
              properties: {
                message: {
                  bsonType: 'string',
                  maxLength: 500,
                  description: 'Optional message with connection request'
                },
                tags: {
                  bsonType: 'array',
                  maxItems: 10,
                  items: {
                    bsonType: 'string',
                    maxLength: 50
                  },
                  description: 'Tags for categorizing the connection'
                },
                context: {
                  bsonType: 'string',
                  enum: ['work', 'school', 'mutual_friends', 'event', 'online', 'family', 'other'],
                  description: 'How the users know each other'
                },
                priority: {
                  bsonType: 'int',
                  minimum: 1,
                  maximum: 5,
                  description: 'Connection priority level'
                },
                isCloseFriend: {
                  bsonType: 'bool',
                  description: 'Whether this is marked as a close friend'
                },
                mutualConnections: {
                  bsonType: 'int',
                  minimum: 0,
                  description: 'Number of mutual connections'
                }
              }
            },

            timeline: {
              bsonType: 'object',
              required: ['requestedAt'],
              additionalProperties: false,
              properties: {
                requestedAt: {
                  bsonType: 'date',
                  description: 'When the connection was requested'
                },
                respondedAt: {
                  bsonType: 'date',
                  description: 'When the connection was responded to'
                },
                expiresAt: {
                  bsonType: 'date',
                  description: 'When pending connection expires'
                },
                lastInteractionAt: {
                  bsonType: 'date',
                  description: 'Last interaction between users'
                }
              }
            },

            privacy: {
              bsonType: 'object',
              additionalProperties: false,
              properties: {
                isVisible: {
                  bsonType: 'bool',
                  description: 'Whether connection is visible to others'
                },
                shareWith: {
                  bsonType: 'string',
                  enum: ['public', 'friends', 'mutual_connections', 'private'],
                  description: 'Who can see this connection'
                },
                allowNotifications: {
                  bsonType: 'bool',
                  description: 'Whether to allow notifications from this connection'
                }
              }
            }
          }
        },

        // Custom validation rules using MongoDB's query expression syntax.
        // Note: $expr is not a JSON Schema keyword, so it must sit alongside
        // $jsonSchema in the validator document rather than inside it.
        $expr: {
          $and: [
            // Prevent self-connections
            { $ne: ['$requesterUserId', '$requestedUserId'] },

            // Validate response timing logic
            {
              $or: [
                { $eq: ['$connectionStatus', 'pending'] },
                { $ne: ['$timeline.respondedAt', null] }
              ]
            },

            // Validate expiration logic
            {
              $or: [
                { $eq: ['$timeline.expiresAt', null] },
                { $gt: ['$timeline.expiresAt', '$timeline.requestedAt'] }
              ]
            }
          ]
        }
      };

      // Create collection with validation
      await this.createCollectionWithValidation('userConnections', userConnectionsSchema, {
        validationLevel: 'strict',
        validationAction: 'error'
      });

      // Create compound unique index to prevent duplicate connections
      await this.db.collection('userConnections').createIndex(
        { requesterUserId: 1, requestedUserId: 1, connectionType: 1 },
        { 
          unique: true,
          partialFilterExpression: { 
            connectionStatus: { $nin: ['declined', 'cancelled', 'expired'] } 
          },
          background: true,
          name: 'unique_active_connections'
        }
      );

      // Register schema
      this.schemaRegistry.set('userConnections', {
        version: '1.0',
        schema: userConnectionsSchema,
        createdAt: new Date(),
        description: 'User connections with complex business logic validation'
      });

      console.log('User connections validation schema configured successfully');

    } catch (error) {
      console.error('Error setting up user connections validation:', error);
      throw error;
    }
  }

  async createCollectionWithValidation(collectionName, schema, options = {}) {
    console.log(`Creating collection ${collectionName} with validation...`);

    try {
      // Check if collection exists
      const collections = await this.db.listCollections({ name: collectionName }).toArray();

      if (collections.length > 0) {
        // Collection exists, modify validation
        console.log(`Updating validation for existing collection: ${collectionName}`);

        await this.db.command({
          collMod: collectionName,
          validator: schema,
          validationLevel: options.validationLevel || 'strict',
          validationAction: options.validationAction || 'error'
        });

      } else {
        // Create new collection with validation
        console.log(`Creating new collection with validation: ${collectionName}`);

        await this.db.createCollection(collectionName, {
          validator: schema,
          validationLevel: options.validationLevel || 'strict',
          validationAction: options.validationAction || 'error'
        });
      }

      console.log(`Collection ${collectionName} validation configured successfully`);

    } catch (error) {
      console.error(`Error creating collection ${collectionName} with validation:`, error);
      throw error;
    }
  }

  async validateDocument(collectionName, document, options = {}) {
    console.log(`Validating document for collection: ${collectionName}`);
    const validationStart = Date.now();

    try {
      const collection = this.db.collection(collectionName);
      const schemaInfo = this.schemaRegistry.get(collectionName);

      if (!schemaInfo) {
        throw new Error(`No validation schema found for collection: ${collectionName}`);
      }

      // Perform document validation
      const validationResult = {
        isValid: true,
        errors: [],
        warnings: [],
        validatedFields: [],
        skippedFields: []
      };

      // Custom validation logic
      if (this.config.enableCustomValidators) {
        const customValidation = await this.runCustomValidators(collectionName, document);
        if (!customValidation.isValid) {
          validationResult.isValid = false;
          validationResult.errors.push(...customValidation.errors);
        }
        validationResult.warnings.push(...customValidation.warnings);
      }

      // Business logic validation
      const businessValidation = await this.validateBusinessRules(collectionName, document, options);
      if (!businessValidation.isValid) {
        validationResult.isValid = false;
        validationResult.errors.push(...businessValidation.errors);
      }
      validationResult.warnings.push(...businessValidation.warnings);

      // Update validation statistics
      const validationTime = Date.now() - validationStart;
      this.validationStats.totalValidations++;

      if (validationResult.isValid) {
        this.validationStats.successfulValidations++;
      } else {
        this.validationStats.failedValidations++;
      }

      this.validationStats.warningCount += validationResult.warnings.length;
      this.validationStats.averageValidationTime = 
        ((this.validationStats.averageValidationTime * (this.validationStats.totalValidations - 1)) + validationTime) / 
        this.validationStats.totalValidations;

      // Log validation result if enabled
      if (this.config.enableValidationLogging) {
        await this.logValidationResult(collectionName, document._id, validationResult, validationTime);
      }

      console.log(`Document validation completed for ${collectionName}: ${validationResult.isValid ? 'valid' : 'invalid'} (${validationTime}ms)`);

      return {
        ...validationResult,
        validationTime,
        schemaVersion: schemaInfo.version
      };

    } catch (error) {
      console.error(`Document validation failed for ${collectionName}:`, error);
      throw error;
    }
  }

  async validateBusinessRules(collectionName, document, options) {
    const businessValidation = {
      isValid: true,
      errors: [],
      warnings: []
    };

    switch (collectionName) {
      case 'userProfiles':
        return await this.validateUserProfileBusinessRules(document, options);

      case 'userConnections':
        return await this.validateConnectionBusinessRules(document, options);

      case 'userPreferences':
        return await this.validatePreferencesBusinessRules(document, options);

      default:
        return businessValidation;
    }
  }

  async validateUserProfileBusinessRules(document, options) {
    const validation = { isValid: true, errors: [], warnings: [] };

    try {
      // Check for duplicate email (excluding current document)
      if (document.email) {
        const emailExists = await this.db.collection('userProfiles').findOne({
          email: document.email,
          _id: { $ne: document._id }
        });

        if (emailExists) {
          validation.isValid = false;
          validation.errors.push('Email address already exists');
        }
      }

      // Check for duplicate username
      if (document.username) {
        const usernameExists = await this.db.collection('userProfiles').findOne({
          username: document.username,
          _id: { $ne: document._id }
        });

        if (usernameExists) {
          validation.isValid = false;
          validation.errors.push('Username already exists');
        }

        // Check for reserved usernames
        const reservedUsernames = ['admin', 'root', 'system', 'test', 'null', 'undefined', 'api'];
        if (reservedUsernames.some(reserved => 
          document.username.toLowerCase().includes(reserved.toLowerCase())
        )) {
          validation.errors.push('Username contains reserved words');
          validation.isValid = false;
        }
      }

      // Validate two-factor authentication requirements
      if (document.security?.twoFactorEnabled && 
          !document.verification?.emailVerified && 
          !document.verification?.phoneVerified) {
        validation.warnings.push('Two-factor authentication requires verified email or phone');
      }

      // Validate profile completeness
      const requiredFields = ['firstName', 'lastName', 'email'];
      const recommendedFields = ['profileMetadata.bio', 'contactInfo.phoneNumber', 'profileMetadata.avatarUrl'];

      const missingRequired = requiredFields.filter(field => !this.getNestedValue(document, field));
      const missingRecommended = recommendedFields.filter(field => !this.getNestedValue(document, field));

      if (missingRequired.length > 0) {
        validation.isValid = false;
        validation.errors.push(`Missing required fields: ${missingRequired.join(', ')}`);
      }

      if (missingRecommended.length > 2) {
        validation.warnings.push('Profile is incomplete - consider adding more information');
      }

      return validation;

    } catch (error) {
      validation.isValid = false;
      validation.errors.push(`Business rule validation failed: ${error.message}`);
      return validation;
    }
  }

  async validateConnectionBusinessRules(document, options) {
    const validation = { isValid: true, errors: [], warnings: [] };

    try {
      // Validate that users exist
      const [requester, requested] = await Promise.all([
        this.db.collection('userProfiles').findOne({ _id: document.requesterUserId }),
        this.db.collection('userProfiles').findOne({ _id: document.requestedUserId })
      ]);

      if (!requester) {
        validation.isValid = false;
        validation.errors.push('Requester user does not exist');
      } else if (requester.accountStatus !== 'active') {
        validation.isValid = false;
        validation.errors.push('Cannot create connection from inactive account');
      }

      if (!requested) {
        validation.isValid = false;
        validation.errors.push('Requested user does not exist');
      } else if (!['active', 'inactive'].includes(requested.accountStatus)) {
        validation.isValid = false;
        validation.errors.push('Cannot create connection to suspended or deleted account');
      }

      // Check for existing blocked connections
      const blockedConnection = await this.db.collection('userConnections').findOne({
        $or: [
          { requesterUserId: document.requesterUserId, requestedUserId: document.requestedUserId },
          { requesterUserId: document.requestedUserId, requestedUserId: document.requesterUserId }
        ],
        connectionStatus: 'blocked'
      });

      if (blockedConnection && document.connectionType !== 'blocked') {
        validation.isValid = false;
        validation.errors.push('Cannot create connection with blocked user');
      }

      // Validate connection limits
      if (document.connectionType === 'friendship') {
        const connectionCount = await this.db.collection('userConnections').countDocuments({
          requesterUserId: document.requesterUserId,
          connectionType: 'friendship',
          connectionStatus: 'accepted'
        });

        if (connectionCount >= 5000) {
          validation.isValid = false;
          validation.errors.push('Maximum friendship connections exceeded');
        }
      }

      return validation;

    } catch (error) {
      validation.isValid = false;
      validation.errors.push(`Connection business rule validation failed: ${error.message}`);
      return validation;
    }
  }
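
  // Minimal sketch (assumed implementation): validateBusinessRules() routes
  // userPreferences documents here, so this mirrors the quiet-hours and
  // sync-timestamp rules expressed in the preferences schema above.
  async validatePreferencesBusinessRules(document, options) {
    const validation = { isValid: true, errors: [], warnings: [] };

    try {
      // Quiet hours must define both boundaries when enabled
      const quietHours = document.preferences?.quietHours;
      if (document.preferenceCategory === 'notification_settings' &&
          quietHours?.enabled &&
          (!quietHours.startTime || !quietHours.endTime)) {
        validation.isValid = false;
        validation.errors.push('Quiet hours require both start and end times when enabled');
      }

      // Synchronization timestamp should not precede document creation
      if (document.lastSyncedAt && document.createdAt &&
          document.lastSyncedAt < document.createdAt) {
        validation.warnings.push('lastSyncedAt is earlier than createdAt');
      }

      return validation;

    } catch (error) {
      validation.isValid = false;
      validation.errors.push(`Preferences business rule validation failed: ${error.message}`);
      return validation;
    }
  }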

  getNestedValue(object, path) {
    return path.split('.').reduce((current, key) => current && current[key], object);
  }

  async setupCustomValidators() {
    console.log('Setting up custom validators...');

    // Email domain validator
    // Default the field path to 'email' because runCustomValidators() invokes
    // validators with the document argument only
    this.customValidators.set('emailDomainValidator', async (document, field = 'email') => {
      const email = this.getNestedValue(document, field);
      if (!email) return { isValid: true, warnings: [] };

      const domain = email.split('@')[1]?.toLowerCase();
      const disposableEmailDomains = ['tempmail.com', 'guerrillamail.com', '10minutemail.com'];

      if (disposableEmailDomains.includes(domain)) {
        return {
          isValid: false,
          errors: ['Disposable email addresses are not allowed']
        };
      }

      return { isValid: true, warnings: [] };
    });

    // Profile completeness validator
    this.customValidators.set('profileCompletenessValidator', async (document) => {
      const completenessScore = this.calculateProfileCompleteness(document);

      if (completenessScore < 30) {
        return {
          isValid: true,
          warnings: ['Profile completeness is very low - consider adding more information']
        };
      }

      return { isValid: true, warnings: [] };
    });

    console.log('Custom validators configured successfully');
  }

  async runCustomValidators(collectionName, document) {
    const results = { isValid: true, errors: [], warnings: [] };

    for (const [validatorName, validator] of this.customValidators) {
      try {
        const validatorResult = await validator(document);

        if (!validatorResult.isValid) {
          results.isValid = false;
          results.errors.push(...(validatorResult.errors || []));
        }

        results.warnings.push(...(validatorResult.warnings || []));

      } catch (error) {
        console.error(`Custom validator ${validatorName} failed:`, error);
        results.warnings.push(`Validator ${validatorName} encountered an error`);
      }
    }

    return results;
  }

  calculateProfileCompleteness(userProfile) {
    let score = 0;

    // Basic required fields (40 points)
    if (userProfile.email) score += 10;
    if (userProfile.firstName) score += 10;
    if (userProfile.lastName) score += 10;
    if (userProfile.username) score += 10;

    // Profile metadata (30 points)
    if (userProfile.profileMetadata?.bio) score += 10;
    if (userProfile.profileMetadata?.avatarUrl) score += 10;
    if (userProfile.profileMetadata?.dateOfBirth) score += 5;
    if (userProfile.profileMetadata?.preferredLanguage) score += 5;

    // Contact information (20 points)
    if (userProfile.contactInfo?.phoneNumber) score += 10;
    if (userProfile.contactInfo?.address) score += 10;

    // Verification status (10 points)
    if (userProfile.verification?.emailVerified) score += 5;
    if (userProfile.verification?.phoneVerified) score += 5;

    return Math.min(score, 100); // Cap at 100%
  }

  async logValidationResult(collectionName, documentId, validationResult, validationTime) {
    try {
      const validationLog = {
        timestamp: new Date(),
        collectionName,
        documentId,
        isValid: validationResult.isValid,
        errorCount: validationResult.errors.length,
        warningCount: validationResult.warnings.length,
        validationTime,
        errors: validationResult.errors,
        warnings: validationResult.warnings
      };

      await this.db.collection('validationLogs').insertOne(validationLog);

    } catch (error) {
      console.error('Error logging validation result:', error);
    }
  }

  async getValidationStatistics() {
    return {
      ...this.validationStats,
      timestamp: new Date(),
      registeredSchemas: this.schemaRegistry.size,
      customValidators: this.customValidators.size
    };
  }

  async evolveSchema(collectionName, newSchema, options = {}) {
    console.log(`Evolving schema for collection: ${collectionName}`);

    try {
      const currentSchemaInfo = this.schemaRegistry.get(collectionName);
      if (!currentSchemaInfo) {
        throw new Error(`No existing schema found for collection: ${collectionName}`);
      }

      // Backup current schema
      const backupSchema = {
        ...currentSchemaInfo,
        backupTimestamp: new Date()
      };

      // Update validation
      await this.db.command({
        collMod: collectionName,
        validator: newSchema,
        validationLevel: options.validationLevel || 'moderate',
        validationAction: options.validationAction || 'warn'
      });

      // Update schema registry
      this.schemaRegistry.set(collectionName, {
        version: options.version || (parseFloat(currentSchemaInfo.version) + 0.1).toFixed(1),
        schema: newSchema,
        createdAt: new Date(),
        description: options.description || 'Schema evolution',
        previousVersion: backupSchema
      });

      this.validationStats.schemaEvolutions++;

      console.log(`Schema evolved successfully for collection: ${collectionName}`);

      return {
        success: true,
        newVersion: this.schemaRegistry.get(collectionName).version,
        evolutionTimestamp: new Date()
      };

    } catch (error) {
      console.error(`Schema evolution failed for ${collectionName}:`, error);
      throw error;
    }
  }
}

// Example usage demonstrating comprehensive document validation
async function demonstrateAdvancedDocumentValidation() {
  const validationManager = new AdvancedDocumentValidationManager(db, {
    enableStrictValidation: true,
    enableValidationAnalytics: true,
    enableCustomValidators: true,
    detailedErrorReporting: true
  });

  try {
    // Test user profile validation
    const userProfile = {
      email: 'john.doe@example.com',
      username: 'johndoe123',
      firstName: 'John',
      lastName: 'Doe',
      contactInfo: {
        phoneNumber: '+1234567890',
        address: {
          street: '123 Main St',
          city: 'New York',
          state: 'NY',
          postalCode: '10001',
          country: 'US'
        }
      },
      profileMetadata: {
        dateOfBirth: new Date('1990-01-15'),
        preferredLanguage: 'en-US',
        timezone: 'America/New_York',
        bio: 'Software developer passionate about technology',
        socialLinks: [
          {
            platform: 'github',
            url: 'https://github.com/johndoe',
            verified: true
          }
        ]
      },
      accountStatus: 'active',
      verification: {
        emailVerified: true,
        phoneVerified: false,
        verificationLevel: 'basic'
      },
      privacySettings: {
        profileVisibility: 'public',
        contactPermissions: {
          allowMessages: true,
          allowConnections: true
        }
      },
      security: {
        twoFactorEnabled: false,
        passwordLastChanged: new Date(),
        loginAttempts: 0
      },
      audit: {
        createdAt: new Date(),
        version: 1
      }
    };

    console.log('Validating user profile...');
    const profileValidation = await validationManager.validateDocument('userProfiles', userProfile);
    console.log('Profile validation result:', profileValidation);

    // Test user preferences validation
    const userPreferences = {
      userId: new ObjectId(),
      preferenceCategory: 'notification_settings',
      preferences: {
        emailFrequency: 'daily',
        pushEnabled: true,
        smsEnabled: false,
        categories: {
          security: true,
          social: true,
          marketing: false,
          system: true,
          updates: true
        },
        quietHours: {
          enabled: true,
          startTime: '22:00',
          endTime: '08:00',
          timezone: 'America/New_York'
        }
      },
      isActive: true,
      metadata: {
        source: 'user_input',
        deviceType: 'desktop',
        appVersion: '2.1.0'
      },
      createdAt: new Date(),
      updatedAt: new Date()
    };

    console.log('Validating user preferences...');
    const preferencesValidation = await validationManager.validateDocument('userPreferences', userPreferences);
    console.log('Preferences validation result:', preferencesValidation);

    // Get validation statistics
    const stats = await validationManager.getValidationStatistics();
    console.log('Validation statistics:', stats);

    return {
      profileValidation,
      preferencesValidation,
      validationStats: stats
    };

  } catch (error) {
    console.error('Document validation demonstration failed:', error);
    throw error;
  }
}

// Benefits of MongoDB Document Validation:
// - Flexible JSON Schema-based validation that evolves with application requirements
// - Comprehensive business rule validation with custom validator support
// - Context-aware validation rules that can adapt to different scenarios
// - Rich error reporting and validation analytics for operational insight
// - Schema versioning and evolution capabilities for production environments
// - Performance-optimized validation with caching and async processing options
// - Integration with MongoDB's native validation engine for optimal performance
// - SQL-compatible validation patterns through QueryLeaf integration

module.exports = {
  AdvancedDocumentValidationManager,
  demonstrateAdvancedDocumentValidation
};

SQL-Style Document Validation with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB document validation and schema enforcement:

-- QueryLeaf document validation with SQL-familiar schema definition and constraint syntax

-- Create validation schema for user profiles with comprehensive constraints
CREATE VALIDATION SCHEMA user_profiles_schema AS (
  -- Core identity validation
  email VARCHAR(255) NOT NULL 
    PATTERN '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    UNIQUE CONSTRAINT 'email_already_exists',

  username VARCHAR(50) NOT NULL 
    PATTERN '^[a-zA-Z0-9_]{3,50}$'
    UNIQUE CONSTRAINT 'username_already_exists'
    CHECK username NOT IN ('admin', 'root', 'system', 'test'),

  first_name VARCHAR(100) NOT NULL 
    PATTERN '^[a-zA-ZÀ-ÿ\s\-\.'']{1,100}$',

  last_name VARCHAR(100) NOT NULL 
    PATTERN '^[a-zA-ZÀ-ÿ\s\-\.'']{1,100}$',

  -- Nested contact information validation
  contact_info JSON OBJECT (
    phone_number VARCHAR(20) 
      PATTERN '^\+?[1-9]\d{1,14}$'
      DESCRIPTION 'E.164 format phone number',

    address JSON OBJECT (
      street VARCHAR(255),
      city VARCHAR(100),
      state VARCHAR(100),
      postal_code VARCHAR(12) PATTERN '^[A-Z0-9\s-]{3,12}$',
      country ENUM('US', 'CA', 'GB', 'DE', 'FR', 'AU', 'JP', 'BR', 'IN', 'MX')
    ) ADDITIONAL_PROPERTIES false
  ),

  -- Profile metadata with complex validation
  profile_metadata JSON OBJECT (
    date_of_birth DATE CHECK date_of_birth < CURRENT_DATE,
    gender ENUM('male', 'female', 'non-binary', 'prefer_not_to_say'),
    preferred_language VARCHAR(10) PATTERN '^[a-z]{2}(-[A-Z]{2})?$',
    timezone VARCHAR(50) PATTERN '^[A-Za-z_]+/[A-Za-z_]+$|^UTC$',
    bio VARCHAR(2000),
    avatar_url VARCHAR(500) PATTERN '^https?://[\w\-._~:/?#[\]@!$&''()*+,;=]+$',

    -- Array validation with nested objects
    social_links ARRAY OF JSON OBJECT (
      platform ENUM('twitter', 'linkedin', 'github', 'facebook', 'instagram', 'website'),
      url VARCHAR(500) PATTERN '^https?://[\w\-._~:/?#[\]@!$&''()*+,;=]+$',
      verified BOOLEAN DEFAULT false
    ) MAX_ITEMS 10
  ),

  -- Account status with business logic
  account_status ENUM('active', 'inactive', 'suspended', 'pending_verification', 'deleted'),

  -- Verification status with interdependent validation
  verification JSON OBJECT (
    email_verified BOOLEAN DEFAULT false,
    phone_verified BOOLEAN DEFAULT false,
    identity_verified BOOLEAN DEFAULT false,
    verification_date TIMESTAMP,
    verification_level ENUM('none', 'basic', 'enhanced', 'premium')
  ),

  -- Privacy settings validation
  privacy_settings JSON OBJECT (
    profile_visibility ENUM('public', 'friends', 'private'),
    contact_permissions JSON OBJECT (
      allow_messages BOOLEAN DEFAULT true,
      allow_connections BOOLEAN DEFAULT true,
      allow_phone_contact BOOLEAN DEFAULT false,
      allow_email_contact BOOLEAN DEFAULT true
    ),
    data_sharing JSON OBJECT (
      marketing_consent BOOLEAN DEFAULT false,
      analytics_consent BOOLEAN DEFAULT true,
      third_party_sharing BOOLEAN DEFAULT false,
      personalized_ads BOOLEAN DEFAULT false
    )
  ),

  -- Security configuration with complex validation
  security JSON OBJECT (
    two_factor_enabled BOOLEAN DEFAULT false,
    two_factor_method ENUM('sms', 'email', 'authenticator', 'hardware'),
    password_last_changed TIMESTAMP,
    login_attempts INTEGER MIN 0 MAX 10 DEFAULT 0,
    account_locked BOOLEAN DEFAULT false,
    lockout_expires TIMESTAMP
  ),

  -- Audit information with required fields
  audit JSON OBJECT NOT NULL (
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    created_by OBJECTID,
    updated_by OBJECTID,
    version INTEGER MIN 1 DEFAULT 1,

    -- Change log with structured history
    change_log ARRAY OF JSON OBJECT (
      timestamp TIMESTAMP NOT NULL,
      action ENUM('created', 'updated', 'deleted', 'verified', 'suspended'),
      field VARCHAR(100),
      old_value VARCHAR(1000),
      new_value VARCHAR(1000),
      reason VARCHAR(500)
    ) MAX_ITEMS 100
  ),

  -- Cross-field validation constraints
  CONSTRAINT email_verification_consistency 
    CHECK (NOT verification.email_verified OR email IS NOT NULL),

  CONSTRAINT phone_verification_consistency 
    CHECK (NOT verification.phone_verified OR contact_info.phone_number IS NOT NULL),

  CONSTRAINT two_factor_requirements 
    CHECK (NOT security.two_factor_enabled OR 
           verification.email_verified = true OR 
           verification.phone_verified = true),

  CONSTRAINT account_lock_expiration 
    CHECK (NOT security.account_locked OR security.lockout_expires > CURRENT_TIMESTAMP),

  -- Business rule validation
  CONSTRAINT username_content_policy 
    CHECK (username NOT SIMILAR TO '%(admin|root|system|test|null|undefined)%'),

  CONSTRAINT profile_completeness 
    CHECK (first_name IS NOT NULL AND 
           last_name IS NOT NULL AND 
           email IS NOT NULL AND 
           audit.version >= 1)
);

-- Apply validation schema to collection with configurable strictness
ALTER COLLECTION user_profiles 
SET VALIDATION SCHEMA user_profiles_schema
WITH (
  validation_level = 'strict',
  validation_action = 'error',
  enable_custom_validators = true,
  enable_business_rule_validation = true,
  validation_timeout_ms = 5000,
  detailed_error_reporting = true
);

-- Create conditional validation for user preferences based on category
CREATE VALIDATION SCHEMA user_preferences_schema AS (
  user_id OBJECTID NOT NULL REFERENCES user_profiles(_id),
  preference_category ENUM(
    'notification_settings',
    'display_preferences', 
    'privacy_settings',
    'content_preferences',
    'accessibility_options',
    'integration_settings',
    'security_preferences'
  ) NOT NULL,

  -- Conditional validation based on preference category
  preferences JSON OBJECT CONDITIONAL VALIDATION (
    WHEN preference_category = 'notification_settings' THEN 
      JSON OBJECT (
        email_frequency ENUM('immediate', 'hourly', 'daily', 'weekly', 'never') NOT NULL,
        push_enabled BOOLEAN NOT NULL,
        sms_enabled BOOLEAN DEFAULT false,
        categories JSON OBJECT (
          security BOOLEAN DEFAULT true,
          social BOOLEAN DEFAULT true,
          marketing BOOLEAN DEFAULT false,
          system BOOLEAN DEFAULT true,
          updates BOOLEAN DEFAULT true
        ),
        quiet_hours JSON OBJECT (
          enabled BOOLEAN DEFAULT false,
          start_time VARCHAR(5) PATTERN '^([01]?[0-9]|2[0-3]):[0-5][0-9]$',
          end_time VARCHAR(5) PATTERN '^([01]?[0-9]|2[0-3]):[0-5][0-9]$',
          timezone VARCHAR(50)
        )
      ),

    WHEN preference_category = 'display_preferences' THEN
      JSON OBJECT (
        theme ENUM('light', 'dark', 'auto', 'high_contrast') DEFAULT 'light',
        language VARCHAR(10) PATTERN '^[a-z]{2}(-[A-Z]{2})?$',
        date_format ENUM('MM/DD/YYYY', 'DD/MM/YYYY', 'YYYY-MM-DD', 'DD-MMM-YYYY'),
        time_format ENUM('12h', '24h') DEFAULT '12h',
        timezone VARCHAR(50) PATTERN '^[A-Za-z_]+/[A-Za-z_]+$|^UTC$',
        items_per_page INTEGER MIN 10 MAX 100 DEFAULT 25,
        font_size ENUM('small', 'medium', 'large', 'extra-large') DEFAULT 'medium'
      ),

    WHEN preference_category = 'privacy_settings' THEN
      JSON OBJECT (
        data_retention_period INTEGER MIN 30 MAX 2555 DEFAULT 365,
        automatic_deletion_enabled BOOLEAN DEFAULT false,
        third_party_integrations BOOLEAN DEFAULT false,
        data_export_format ENUM('json', 'csv', 'xml') DEFAULT 'json',
        activity_logging BOOLEAN DEFAULT true
      ),

    ELSE 
      JSON OBJECT ADDITIONAL_PROPERTIES true  -- Allow flexible structure for other categories
  ),

  is_active BOOLEAN DEFAULT true,
  last_synced_at TIMESTAMP,

  metadata JSON OBJECT (
    source ENUM('user_input', 'system_default', 'import', 'sync') DEFAULT 'user_input',
    device_type ENUM('desktop', 'mobile', 'tablet', 'api'),
    app_version VARCHAR(20) PATTERN '^\d+\.\d+\.\d+$',
    migration_version INTEGER DEFAULT 1
  ),

  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

  -- Complex cross-field validation
  CONSTRAINT notification_consistency 
    CHECK (preference_category != 'notification_settings' OR 
           (preferences.email_frequency IS NOT NULL AND preferences.push_enabled IS NOT NULL)),

  CONSTRAINT sync_timestamp_validation 
    CHECK (NOT is_active OR last_synced_at >= created_at),

  CONSTRAINT quiet_hours_logic 
    CHECK (preference_category != 'notification_settings' OR
           preferences.quiet_hours.enabled = false OR
           (preferences.quiet_hours.start_time IS NOT NULL AND 
            preferences.quiet_hours.end_time IS NOT NULL))
);

-- Apply conditional validation schema
ALTER COLLECTION user_preferences 
SET VALIDATION SCHEMA user_preferences_schema
WITH (
  validation_level = 'moderate',  -- Allow some flexibility
  validation_action = 'warn',     -- Don't block operations
  enable_conditional_validation = true
);

-- Complex validation for user connections with business logic
CREATE VALIDATION SCHEMA user_connections_schema AS (
  requester_user_id OBJECTID NOT NULL REFERENCES user_profiles(_id),
  requested_user_id OBJECTID NOT NULL REFERENCES user_profiles(_id),

  connection_type ENUM('friendship', 'professional', 'family', 'acquaintance', 'blocked', 'follow') NOT NULL,
  connection_status ENUM('pending', 'accepted', 'declined', 'blocked', 'expired', 'cancelled') NOT NULL,

  connection_metadata JSON OBJECT (
    message VARCHAR(500),
    tags ARRAY OF VARCHAR(50) MAX_ITEMS 10,
    context ENUM('work', 'school', 'mutual_friends', 'event', 'online', 'family', 'other'),
    priority INTEGER MIN 1 MAX 5 DEFAULT 3,
    is_close_friend BOOLEAN DEFAULT false,
    mutual_connections INTEGER MIN 0 DEFAULT 0
  ),

  timeline JSON OBJECT NOT NULL (
    requested_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    responded_at TIMESTAMP,
    expires_at TIMESTAMP,
    last_interaction_at TIMESTAMP
  ),

  privacy JSON OBJECT (
    is_visible BOOLEAN DEFAULT true,
    share_with ENUM('public', 'friends', 'mutual_connections', 'private') DEFAULT 'friends',
    allow_notifications BOOLEAN DEFAULT true
  ),

  -- Complex business logic validation
  CONSTRAINT no_self_connection 
    CHECK (requester_user_id != requested_user_id),

  CONSTRAINT response_timing_logic 
    CHECK ((connection_status = 'pending' AND timeline.responded_at IS NULL) OR
           (connection_status != 'pending' AND timeline.responded_at IS NOT NULL)),

  CONSTRAINT expiration_logic 
    CHECK (timeline.expires_at IS NULL OR 
           timeline.expires_at > timeline.requested_at),

  CONSTRAINT interaction_timing 
    CHECK (timeline.last_interaction_at IS NULL OR 
           timeline.last_interaction_at >= timeline.requested_at),

  -- Unique constraint simulation
  CONSTRAINT unique_active_connection
    CHECK (NOT EXISTS (
      SELECT 1 FROM user_connections uc 
      WHERE uc.requester_user_id = requester_user_id 
      AND uc.requested_user_id = requested_user_id 
      AND uc.connection_type = connection_type
      AND uc.connection_status NOT IN ('declined', 'cancelled', 'expired')
      AND uc._id != _id
    ))
);

-- Advanced validation with custom business rules
CREATE CUSTOM VALIDATOR email_domain_validator(email VARCHAR) RETURNS VALIDATION_RESULT AS (
  DECLARE disposable_domains TEXT[] := ARRAY['tempmail.com', 'guerrillamail.com', '10minutemail.com', 'mailinator.com'];
  DECLARE email_domain TEXT := SPLIT_PART(email, '@', 2);

  IF email_domain = ANY(disposable_domains) THEN
    RETURN VALIDATION_ERROR('Disposable email addresses are not allowed');
  END IF;

  RETURN VALIDATION_SUCCESS();
);

CREATE CUSTOM VALIDATOR connection_limit_validator(p_user_id OBJECTID, p_connection_type VARCHAR) RETURNS VALIDATION_RESULT AS (
  DECLARE connection_count INTEGER;
  DECLARE max_connections INTEGER;

  -- Set limits based on connection type
  max_connections := CASE p_connection_type
    WHEN 'friendship' THEN 5000
    WHEN 'professional' THEN 10000
    WHEN 'follow' THEN 50000
    ELSE 1000
  END;

  -- Count existing connections (parameters are prefixed with p_ so they
  -- do not shadow the column names being filtered)
  SELECT COUNT(*) INTO connection_count
  FROM user_connections 
  WHERE requester_user_id = p_user_id 
  AND connection_type = p_connection_type 
  AND connection_status = 'accepted';

  IF connection_count >= max_connections THEN
    RETURN VALIDATION_ERROR('Maximum ' || p_connection_type || ' connections exceeded (' || max_connections || ')');
  END IF;

  RETURN VALIDATION_SUCCESS();
);

-- Apply custom validators to collections
ALTER COLLECTION user_profiles 
ADD CUSTOM VALIDATOR email_domain_validator(email);

ALTER COLLECTION user_connections 
ADD CUSTOM VALIDATOR connection_limit_validator(requester_user_id, connection_type);

-- Validation analytics and monitoring
WITH validation_performance AS (
  SELECT 
    collection_name,
    validation_schema_version,

    -- Validation success metrics
    COUNT(*) as total_validations,
    COUNT(*) FILTER (WHERE validation_result = 'success') as successful_validations,
    COUNT(*) FILTER (WHERE validation_result = 'error') as failed_validations,
    COUNT(*) FILTER (WHERE validation_result = 'warning') as warning_validations,

    -- Performance metrics
    AVG(validation_time_ms) as avg_validation_time,
    MAX(validation_time_ms) as max_validation_time,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY validation_time_ms) as p95_validation_time,

    -- Error analysis
    COUNT(DISTINCT error_type) as unique_error_types,
    array_agg(DISTINCT error_type) FILTER (WHERE error_type IS NOT NULL) as common_errors,

    -- Business impact metrics
    SUM(CASE WHEN validation_result = 'error' THEN 1 ELSE 0 END) as blocked_operations,
    ROUND(
      (COUNT(*) FILTER (WHERE validation_result = 'success') * 100.0 / COUNT(*)),
      2
    ) as validation_success_rate

  FROM validation_logs
  WHERE validation_timestamp >= CURRENT_DATE - INTERVAL '30 days'
  GROUP BY collection_name, validation_schema_version
),

schema_evolution_analysis AS (
  SELECT 
    collection_name,
    schema_version,
    schema_evolution_date,

    -- Schema complexity metrics
    json_array_length(schema_definition->'properties') as field_count,
    json_array_length(schema_definition->'constraints') as constraint_count,

    -- Evolution impact
    LAG(validation_success_rate) OVER (
      PARTITION BY collection_name 
      ORDER BY schema_evolution_date
    ) as previous_success_rate,

    validation_success_rate - LAG(validation_success_rate) OVER (
      PARTITION BY collection_name 
      ORDER BY schema_evolution_date
    ) as success_rate_change,

    -- Performance impact
    avg_validation_time - LAG(avg_validation_time) OVER (
      PARTITION BY collection_name 
      ORDER BY schema_evolution_date
    ) as validation_time_change

  FROM validation_performance vp
  JOIN schema_evolution_history seh ON vp.collection_name = seh.collection_name
),

validation_recommendations AS (
  SELECT 
    vp.collection_name,
    vp.validation_success_rate,
    vp.avg_validation_time,
    vp.common_errors,

    -- Performance assessment
    CASE 
      WHEN vp.validation_success_rate >= 95 THEN 'Excellent'
      WHEN vp.validation_success_rate >= 90 THEN 'Good'
      WHEN vp.validation_success_rate >= 80 THEN 'Fair'
      ELSE 'Needs Improvement'
    END as validation_quality,

    -- Optimization recommendations
    CASE 
      WHEN vp.avg_validation_time > 100 THEN 'Optimize validation performance - consider schema simplification'
      WHEN vp.blocked_operations > 100 THEN 'Review validation rules - high error rate impacting operations'
      WHEN array_length(vp.common_errors, 1) > 5 THEN 'Address common validation errors through improved data quality'
      WHEN vp.validation_success_rate < 90 THEN 'Review validation schema for overly restrictive rules'
      ELSE 'Validation configuration is well-optimized'
    END as primary_recommendation,

    -- Schema evolution guidance
    CASE 
      WHEN sea.success_rate_change < -5 THEN 'Recent schema changes negatively impacted validation success - consider rollback'
      WHEN sea.validation_time_change > 50 THEN 'Schema complexity increase affecting performance - optimize constraints'
      WHEN sea.success_rate_change > 10 THEN 'Schema evolution improved data quality significantly'
      ELSE 'Schema evolution impact within acceptable parameters'
    END as evolution_guidance,

    -- Operational insights
    JSON_OBJECT(
      'total_validations', vp.total_validations,
      'daily_average', ROUND(vp.total_validations / 30.0, 0),
      'error_rate', ROUND((vp.failed_validations * 100.0 / vp.total_validations), 2),
      'performance_rating', 
        CASE 
          WHEN vp.avg_validation_time <= 10 THEN 'Excellent'
          WHEN vp.avg_validation_time <= 50 THEN 'Good'
          WHEN vp.avg_validation_time <= 100 THEN 'Fair'
          ELSE 'Poor'
        END,
      'schema_complexity', 
        CASE 
          WHEN sea.field_count > 50 THEN 'High'
          WHEN sea.field_count > 20 THEN 'Medium'
          ELSE 'Low'
        END
    ) as operational_insights

  FROM validation_performance vp
  LEFT JOIN schema_evolution_analysis sea ON vp.collection_name = sea.collection_name
)

-- Comprehensive validation governance dashboard
SELECT 
  vr.collection_name,
  vr.validation_success_rate || '%' as success_rate,
  vr.validation_quality,
  vr.avg_validation_time || 'ms' as avg_response_time,

  -- Optimization guidance
  vr.primary_recommendation,
  vr.evolution_guidance,

  -- Error insights
  CASE 
    WHEN array_length(vr.common_errors, 1) > 0 THEN 
      array_to_string(array(SELECT UNNEST(vr.common_errors) LIMIT 3), ', ')
    ELSE 'No common errors'
  END as top_validation_errors,

  -- Operational metrics
  vr.operational_insights,

  -- Next actions
  CASE vr.validation_quality
    WHEN 'Needs Improvement' THEN 
      JSON_ARRAY(
        'Review and simplify overly restrictive validation rules',
        'Analyze common error patterns and improve data quality',
        'Consider implementing graduated validation levels',
        'Provide better validation error messages to users'
      )
    WHEN 'Fair' THEN 
      JSON_ARRAY(
        'Optimize validation performance for better response times',
        'Address top validation errors through improved input handling',
        'Consider conditional validation for optional fields'
      )
    ELSE 
      JSON_ARRAY('Monitor validation trends for early issue detection', 'Maintain current validation excellence')
  END as recommended_actions,

  -- Governance metrics
  JSON_OBJECT(
    'data_quality_score', vr.validation_success_rate,
    'schema_maintainability', 
      CASE 
        WHEN vr.operational_insights->>'schema_complexity' = 'High' THEN 'Review for simplification'
        WHEN vr.operational_insights->>'schema_complexity' = 'Medium' THEN 'Well-balanced'
        ELSE 'Simple and maintainable'
      END,
    'business_rule_coverage', 
      CASE 
        WHEN vr.validation_success_rate >= 95 THEN 'Comprehensive'
        WHEN vr.validation_success_rate >= 85 THEN 'Good'
        ELSE 'Incomplete'
      END,
    'operational_impact', 
      CASE 
        WHEN vr.operational_insights->>'performance_rating' IN ('Excellent', 'Good') THEN 'Minimal'
        WHEN vr.operational_insights->>'performance_rating' = 'Fair' THEN 'Moderate'
        ELSE 'Significant'
      END
  ) as governance_assessment

FROM validation_recommendations vr
ORDER BY 
  CASE vr.validation_quality
    WHEN 'Needs Improvement' THEN 1
    WHEN 'Fair' THEN 2
    WHEN 'Good' THEN 3
    ELSE 4
  END,
  vr.validation_success_rate ASC;

-- QueryLeaf provides comprehensive document validation capabilities:
-- 1. SQL-familiar schema definition syntax with JSON Schema integration
-- 2. Complex conditional validation based on document structure and business logic
-- 3. Custom validator functions with sophisticated business rule enforcement
-- 4. Comprehensive validation analytics and performance monitoring
-- 5. Schema evolution management with impact analysis and rollback capabilities
-- 6. Cross-field validation constraints with sophisticated dependency checking
-- 7. Flexible validation levels and actions for different operational requirements
-- 8. Rich error reporting and validation guidance for improved data quality
-- 9. Integration with MongoDB's native validation engine for optimal performance
-- 10. Enterprise-grade governance framework with compliance and audit support

Best Practices for MongoDB Document Validation Implementation

Schema Design and Governance Principles

Essential practices for implementing effective document validation in production environments:

  1. Schema Evolution Strategy: Design validation schemas that can evolve gracefully with application requirements while maintaining data integrity
  2. Graduated Validation Levels: Implement different validation strictness levels for development, staging, and production environments (see the sketch after this list)
  3. Business Rule Integration: Embed critical business logic into validation rules while maintaining flexibility for edge cases
  4. Performance Optimization: Balance comprehensive validation with performance requirements through selective field validation
  5. Error Message Quality: Provide clear, actionable error messages that help developers and users understand validation failures
  6. Conditional Validation: Use conditional validation rules that adapt based on document context and user roles
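
The graduated-levels idea in point 2 can be implemented with a small helper around the collMod command. The sketch below is illustrative: the environment names and the mapping from environment to validationLevel/validationAction are assumptions, not part of the manager class shown earlier.

// Minimal sketch: map deployment environments to validation strictness.
// The environment names and the mapping are illustrative assumptions.
const VALIDATION_PROFILES = {
  development: { validationLevel: 'moderate', validationAction: 'warn' },
  staging:     { validationLevel: 'strict',   validationAction: 'warn' },
  production:  { validationLevel: 'strict',   validationAction: 'error' }
};

async function applyValidationProfile(db, collectionName, environment) {
  const profile = VALIDATION_PROFILES[environment] || VALIDATION_PROFILES.production;

  // collMod adjusts validation settings on an existing collection
  // without recreating it or rewriting existing documents
  await db.command({
    collMod: collectionName,
    validationLevel: profile.validationLevel,
    validationAction: profile.validationAction
  });

  return profile;
}

// Usage: relax enforcement locally, enforce strictly in production
// await applyValidationProfile(db, 'userProfiles', process.env.NODE_ENV || 'production');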

Operational Excellence and Monitoring

Optimize document validation for enterprise-scale deployments:

  1. Validation Analytics: Implement comprehensive monitoring of validation performance, success rates, and error patterns (see the sketch after this list)
  2. Schema Versioning: Maintain proper schema versioning with rollback capabilities for production safety
  3. Custom Validator Management: Develop reusable custom validators that can be shared across multiple collections
  4. Integration Testing: Create comprehensive test suites that validate schema changes against real-world data patterns
  5. Documentation Standards: Maintain clear documentation of validation rules and business logic for team collaboration
  6. Compliance Integration: Ensure validation rules support regulatory compliance requirements and audit trails
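
For point 1, the validationLogs collection written by logValidationResult() already contains enough detail to drive basic monitoring. The following sketch is a minimal example, not the full analytics pipeline discussed above; it aggregates recent log entries into per-collection success rates.

// Minimal sketch: per-collection validation metrics from validationLogs.
// Field names follow the logValidationResult() document shape shown earlier.
async function summarizeValidationActivity(db, days = 30) {
  const since = new Date(Date.now() - days * 24 * 60 * 60 * 1000);

  return db.collection('validationLogs').aggregate([
    { $match: { timestamp: { $gte: since } } },
    {
      $group: {
        _id: '$collectionName',
        totalValidations: { $sum: 1 },
        failedValidations: { $sum: { $cond: ['$isValid', 0, 1] } },
        totalErrors: { $sum: '$errorCount' },
        totalWarnings: { $sum: '$warningCount' },
        avgValidationTime: { $avg: '$validationTime' }
      }
    },
    {
      $addFields: {
        successRate: {
          $round: [
            { $multiply: [
              { $divide: [
                { $subtract: ['$totalValidations', '$failedValidations'] },
                '$totalValidations'
              ] },
              100
            ] },
            2
          ]
        }
      }
    },
    { $sort: { successRate: 1 } }
  ]).toArray();
}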

Conclusion

MongoDB document validation provides comprehensive schema enforcement capabilities that balance data integrity requirements with the flexibility benefits of document-oriented storage. The JSON Schema-based validation system enables sophisticated business rule enforcement while allowing schemas to evolve gracefully as applications grow and change, eliminating the rigid constraints and expensive migration procedures associated with traditional relational database schemas.

Key MongoDB document validation benefits include:

  • Flexible Schema Evolution: JSON Schema-based validation that adapts to changing requirements without expensive migrations
  • Rich Business Logic: Comprehensive validation rules that enforce complex business requirements and cross-field dependencies
  • Performance Optimization: Native MongoDB integration with intelligent validation processing and caching capabilities
  • Custom Validation: Extensible custom validator framework for specialized business rule enforcement
  • Operational Excellence: Comprehensive analytics, monitoring, and schema governance capabilities for production environments
  • SQL Accessibility: Familiar validation syntax through QueryLeaf for accessible enterprise schema management

Whether you're building user management systems, content management platforms, e-commerce applications, or any system requiring robust data integrity, MongoDB document validation with QueryLeaf's familiar SQL interface provides the foundation for maintainable, scalable, and compliant data validation solutions.

QueryLeaf Integration: QueryLeaf automatically translates SQL-style validation schemas into MongoDB's native JSON Schema format while providing familiar constraint syntax for complex business rule enforcement. Advanced validation patterns, custom validators, and schema evolution capabilities are seamlessly accessible through SQL constructs, making sophisticated document validation both powerful and approachable for SQL-oriented development teams.
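
As a rough illustration of that translation (an approximation, not QueryLeaf's literal output), the username rule from the validation schema above maps onto a native $jsonSchema fragment along these lines:

// Approximate shape of the native validator produced for the username rule;
// the exact structure emitted by QueryLeaf may differ.
const usernameFragment = {
  $jsonSchema: {
    bsonType: 'object',
    required: ['username'],
    properties: {
      username: {
        bsonType: 'string',
        pattern: '^[a-zA-Z0-9_]{3,50}$',
        description: 'Unique username, 3-50 characters'
      }
    }
  }
};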

The combination of MongoDB's flexible validation capabilities with SQL-style schema definition makes it an ideal platform for applications requiring both robust data integrity and agile schema management, ensuring your validation rules can evolve with your business while maintaining data quality and compliance standards.

MongoDB Data Archiving and Lifecycle Management: Automated Retention Policies and Enterprise-Grade Data Governance

Enterprise applications accumulate vast amounts of operational data over time, requiring sophisticated data lifecycle management strategies that balance regulatory compliance, storage costs, query performance, and operational efficiency. Traditional database approaches to data archiving often involve complex manual processes, inefficient storage patterns, and limited automation capabilities that become increasingly problematic as data volumes scale to petabytes and compliance requirements become more stringent.

MongoDB provides comprehensive data lifecycle management capabilities through automated retention policies, intelligent archiving strategies, and compliance-aware data governance frameworks. Unlike traditional databases that require external tools and complex ETL processes for data archiving, MongoDB enables native lifecycle management with TTL collections, automated tiering, and sophisticated retention policies that seamlessly integrate with modern data governance requirements.
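Before looking at the relational baseline below, it helps to see how small the native building blocks are. A TTL index is the simplest of them; this sketch uses illustrative database, collection, and field names rather than anything from the examples that follow.

// Minimal sketch: a TTL index that expires documents automatically.
// Database, collection, and field names are illustrative only.
const { MongoClient } = require('mongodb');

async function enableAutomaticExpiry(uri) {
  const client = new MongoClient(uri);
  await client.connect();

  try {
    const events = client.db('app').collection('customerInteractions');

    // Documents are removed roughly 90 days after their createdAt value;
    // the background TTL monitor performs the deletes, so expiry is not instant.
    await events.createIndex(
      { createdAt: 1 },
      { expireAfterSeconds: 90 * 24 * 60 * 60, name: 'ttl_interactions_90d' }
    );
  } finally {
    await client.close();
  }
}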

The Traditional Data Archiving Challenge

Conventional database archiving approaches suffer from significant complexity and operational overhead:

-- Traditional PostgreSQL data archiving - complex manual processes and limited automation

-- Complex partitioned table structure for lifecycle management
CREATE TABLE customer_interactions (
    interaction_id BIGSERIAL PRIMARY KEY,
    customer_id BIGINT NOT NULL,
    interaction_type VARCHAR(50) NOT NULL,
    channel VARCHAR(50) NOT NULL,
    interaction_data JSONB,
    interaction_timestamp TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,

    -- Compliance and governance fields
    data_classification VARCHAR(20) DEFAULT 'internal',
    retention_category VARCHAR(50) DEFAULT 'standard',
    compliance_flags JSONB,

    -- Manual archiving tracking
    archived_status VARCHAR(20) DEFAULT 'active',
    archive_eligible_date DATE,
    archive_priority INTEGER DEFAULT 5,

    -- Audit trail for lifecycle events
    lifecycle_events JSONB DEFAULT '[]',

    -- Performance optimization
    created_date DATE GENERATED ALWAYS AS (interaction_timestamp::date) STORED

) PARTITION BY RANGE (created_date);

-- Create monthly partitions (requires constant manual maintenance)
CREATE TABLE customer_interactions_2023_01 PARTITION OF customer_interactions
    FOR VALUES FROM ('2023-01-01') TO ('2023-02-01');
CREATE TABLE customer_interactions_2023_02 PARTITION OF customer_interactions
    FOR VALUES FROM ('2023-02-01') TO ('2023-03-01');
CREATE TABLE customer_interactions_2023_03 PARTITION OF customer_interactions
    FOR VALUES FROM ('2023-03-01') TO ('2023-04-01');
-- ... manual partition creation continues indefinitely

-- Complex stored procedure for manual archiving process
CREATE OR REPLACE FUNCTION archive_old_customer_interactions(
    archive_threshold_days INTEGER DEFAULT 365,
    batch_size INTEGER DEFAULT 1000
) RETURNS TABLE (
    processed_count INTEGER,
    archived_count INTEGER,
    deleted_count INTEGER,
    error_count INTEGER,
    processing_summary JSONB
) AS $$
DECLARE
    cutoff_date DATE := CURRENT_DATE - INTERVAL '1 day' * archive_threshold_days;
    batch_record RECORD;
    processed_total INTEGER := 0;
    archived_total INTEGER := 0;
    deleted_total INTEGER := 0;
    error_total INTEGER := 0;
    current_partition TEXT;
    archive_table_name TEXT;
    batch_cursor CURSOR FOR
        SELECT schemaname, tablename 
        FROM pg_tables 
        WHERE tablename ~ '^customer_interactions_\d{4}_\d{2}$'
        AND tablename < 'customer_interactions_' || to_char(cutoff_date, 'YYYY_MM')
        ORDER BY tablename;
BEGIN
    -- Process each partition individually (extremely inefficient)
    FOR batch_record IN batch_cursor LOOP
        current_partition := batch_record.schemaname || '.' || batch_record.tablename;
        archive_table_name := 'archive_' || batch_record.tablename;

        BEGIN
            -- Create archive table if it doesn't exist
            EXECUTE format('
                CREATE TABLE IF NOT EXISTS %I (
                    LIKE %I INCLUDING ALL
                ) INHERITS (customer_interactions_archive)', 
                archive_table_name, current_partition);

            -- Copy data to archive table with complex validation
            EXECUTE format('
                WITH archive_candidates AS (
                    SELECT *,
                        -- Complex compliance validation
                        CASE 
                            WHEN data_classification = ''confidential'' AND 
                                 CURRENT_DATE - created_date > INTERVAL ''7 years'' THEN ''expired_confidential''
                            WHEN data_classification = ''public'' AND 
                                 CURRENT_DATE - created_date > INTERVAL ''3 years'' THEN ''expired_public''
                            WHEN compliance_flags ? ''gdpr_subject'' AND 
                                 CURRENT_DATE - created_date > INTERVAL ''6 years'' THEN ''gdpr_expired''
                            WHEN compliance_flags ? ''financial_record'' AND 
                                 CURRENT_DATE - created_date > INTERVAL ''7 years'' THEN ''financial_expired''
                            ELSE ''active''
                        END as archive_status
                    FROM %I
                    WHERE created_date < %L
                ),
                archive_insertions AS (
                    INSERT INTO %I 
                    SELECT 
                        ac.*,
                        -- Add archiving metadata
                        ac.lifecycle_events || jsonb_build_array(
                            jsonb_build_object(
                                ''event'', ''archived'',
                                ''timestamp'', CURRENT_TIMESTAMP,
                                ''archive_reason'', ac.archive_status,
                                ''archive_batch'', %L
                            )
                        ) as lifecycle_events
                    FROM archive_candidates ac
                    WHERE ac.archive_status != ''active''
                    RETURNING interaction_id
                )
                SELECT COUNT(*) FROM archive_insertions',
                current_partition, cutoff_date, archive_table_name, 
                'batch_' || extract(epoch from now())::text
            ) INTO archived_total;

            processed_total := processed_total + archived_total;

            -- Delete archived records from active table (risky operation)
            EXECUTE format('
                DELETE FROM %I 
                WHERE created_date < %L 
                AND interaction_id IN (
                    SELECT interaction_id FROM %I 
                    WHERE archive_status != ''active''
                )', current_partition, cutoff_date, archive_table_name);

            GET DIAGNOSTICS deleted_total = ROW_COUNT;

            -- Log archiving operation
            INSERT INTO archiving_audit_log (
                table_name, 
                archive_date, 
                records_archived, 
                records_deleted,
                archive_table_name
            ) VALUES (
                current_partition, 
                CURRENT_TIMESTAMP, 
                archived_total, 
                deleted_total,
                archive_table_name
            );

        EXCEPTION WHEN OTHERS THEN
            error_total := error_total + 1;
            INSERT INTO archiving_error_log (
                table_name,
                error_message,
                error_timestamp,
                sqlstate
            ) VALUES (
                current_partition,
                SQLERRM,
                CURRENT_TIMESTAMP,
                SQLSTATE
            );
        END;
    END LOOP;

    RETURN QUERY SELECT 
        processed_total,
        archived_total,
        deleted_total,
        error_total,
        jsonb_build_object(
            'processing_timestamp', CURRENT_TIMESTAMP,
            'archive_threshold_days', archive_threshold_days,
            'batch_size', batch_size,
            'cutoff_date', cutoff_date
        );
END;
$$ LANGUAGE plpgsql;

-- Complex compliance-aware data retention management
WITH data_classification_rules AS (
    SELECT 
        'confidential' as classification,
        ARRAY['financial_record', 'personal_data', 'health_info'] as compliance_tags,
        7 * 365 as retention_days,
        true as encryption_required,
        'secure_deletion' as deletion_method
    UNION ALL
    SELECT 
        'internal' as classification,
        ARRAY['business_record', 'operational_data'] as compliance_tags,
        5 * 365 as retention_days,
        false as encryption_required,
        'standard_deletion' as deletion_method
    UNION ALL
    SELECT 
        'public' as classification,
        ARRAY['marketing_data', 'public_interaction'] as compliance_tags,
        3 * 365 as retention_days,
        false as encryption_required,
        'standard_deletion' as deletion_method
),
retention_analysis AS (
    SELECT 
        ci.interaction_id,
        ci.customer_id,
        ci.data_classification,
        ci.compliance_flags,
        ci.created_date,

        -- Match with retention rules
        dcr.retention_days,
        dcr.encryption_required,
        dcr.deletion_method,

        -- Calculate retention status
        CASE 
            WHEN CURRENT_DATE - ci.created_date > INTERVAL '1 day' * dcr.retention_days THEN 'expired'
            WHEN CURRENT_DATE - ci.created_date > INTERVAL '1 day' * (dcr.retention_days - 30) THEN 'expiring_soon'
            ELSE 'active'
        END as retention_status,

        -- Check for legal holds
        CASE 
            WHEN EXISTS (
                SELECT 1 FROM legal_holds lh 
                WHERE lh.customer_id = ci.customer_id 
                AND lh.status = 'active'
                AND lh.hold_type && ARRAY(SELECT jsonb_array_elements_text(ci.compliance_flags))
            ) THEN 'legal_hold'
            ELSE 'normal_retention'
        END as legal_status,

        -- Complex GDPR compliance checks
        CASE 
            WHEN ci.compliance_flags ? 'gdpr_subject' THEN
                CASE 
                    WHEN EXISTS (
                        SELECT 1 FROM gdpr_deletion_requests gdr 
                        WHERE gdr.customer_id = ci.customer_id 
                        AND gdr.status = 'approved'
                    ) THEN 'gdpr_deletion_required'
                    WHEN CURRENT_DATE - ci.created_date > INTERVAL '6 years' THEN 'gdpr_retention_expired'
                    ELSE 'gdpr_compliant'
                END
            ELSE 'gdpr_not_applicable'
        END as gdpr_status

    FROM customer_interactions ci
    JOIN data_classification_rules dcr ON ci.data_classification = dcr.classification
    WHERE ci.archived_status = 'active'
),
complex_retention_actions AS (
    SELECT 
        ra.*,

        -- Determine required action
        CASE 
            WHEN ra.legal_status = 'legal_hold' THEN 'maintain_with_hold'
            WHEN ra.gdpr_status = 'gdpr_deletion_required' THEN 'immediate_deletion'
            WHEN ra.gdpr_status = 'gdpr_retention_expired' THEN 'gdpr_compliant_deletion'
            WHEN ra.retention_status = 'expired' THEN 'archive_and_purge'
            WHEN ra.retention_status = 'expiring_soon' THEN 'prepare_for_archival'
            ELSE 'no_action_required'
        END as required_action,

        -- Calculate priority
        CASE 
            WHEN ra.gdpr_status IN ('gdpr_deletion_required', 'gdpr_retention_expired') THEN 1
            WHEN ra.retention_status = 'expired' AND ra.encryption_required THEN 2
            WHEN ra.retention_status = 'expired' THEN 3
            WHEN ra.retention_status = 'expiring_soon' THEN 4
            ELSE 5
        END as action_priority,

        -- Estimate processing complexity
        CASE 
            WHEN ra.encryption_required AND ra.gdpr_status != 'gdpr_not_applicable' THEN 'high_complexity'
            WHEN ra.encryption_required OR ra.gdpr_status != 'gdpr_not_applicable' THEN 'medium_complexity'
            ELSE 'low_complexity'
        END as processing_complexity

    FROM retention_analysis ra
),
action_summary AS (
    SELECT 
        required_action,
        processing_complexity,
        action_priority,
        COUNT(*) as record_count,

        -- Group by customer to handle GDPR requests efficiently
        COUNT(DISTINCT customer_id) as affected_customers,

        -- Calculate processing estimates
        CASE processing_complexity
            WHEN 'high_complexity' THEN COUNT(*) * 5  -- 5 seconds per record
            WHEN 'medium_complexity' THEN COUNT(*) * 2  -- 2 seconds per record
            ELSE COUNT(*) * 0.5  -- 0.5 seconds per record
        END as estimated_processing_time_seconds,

        -- Group compliance requirements
        array_agg(DISTINCT data_classification) as data_classifications_affected,
        array_agg(DISTINCT gdpr_status) as gdpr_statuses,
        array_agg(DISTINCT legal_status) as legal_statuses

    FROM complex_retention_actions
    WHERE required_action != 'no_action_required'
    GROUP BY required_action, processing_complexity, action_priority
)

SELECT 
    required_action,
    processing_complexity,
    action_priority,
    record_count,
    affected_customers,
    ROUND(estimated_processing_time_seconds / 3600.0, 2) as estimated_hours,
    data_classifications_affected,
    gdpr_statuses,
    legal_statuses,

    -- Provide actionable recommendations
    CASE required_action
        WHEN 'immediate_deletion' THEN 'Execute secure deletion within 72 hours to comply with GDPR'
        WHEN 'gdpr_compliant_deletion' THEN 'Schedule deletion batch during maintenance window'
        WHEN 'archive_and_purge' THEN 'Move to cold storage then schedule purge after verification'
        WHEN 'prepare_for_archival' THEN 'Begin archival preparation and stakeholder notification'
        WHEN 'maintain_with_hold' THEN 'Maintain records due to legal hold - no action until hold lifted'
        ELSE 'Review retention policy alignment'
    END as recommended_action

FROM action_summary
ORDER BY action_priority, estimated_processing_time_seconds DESC;

-- Problems with traditional data archiving approaches:
-- 1. Manual partition management creates operational overhead and human error risk
-- 2. Complex compliance validation requires extensive custom logic and maintenance
-- 3. No automated lifecycle management - everything requires manual scheduling
-- 4. Limited integration with modern compliance frameworks (GDPR, CCPA, SOX)
-- 5. Expensive cold storage integration requires external tools and ETL processes
-- 6. Poor performance for cross-partition queries during archival operations
-- 7. Complex error handling and rollback mechanisms for failed archival operations
-- 8. No automated cost optimization based on data access patterns
-- 9. Difficult integration with cloud storage tiers and automated cost management
-- 10. Limited audit trails and compliance reporting for data governance requirements

-- Attempt at automated retention with limited PostgreSQL capabilities
CREATE OR REPLACE FUNCTION automated_retention_policy()
RETURNS void AS $$
DECLARE
    policy_record RECORD;
    rows_deleted BIGINT := 0;
    retention_cursor CURSOR FOR
        SELECT 
            table_name,
            retention_days,
            archive_method,
            deletion_method
        FROM data_retention_policies
        WHERE enabled = true;
BEGIN
    -- Limited automation through basic stored procedures
    FOR policy_record IN retention_cursor LOOP
        -- Execute retention policy (basic implementation)
        EXECUTE format('
            DELETE FROM %I 
            WHERE created_date < CURRENT_DATE - INTERVAL ''%s days''
            AND archived_status = ''eligible_for_deletion''',
            policy_record.table_name,
            policy_record.retention_days
        );

        GET DIAGNOSTICS rows_deleted = ROW_COUNT;

        -- Log retention execution (basic logging)
        INSERT INTO retention_execution_log (
            table_name,
            execution_date,
            records_processed,
            policy_applied
        ) VALUES (
            policy_record.table_name,
            CURRENT_TIMESTAMP,
            rows_deleted,
            'automated_retention'
        );
    END LOOP;
END;
$$ LANGUAGE plpgsql;

-- Schedule retention policy (requires external cron job)
-- SELECT cron.schedule('retention-policy', '0 2 * * 0', 'SELECT automated_retention_policy();');

-- Traditional limitations:
-- 1. No intelligent data tiering based on access patterns
-- 2. Limited support for compliance-aware automated retention
-- 3. No integration with modern cloud storage tiers
-- 4. Complex manual processes for data lifecycle management
-- 5. Poor support for real-time compliance reporting
-- 6. Limited automation capabilities requiring external orchestration
-- 7. No built-in support for legal hold management
-- 8. Difficult integration with data governance frameworks
-- 9. No automated cost optimization or storage tier management
-- 10. Complex backup and recovery for archived data across multiple storage systems

MongoDB provides comprehensive automated data lifecycle management through native TTL indexes, per-document governance metadata, and application-level policy orchestration:

// MongoDB Advanced Data Archiving and Lifecycle Management - automated retention with enterprise governance
const { MongoClient, ObjectId } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('enterprise_data_governance');

// Comprehensive Data Lifecycle Management System
class AdvancedDataLifecycleManager {
  constructor(db, governanceConfig = {}) {
    this.db = db;
    this.collections = {
      customers: db.collection('customers'),
      interactions: db.collection('customer_interactions'),
      orders: db.collection('orders'),
      payments: db.collection('payments'),

      // Archive collections
      archivedInteractions: db.collection('archived_interactions'),
      archivedOrders: db.collection('archived_orders'),

      // Governance and compliance tracking
      retentionPolicies: db.collection('retention_policies'),
      complianceAuditLog: db.collection('compliance_audit_log'),
      legalHolds: db.collection('legal_holds'),
      dataClassifications: db.collection('data_classifications'),
      lifecycleEvents: db.collection('lifecycle_events'),
      governanceMetrics: db.collection('governance_metrics')
    };

    // Advanced governance configuration
    this.governanceConfig = {
      // Automated retention policies
      enableAutomatedRetention: governanceConfig.enableAutomatedRetention !== false,
      enableIntelligentTiering: governanceConfig.enableIntelligentTiering !== false,
      enableComplianceAutomation: governanceConfig.enableComplianceAutomation !== false,

      // Compliance frameworks
      gdprCompliance: governanceConfig.gdprCompliance !== false,
      ccpaCompliance: governanceConfig.ccpaCompliance || false,
      soxCompliance: governanceConfig.soxCompliance || false,
      hipaaCompliance: governanceConfig.hipaaCompliance || false,

      // Data classification and protection
      enableDataClassification: governanceConfig.enableDataClassification !== false,
      enableEncryptionAtRest: governanceConfig.enableEncryptionAtRest !== false,
      enableSecureDeletion: governanceConfig.enableSecureDeletion !== false,

      // Storage optimization
      enableCloudStorageTiering: governanceConfig.enableCloudStorageTiering || false,
      enableCostOptimization: governanceConfig.enableCostOptimization !== false,
      enableAutomatedArchiving: governanceConfig.enableAutomatedArchiving !== false,

      // Monitoring and reporting
      enableComplianceReporting: governanceConfig.enableComplianceReporting !== false,
      enableAuditTrails: governanceConfig.enableAuditTrails !== false,
      enableGovernanceMetrics: governanceConfig.enableGovernanceMetrics !== false,

      // Default retention periods (in days)
      defaultRetentionPeriods: {
        confidential: 2555,  // 7 years
        internal: 1825,      // 5 years
        public: 1095,        // 3 years
        temporary: 90        // 90 days
      },

      // Archival and deletion policies
      archivalConfig: {
        warmToColdStorageThreshold: 90, // Days
        coldToFrozenThreshold: 365,     // Days
        deletionGracePeriod: 30,        // Days
        batchProcessingSize: 1000,
        enableProgressiveArchival: true
      }
    };

    this.initializeDataGovernance();
  }

  async initializeDataGovernance() {
    console.log('Initializing advanced data governance and lifecycle management...');

    try {
      // Setup automated retention policies
      await this.setupAutomatedRetentionPolicies();

      // Initialize data classification framework
      await this.setupDataClassificationFramework();

      // Setup compliance automation
      await this.setupComplianceAutomation();

      // Initialize intelligent archiving
      await this.setupIntelligentArchiving();

      // Setup governance monitoring
      await this.setupGovernanceMonitoring();

      console.log('Data governance system initialized successfully');

    } catch (error) {
      console.error('Error initializing data governance:', error);
      throw error;
    }
  }

  async setupAutomatedRetentionPolicies() {
    console.log('Setting up automated retention policies with TTL and lifecycle rules...');

    try {
      // Customer interactions with automated TTL based on data classification
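      // expireAfterSeconds: 0 makes the TTL monitor expire each document at the
      // exact Date stored in dataGovernance.retentionExpiry (per-document expiry)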
      await this.collections.interactions.createIndex(
        { "dataGovernance.retentionExpiry": 1 },
        { 
          expireAfterSeconds: 0,
          background: true,
          name: "automated_retention_policy"
        }
      );

      // Setup sophisticated retention policy framework
      const retentionPolicies = [
        {
          _id: new ObjectId(),
          policyName: 'customer_interactions_retention',
          description: 'Automated retention for customer interaction data based on classification and compliance',

          // Collection and criteria configuration
          targetCollections: ['customer_interactions'],
          retentionCriteria: {
            confidential: {
              retentionPeriod: 2555, // 7 years
              complianceFrameworks: ['SOX', 'Financial_Records'],
              secureDelete: true,
              encryptionRequired: true
            },
            internal: {
              retentionPeriod: 1825, // 5 years
              complianceFrameworks: ['Business_Records'],
              secureDelete: false,
              encryptionRequired: false
            },
            public: {
              retentionPeriod: 1095, // 3 years
              complianceFrameworks: ['Marketing_Data'],
              secureDelete: false,
              encryptionRequired: false
            },
            gdpr_subject: {
              retentionPeriod: 2190, // 6 years
              complianceFrameworks: ['GDPR'],
              rightToErasure: true,
              secureDelete: true
            }
          },

          // Advanced policy configuration
          policyConfig: {
            enableLegalHoldRespect: true,
            enableGdprCompliance: true,
            enableProgressiveArchival: true,
            enableCostOptimization: true,
            batchProcessingSize: 1000,
            executionSchedule: 'daily',
            timezoneHandling: 'UTC'
          },

          // Automation and monitoring
          automationSettings: {
            enableAutomaticExecution: true,
            enableNotifications: true,
            enableAuditLogging: true,
            enableComplianceReporting: true,
            executionWindow: { start: '02:00', end: '06:00' }
          },

          // Governance metadata
          governance: {
            createdBy: 'system',
            createdAt: new Date(),
            approvedBy: 'compliance_team',
            approvedAt: new Date(),
            lastReviewDate: new Date(),
            nextReviewDate: new Date(Date.now() + 365 * 24 * 60 * 60 * 1000), // 1 year
            complianceStatus: 'approved'
          }
        },

        {
          _id: new ObjectId(),
          policyName: 'order_data_retention',
          description: 'Financial and order data retention with enhanced compliance tracking',

          targetCollections: ['orders', 'payments'],
          retentionCriteria: {
            financial_record: {
              retentionPeriod: 2920, // 8 years for financial records
              complianceFrameworks: ['SOX', 'Tax_Records', 'Financial_Regulations'],
              secureDelete: true,
              encryptionRequired: true,
              auditTrailRequired: true
            },
            standard_order: {
              retentionPeriod: 2555, // 7 years
              complianceFrameworks: ['Business_Records'],
              secureDelete: false,
              encryptionRequired: false,
              auditTrailRequired: false
            }
          },

          policyConfig: {
            enableLegalHoldRespect: true,
            enableTaxCompliance: true,
            enableFinancialAuditSupport: true,
            batchProcessingSize: 500,
            executionSchedule: 'weekly',
            requireManualApproval: true // Financial data requires manual approval
          },

          governance: {
            createdBy: 'finance_team',
            approvedBy: 'compliance_officer',
            complianceStatus: 'approved',
            regulatoryAlignment: ['SOX', 'Tax_Regulations', 'Financial_Compliance']
          }
        }
      ];

      // Insert retention policies
      await this.collections.retentionPolicies.insertMany(retentionPolicies);

      console.log('Automated retention policies configured successfully');

    } catch (error) {
      console.error('Error setting up retention policies:', error);
      throw error;
    }
  }

  async setupDataClassificationFramework() {
    console.log('Setting up data classification framework for automated governance...');

    const classificationFramework = {
      _id: new ObjectId(),
      frameworkName: 'enterprise_data_classification',
      version: '2.1',

      // Data sensitivity levels
      sensitivityLevels: {
        public: {
          level: 0,
          description: 'Information available to general public',
          handlingRequirements: {
            encryption: false,
            accessControl: 'none',
            auditLogging: false,
            retentionPeriod: 1095 // 3 years
          },
          complianceFrameworks: []
        },

        internal: {
          level: 1,
          description: 'Internal business information',
          handlingRequirements: {
            encryption: false,
            accessControl: 'basic',
            auditLogging: true,
            retentionPeriod: 1825 // 5 years
          },
          complianceFrameworks: ['Business_Records']
        },

        confidential: {
          level: 2,
          description: 'Sensitive business information requiring protection',
          handlingRequirements: {
            encryption: true,
            accessControl: 'role_based',
            auditLogging: true,
            retentionPeriod: 2555, // 7 years
            secureDelete: true
          },
          complianceFrameworks: ['SOX', 'Business_Confidential']
        },

        restricted: {
          level: 3,
          description: 'Highly sensitive information with strict access controls',
          handlingRequirements: {
            encryption: true,
            accessControl: 'multi_factor',
            auditLogging: true,
            retentionPeriod: 2555, // 7 years
            secureDelete: true,
            approvalRequired: true
          },
          complianceFrameworks: ['SOX', 'Financial_Records', 'Executive_Information']
        }
      },

      // Data categories with specific handling requirements
      dataCategories: {
        personal_data: {
          category: 'personal_data',
          description: 'Personally identifiable information subject to privacy regulations',
          sensitivityLevel: 'confidential',
          specialHandling: {
            gdprApplicable: true,
            ccpaApplicable: true,
            rightToErasure: true,
            dataSubjectRights: true,
            consentTracking: true,
            retentionPeriod: 2190 // 6 years for GDPR
          },
          complianceFrameworks: ['GDPR', 'CCPA', 'Privacy_Regulations']
        },

        financial_data: {
          category: 'financial_data',
          description: 'Financial transactions and accounting information',
          sensitivityLevel: 'restricted',
          specialHandling: {
            soxApplicable: true,
            taxRecordRetention: true,
            auditTrailRequired: true,
            encryptionRequired: true,
            retentionPeriod: 2920 // 8 years for tax records
          },
          complianceFrameworks: ['SOX', 'Tax_Regulations', 'Financial_Compliance']
        },

        health_information: {
          category: 'health_information',
          description: 'Protected health information subject to HIPAA',
          sensitivityLevel: 'restricted',
          specialHandling: {
            hipaaApplicable: true,
            encryptionRequired: true,
            accessLoggingRequired: true,
            minimumNecessaryRule: true,
            retentionPeriod: 2190 // 6 years for health records
          },
          complianceFrameworks: ['HIPAA', 'Health_Privacy']
        },

        business_records: {
          category: 'business_records',
          description: 'General business operational data',
          sensitivityLevel: 'internal',
          specialHandling: {
            businessRecordRetention: true,
            auditSupport: true,
            retentionPeriod: 1825 // 5 years
          },
          complianceFrameworks: ['Business_Records']
        }
      },

      // Automated classification rules
      classificationRules: {
        piiDetection: {
          enabled: true,
          patterns: [
            { field: 'email', pattern: /^[^\s@]+@[^\s@]+\.[^\s@]+$/, classification: 'personal_data' },
            { field: 'phone', pattern: /^\+?[\d\s\-\(\)]{10,}$/, classification: 'personal_data' },
            { field: 'ssn', pattern: /^\d{3}-?\d{2}-?\d{4}$/, classification: 'personal_data' },
            { field: 'credit_card', pattern: /^\d{4}[\s\-]?\d{4}[\s\-]?\d{4}[\s\-]?\d{4}$/, classification: 'financial_data' }
          ]
        },

        financialDataDetection: {
          enabled: true,
          indicators: [
            { fieldNames: ['amount', 'price', 'total', 'payment'], classification: 'financial_data' },
            { fieldNames: ['account_number', 'routing_number'], classification: 'financial_data' },
            { collectionNames: ['payments', 'transactions', 'invoices'], classification: 'financial_data' }
          ]
        },

        healthDataDetection: {
          enabled: true,
          indicators: [
            { fieldNames: ['medical_record', 'diagnosis', 'treatment'], classification: 'health_information' },
            { fieldNames: ['patient_id', 'medical_history'], classification: 'health_information' }
          ]
        }
      },

      // Governance metadata
      governance: {
        frameworkOwner: 'data_governance_team',
        lastUpdated: new Date(),
        nextReview: new Date(Date.now() + 180 * 24 * 60 * 60 * 1000), // 6 months
        approvalStatus: 'approved',
        version: '2.1'
      }
    };

    await this.collections.dataClassifications.replaceOne(
      { frameworkName: 'enterprise_data_classification' },
      classificationFramework,
      { upsert: true }
    );

    console.log('Data classification framework established');
  }

  async executeAutomatedRetentionPolicy(policyName = null) {
    console.log(`Executing automated retention policies${policyName ? ` for: ${policyName}` : ''}...`);
    const executionStart = new Date();

    try {
      // Get active retention policies
      const policies = policyName ? 
        await this.collections.retentionPolicies.find({ policyName: policyName, 'governance.complianceStatus': 'approved' }).toArray() :
        await this.collections.retentionPolicies.find({ 'governance.complianceStatus': 'approved' }).toArray();

      const executionResults = [];

      for (const policy of policies) {
        console.log(`Processing retention policy: ${policy.policyName}`);

        const policyResult = await this.executeIndividualRetentionPolicy(policy);
        executionResults.push({
          policyName: policy.policyName,
          ...policyResult
        });

        // Log policy execution
        await this.logRetentionPolicyExecution(policy, policyResult);
      }

      // Generate comprehensive execution summary
      const executionSummary = await this.generateRetentionExecutionSummary(executionResults, executionStart);

      return executionSummary;

    } catch (error) {
      console.error('Error executing retention policies:', error);
      await this.logRetentionPolicyError(error, { policyName, executionStart });
      throw error;
    }
  }

  async executeIndividualRetentionPolicy(policy) {
    console.log(`Executing policy: ${policy.policyName}`);
    const policyStart = new Date();

    const results = {
      documentsProcessed: 0,
      documentsArchived: 0,
      documentsDeleted: 0,
      documentsSkipped: 0,
      errors: [],
      legalHoldsRespected: 0,
      complianceActionsPerformed: 0
    };

    try {
      for (const collectionName of policy.targetCollections) {
        const collection = this.db.collection(collectionName);

        // Process each retention criteria
        for (const [classification, criteria] of Object.entries(policy.retentionCriteria)) {
          console.log(`Processing classification: ${classification} for collection: ${collectionName}`);

          const classificationResult = await this.processRetentionCriteria(
            collection, 
            classification, 
            criteria, 
            policy.policyConfig
          );

          // Aggregate results
          results.documentsProcessed += classificationResult.documentsProcessed;
          results.documentsArchived += classificationResult.documentsArchived;
          results.documentsDeleted += classificationResult.documentsDeleted;
          results.documentsSkipped += classificationResult.documentsSkipped;
          results.legalHoldsRespected += classificationResult.legalHoldsRespected;
          results.complianceActionsPerformed += classificationResult.complianceActionsPerformed;

          if (classificationResult.errors.length > 0) {
            results.errors.push(...classificationResult.errors);
          }
        }
      }

      results.processingTime = Date.now() - policyStart.getTime();
      results.success = true;

      return results;

    } catch (error) {
      console.error(`Error executing policy ${policy.policyName}:`, error);
      results.success = false;
      results.error = error.message;
      results.processingTime = Date.now() - policyStart.getTime();
      return results;
    }
  }

  async processRetentionCriteria(collection, classification, criteria, policyConfig) {
    console.log(`Processing retention criteria for classification: ${classification}`);

    const results = {
      documentsProcessed: 0,
      documentsArchived: 0,
      documentsDeleted: 0,
      documentsSkipped: 0,
      legalHoldsRespected: 0,
      complianceActionsPerformed: 0,
      errors: []
    };

    try {
      // Calculate retention cutoff date
      const retentionCutoffDate = new Date(Date.now() - criteria.retentionPeriod * 24 * 60 * 60 * 1000);

      // Build query for documents eligible for retention processing
      const retentionQuery = {
        'dataGovernance.classification': classification,
        'dataGovernance.createdAt': { $lt: retentionCutoffDate },

        // Exclude documents under legal hold
        ...(policyConfig.enableLegalHoldRespect && {
          'dataGovernance.legalHold.status': { $ne: 'active' }
        }),

        // Include GDPR-specific filtering
        ...(policyConfig.enableGdprCompliance && classification === 'gdpr_subject' && {
          $or: [
            { 'dataGovernance.gdpr.consentStatus': 'withdrawn' },
            { 'dataGovernance.gdpr.retentionExpiry': { $lt: new Date() } }
          ]
        })
      };

      // Process documents in batches using _id-based pagination, since archiving
      // or deleting documents mid-run would cause skip-based paging to miss records
      const batchSize = policyConfig.batchProcessingSize || 1000;
      let lastProcessedId = null;
      let hasMoreDocuments = true;

      while (hasMoreDocuments) {
        const batchQuery = lastProcessedId
          ? { ...retentionQuery, _id: { $gt: lastProcessedId } }
          : retentionQuery;

        const documentsToProcess = await collection.find(batchQuery)
          .sort({ _id: 1 })
          .limit(batchSize)
          .toArray();

        if (documentsToProcess.length === 0) {
          hasMoreDocuments = false;
          break;
        }

        // Process each document
        for (const document of documentsToProcess) {
          try {
            const processingResult = await this.processDocumentRetention(
              collection, 
              document, 
              classification, 
              criteria, 
              policyConfig
            );

            // Update results based on processing outcome
            results.documentsProcessed++;

            switch (processingResult.action) {
              case 'archived':
                results.documentsArchived++;
                break;
              case 'deleted':
                results.documentsDeleted++;
                break;
              case 'skipped':
                results.documentsSkipped++;
                break;
              case 'legal_hold_respected':
                results.legalHoldsRespected++;
                results.documentsSkipped++;
                break;
            }

            if (processingResult.complianceAction) {
              results.complianceActionsPerformed++;
            }

          } catch (error) {
            console.error(`Error processing document ${document._id}:`, error);
            results.errors.push({
              documentId: document._id,
              error: error.message,
              classification: classification
            });
          }
        }

        lastProcessedId = documentsToProcess[documentsToProcess.length - 1]._id;

        // Add a short processing delay to avoid overwhelming the database
        await new Promise(resolve => setTimeout(resolve, 100));
      }

      return results;

    } catch (error) {
      console.error(`Error processing retention criteria for ${classification}:`, error);
      results.errors.push({
        classification: classification,
        error: error.message
      });
      return results;
    }
  }

  async processDocumentRetention(collection, document, classification, criteria, policyConfig) {
    console.log(`Processing document retention for ${document._id}`);

    try {
      // Check for legal holds
      if (policyConfig.enableLegalHoldRespect && document.dataGovernance?.legalHold?.status === 'active') {
        await this.logGovernanceEvent({
          documentId: document._id,
          collection: collection.collectionName,
          action: 'retention_blocked_legal_hold',
          classification: classification,
          legalHoldId: document.dataGovernance.legalHold.holdId,
          timestamp: new Date()
        });

        return { action: 'legal_hold_respected', complianceAction: true };
      }

      // Check GDPR right to erasure
      if (policyConfig.enableGdprCompliance && 
          document.dataGovernance?.gdpr?.rightToErasureRequested) {

        await this.executeGdprErasure(collection, document);

        await this.logGovernanceEvent({
          documentId: document._id,
          collection: collection.collectionName,
          action: 'gdpr_right_to_erasure',
          classification: classification,
          timestamp: new Date()
        });

        return { action: 'deleted', complianceAction: true };
      }

      // Determine appropriate retention action
      if (criteria.secureDelete || policyConfig.requireManualApproval) {
        // Archive first, then schedule for deletion
        await this.archiveDocument(collection, document, criteria);

        return { action: 'archived', complianceAction: false };
      } else {
        // Direct deletion for non-sensitive data
        await this.deleteDocumentWithAuditTrail(collection, document, criteria);

        return { action: 'deleted', complianceAction: false };
      }

    } catch (error) {
      console.error(`Error processing document retention for ${document._id}:`, error);
      throw error;
    }
  }

  async archiveDocument(collection, document, criteria) {
    console.log(`Archiving document ${document._id} to cold storage...`);

    try {
      // Prepare archived document with governance metadata
      const archivedDocument = {
        ...document,
        archivedMetadata: {
          originalCollection: collection.collectionName,
          archiveDate: new Date(),
          archiveReason: 'automated_retention_policy',
          retentionCriteria: criteria,
          archiveId: new ObjectId()
        },
        dataGovernance: {
          ...document.dataGovernance,
          lifecycleStage: 'archived',
          archiveTimestamp: new Date(),
          scheduledDeletion: criteria.secureDelete ? 
            new Date(Date.now() + 30 * 24 * 60 * 60 * 1000) : null // 30 day grace period
        }
      };

      // Insert into archive collection
      const archiveCollectionName = `archived_${collection.collectionName}`;
      await this.db.collection(archiveCollectionName).insertOne(archivedDocument);

      // Remove from active collection
      await collection.deleteOne({ _id: document._id });

      // Log archival event
      await this.logGovernanceEvent({
        documentId: document._id,
        collection: collection.collectionName,
        action: 'document_archived',
        archiveCollection: archiveCollectionName,
        archiveId: archivedDocument.archivedMetadata.archiveId,
        retentionCriteria: criteria,
        timestamp: new Date()
      });

      console.log(`Document ${document._id} archived successfully`);

    } catch (error) {
      console.error(`Error archiving document ${document._id}:`, error);
      throw error;
    }
  }

  async executeGdprErasure(collection, document) {
    console.log(`Executing GDPR right to erasure for document ${document._id}...`);

    try {
      // Log GDPR erasure before deletion (compliance requirement)
      await this.logGovernanceEvent({
        documentId: document._id,
        collection: collection.collectionName,
        action: 'gdpr_right_to_erasure_executed',
        gdprRequestId: document.dataGovernance?.gdpr?.erasureRequestId,
        dataSubject: document.dataGovernance?.gdpr?.dataSubject,
        timestamp: new Date(),
        legalBasis: 'GDPR Article 17 - Right to Erasure'
      });

      // Perform secure deletion
      await this.secureDeleteDocument(collection, document);

      // Update GDPR compliance tracking
      await this.updateGdprComplianceStatus(
        document.dataGovernance?.gdpr?.erasureRequestId, 
        'completed'
      );

      console.log(`GDPR erasure completed for document ${document._id}`);

    } catch (error) {
      console.error(`Error executing GDPR erasure for document ${document._id}:`, error);
      throw error;
    }
  }

  async secureDeleteDocument(collection, document) {
    console.log(`Performing secure deletion for document ${document._id}...`);

    try {
      // Create deletion audit record
      const deletionAudit = {
        _id: new ObjectId(),
        originalDocumentId: document._id,
        originalCollection: collection.collectionName,
        deletionTimestamp: new Date(),
        deletionMethod: 'secure_deletion',
        deletionReason: 'automated_retention_policy',
        documentHash: this.generateDocumentHash(document),
        complianceFrameworks: document.dataGovernance?.complianceFrameworks || [],
        auditRetentionPeriod: new Date(Date.now() + 10 * 365 * 24 * 60 * 60 * 1000) // 10 years
      };

      // Store deletion audit record
      await this.collections.complianceAuditLog.insertOne(deletionAudit);

      // Delete the actual document
      await collection.deleteOne({ _id: document._id });

      console.log(`Secure deletion completed for document ${document._id}`);

    } catch (error) {
      console.error(`Error performing secure deletion for document ${document._id}:`, error);
      throw error;
    }
  }

  async setupIntelligentArchiving() {
    console.log('Setting up intelligent archiving with automated tiering...');

    try {
      // Create TTL indexes for different tiers
      const archivingIndexes = [
        {
          collection: 'customer_interactions',
          index: { "dataGovernance.warmToColumnTierDate": 1 },
          options: { 
            expireAfterSeconds: 0,
            background: true,
            name: "warm_to_column_tiering"
          }
        },
        {
          collection: 'customer_interactions',
          index: { "dataGovernance.coldToFrozenTierDate": 1 },
          options: { 
            expireAfterSeconds: 0,
            background: true,
            name: "cold_to_frozen_tiering"
          }
        },
        {
          collection: 'archived_customer_interactions',
          index: { "archivedMetadata.scheduledDeletion": 1 },
          options: { 
            expireAfterSeconds: 0,
            background: true,
            name: "archived_data_deletion"
          }
        }
      ];

      for (const indexConfig of archivingIndexes) {
        await this.db.collection(indexConfig.collection).createIndex(
          indexConfig.index,
          indexConfig.options
        );
      }

      console.log('Intelligent archiving indexes created successfully');

    } catch (error) {
      console.error('Error setting up intelligent archiving:', error);
      throw error;
    }
  }

  async generateComplianceReport(reportType = 'comprehensive', dateRange = null) {
    console.log(`Generating ${reportType} compliance report...`);

    try {
      const reportStart = dateRange?.start || new Date(Date.now() - 30 * 24 * 60 * 60 * 1000); // 30 days ago
      const reportEnd = dateRange?.end || new Date();

      const complianceReport = {
        reportId: new ObjectId(),
        reportType: reportType,
        generatedAt: new Date(),
        reportPeriod: { start: reportStart, end: reportEnd },
        complianceFrameworks: []
      };

      // Data governance metrics
      complianceReport.dataGovernanceMetrics = await this.generateDataGovernanceMetrics(reportStart, reportEnd);

      // Retention policy compliance
      complianceReport.retentionCompliance = await this.generateRetentionComplianceMetrics(reportStart, reportEnd);

      // GDPR compliance metrics
      if (this.governanceConfig.gdprCompliance) {
        complianceReport.gdprCompliance = await this.generateGdprComplianceMetrics(reportStart, reportEnd);
        complianceReport.complianceFrameworks.push('GDPR');
      }

      // SOX compliance metrics
      if (this.governanceConfig.soxCompliance) {
        complianceReport.soxCompliance = await this.generateSoxComplianceMetrics(reportStart, reportEnd);
        complianceReport.complianceFrameworks.push('SOX');
      }

      // Data lifecycle metrics
      complianceReport.lifecycleMetrics = await this.generateLifecycleMetrics(reportStart, reportEnd);

      // Risk and audit metrics
      complianceReport.riskMetrics = await this.generateRiskMetrics(reportStart, reportEnd);

      // Store compliance report
      await this.collections.governanceMetrics.insertOne(complianceReport);

      return complianceReport;

    } catch (error) {
      console.error('Error generating compliance report:', error);
      throw error;
    }
  }

  async generateDataGovernanceMetrics(startDate, endDate) {
    console.log('Generating data governance metrics...');

    const metrics = await this.collections.lifecycleEvents.aggregate([
      {
        $match: {
          timestamp: { $gte: startDate, $lte: endDate }
        }
      },
      {
        $group: {
          _id: '$action',
          count: { $sum: 1 },
          collections: { $addToSet: '$collection' },
          complianceFrameworks: { $addToSet: '$retentionCriteria.complianceFrameworks' },
          avgProcessingTime: { $avg: '$processingTime' }
        }
      },
      {
        $project: {
          action: '$_id',
          count: 1,
          collectionsCount: { $size: '$collections' },
          complianceFrameworksCount: { $size: '$complianceFrameworks' },
          avgProcessingTimeMs: { $round: ['$avgProcessingTime', 2] }
        }
      }
    ]).toArray();

    return {
      totalGovernanceEvents: metrics.reduce((sum, m) => sum + m.count, 0),
      actionBreakdown: metrics,
      period: { start: startDate, end: endDate }
    };
  }

  // Utility methods for governance operations

  generateDocumentHash(document) {
    const crypto = require('crypto');
    const documentString = JSON.stringify(document, Object.keys(document).sort());
    return crypto.createHash('sha256').update(documentString).digest('hex');
  }

  async logGovernanceEvent(eventData) {
    try {
      const event = {
        _id: new ObjectId(),
        ...eventData,
        timestamp: eventData.timestamp || new Date()
      };

      await this.collections.lifecycleEvents.insertOne(event);

    } catch (error) {
      console.error('Error logging governance event:', error);
      // Don't throw - logging shouldn't break governance operations
    }
  }

  async logRetentionPolicyExecution(policy, results) {
    try {
      const executionLog = {
        _id: new ObjectId(),
        policyName: policy.policyName,
        executionTimestamp: new Date(),
        results: results,
        policyConfiguration: policy.policyConfig,
        governance: {
          executedBy: 'automated_system',
          complianceStatus: results.success ? 'successful' : 'failed',
          auditTrail: true
        }
      };

      await this.collections.complianceAuditLog.insertOne(executionLog);

    } catch (error) {
      console.error('Error logging retention policy execution:', error);
    }
  }
}

// Enterprise-ready data lifecycle automation
class EnterpriseDataLifecycleAutomation extends AdvancedDataLifecycleManager {
  constructor(db, enterpriseConfig) {
    super(db, enterpriseConfig);

    this.enterpriseConfig = {
      ...enterpriseConfig,
      enableCloudStorageIntegration: true,
      enableCostOptimization: true,
      enableComplianceOrchestration: true,
      enableExecutiveDashboards: true,
      enableAutomatedReporting: true
    };

    this.setupEnterpriseAutomation();
  }

  async setupEnterpriseAutomation() {
    console.log('Setting up enterprise data lifecycle automation...');

    // Setup automated scheduling
    await this.setupAutomatedScheduling();

    // Setup cost optimization
    await this.setupCostOptimization();

    // Setup compliance orchestration
    await this.setupComplianceOrchestration();

    console.log('Enterprise automation configured successfully');
  }

  async setupAutomatedScheduling() {
    console.log('Setting up automated retention scheduling...');

    // Implementation would include:
    // - Cron-like scheduling system
    // - Load balancing across retention operations
    // - Maintenance window awareness
    // - Performance impact monitoring

    const schedulingConfig = {
      retentionSchedule: {
        daily: { time: '02:00', timezone: 'UTC', enabled: true },
        weekly: { day: 'Sunday', time: '01:00', timezone: 'UTC', enabled: true },
        monthly: { day: 1, time: '00:00', timezone: 'UTC', enabled: true }
      },

      maintenanceWindows: [
        { start: '01:00', end: '05:00', timezone: 'UTC', priority: 'high' },
        { start: '13:00', end: '14:00', timezone: 'UTC', priority: 'medium' }
      ],

      performanceThresholds: {
        maxConcurrentOperations: 3,
        maxDocumentsPerMinute: 10000,
        maxMemoryUsage: '2GB',
        cpuThrottling: 80
      }
    };

    // Store scheduling configuration
    await this.collections.governanceMetrics.replaceOne(
      { configType: 'scheduling' },
      { configType: 'scheduling', ...schedulingConfig, lastUpdated: new Date() },
      { upsert: true }
    );
  }

  async setupCostOptimization() {
    console.log('Setting up automated cost optimization...');

    const costOptimizationConfig = {
      storageTiering: {
        hotStorage: { maxAge: 30, costPerGB: 0.023 }, // 30 days
        warmStorage: { maxAge: 90, costPerGB: 0.012 }, // 90 days
        coldStorage: { maxAge: 365, costPerGB: 0.004 }, // 1 year
        frozenStorage: { maxAge: 2555, costPerGB: 0.001 } // 7 years
      },

      optimizationRules: {
        enableAutomatedTiering: true,
        enableCostAlerts: true,
        enableUsageAnalytics: true,
        optimizationSchedule: 'weekly'
      }
    };

    await this.collections.governanceMetrics.replaceOne(
      { configType: 'cost_optimization' },
      { configType: 'cost_optimization', ...costOptimizationConfig, lastUpdated: new Date() },
      { upsert: true }
    );
  }
}

// Benefits of MongoDB Advanced Data Lifecycle Management:
// - Automated retention policies with native TTL and governance integration
// - Comprehensive compliance framework support (GDPR, CCPA, SOX, HIPAA)
// - Intelligent data tiering and cost optimization
// - Enterprise-grade audit trails and compliance reporting
// - Automated data classification and sensitivity detection
// - Legal hold management with automated compliance tracking
// - Native integration with MongoDB's storage and archiving capabilities
// - SQL-compatible lifecycle management through QueryLeaf integration
// - Real-time governance monitoring and alerting
// - Scalable automation for enterprise data volumes

module.exports = {
  AdvancedDataLifecycleManager,
  EnterpriseDataLifecycleAutomation
};
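
The retention machinery above ultimately leans on MongoDB's native TTL behavior: a TTL index created with expireAfterSeconds: 0 on a date field expires each document at the exact timestamp stored in that field, with no external scheduler or partition maintenance. The following is a minimal sketch of that core mechanism in isolation, reusing the customer_interactions collection and dataGovernance.retentionExpiry field from the lifecycle manager above; the sample document and retention period are illustrative only.

// Minimal sketch: per-document expiry driven by a native TTL index
const { MongoClient } = require('mongodb');

async function demonstrateNativeTtlRetention(uri = 'mongodb://localhost:27017') {
  const client = new MongoClient(uri);
  await client.connect();

  try {
    const interactions = client
      .db('enterprise_data_governance')
      .collection('customer_interactions');

    // expireAfterSeconds: 0 => expire each document at the Date stored in the indexed field
    await interactions.createIndex(
      { 'dataGovernance.retentionExpiry': 1 },
      { expireAfterSeconds: 0, name: 'automated_retention_policy' }
    );

    // Stamp the expiry when the document is written (3 years for 'public' data here)
    const retentionDays = 1095;
    await interactions.insertOne({
      customerId: 'cust_12345',            // illustrative value
      interactionType: 'marketing_email',  // illustrative value
      dataGovernance: {
        classification: 'public',
        createdAt: new Date(),
        retentionExpiry: new Date(Date.now() + retentionDays * 24 * 60 * 60 * 1000)
      }
    });

    // The TTL monitor (which runs roughly every 60 seconds) removes the document
    // once retentionExpiry has passed; no cron job or manual purge is required.
  } finally {
    await client.close();
  }
}

Because TTL indexes ignore documents where the indexed field is missing or not a date, records under legal hold can be stored without a retentionExpiry value (or have it unset) so that automatic expiry never races ahead of the hold, complementing the explicit legal-hold checks performed by the lifecycle manager above.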

Understanding MongoDB Data Archiving Architecture

Advanced Lifecycle Management and Automation Patterns

Implement sophisticated data lifecycle management for enterprise MongoDB deployments:

// Production-ready MongoDB data lifecycle management with comprehensive automation
class ProductionDataLifecycleManager extends EnterpriseDataLifecycleAutomation {
  constructor(db, productionConfig) {
    super(db, productionConfig);

    this.productionConfig = {
      ...productionConfig,
      enableHighAvailability: true,
      enableDisasterRecovery: true,
      enableGeographicCompliance: true,
      enableRealTimeMonitoring: true,
      enablePredictiveAnalytics: true
    };

    this.setupProductionOptimizations();
    this.initializeAdvancedAutomation();
  }

  async implementPredictiveDataLifecycleManagement() {
    console.log('Implementing predictive data lifecycle management...');

    const predictiveStrategy = {
      // Data growth prediction
      dataGrowthPrediction: {
        enableTrendAnalysis: true,
        enableSeasonalAdjustments: true,
        enableCapacityPlanning: true,
        predictionHorizon: 365 // days
      },

      // Access pattern analysis
      accessPatternAnalysis: {
        enableHotDataIdentification: true,
        enableColdDataPrediction: true,
        enableArchivalPrediction: true,
        analysisWindow: 90 // days
      },

      // Cost optimization predictions
      costOptimizationPredictions: {
        enableCostProjections: true,
        enableSavingsAnalysis: true,
        enableROICalculations: true,
        optimizationRecommendations: true
      }
    };

    return await this.deployPredictiveStrategy(predictiveStrategy);
  }

  async setupAdvancedComplianceOrchestration() {
    console.log('Setting up advanced compliance orchestration...');

    const complianceOrchestration = {
      // Multi-jurisdiction compliance
      jurisdictionalCompliance: {
        enableGdprCompliance: true,
        enableCcpaCompliance: true,
        enablePipedaCompliance: true, // Canada
        enableLgpdCompliance: true,  // Brazil
        enableRegionalDataResidency: true
      },

      // Automated compliance workflows
      complianceWorkflows: {
        enableAutomaticDataSubjectRights: true,
        enableAutomaticRetentionEnforcement: true,
        enableAutomaticAuditPreparation: true,
        enableComplianceReporting: true
      },

      // Risk management integration
      riskManagement: {
        enableRiskAssessments: true,
        enableThreatModeling: true,
        enableComplianceGapAnalysis: true,
        enableContinuousMonitoring: true
      }
    };

    return await this.deployComplianceOrchestration(complianceOrchestration);
  }
}
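
A hedged sketch of how these classes might be wired together for a scheduled run is shown below. The policy name and configuration flags come from the examples above, while the connection string, module path, and scheduling wrapper are placeholders, and the helper methods referenced but not shown in the listing (such as generateRetentionExecutionSummary) are assumed to be implemented.

// Hypothetical wiring of the lifecycle managers defined above
const { MongoClient } = require('mongodb');
const { EnterpriseDataLifecycleAutomation } = require('./data-lifecycle'); // placeholder module path

async function runNightlyGovernance() {
  const client = new MongoClient(process.env.MONGODB_URI || 'mongodb://localhost:27017');
  await client.connect();

  try {
    const db = client.db('enterprise_data_governance');

    const lifecycleManager = new EnterpriseDataLifecycleAutomation(db, {
      gdprCompliance: true,
      soxCompliance: true,
      enableCloudStorageTiering: false // keep tiering local for this sketch
    });

    // Execute a single named policy; policy names live in the retention_policies collection
    const summary = await lifecycleManager.executeAutomatedRetentionPolicy(
      'customer_interactions_retention'
    );
    console.log('Retention run summary:', JSON.stringify(summary, null, 2));
  } finally {
    await client.close();
  }
}

runNightlyGovernance().catch(err => {
  console.error('Nightly governance run failed:', err);
  process.exit(1);
});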

SQL-Style Data Lifecycle Management with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB data archiving and lifecycle management:

-- QueryLeaf advanced data lifecycle management with SQL-familiar syntax

-- Configure automated data lifecycle policies
CREATE DATA_LIFECYCLE_POLICY customer_data_retention AS (
  -- Data classification and retention rules
  RETENTION_RULES = JSON_OBJECT(
    'confidential', JSON_OBJECT(
      'retention_period_days', 2555,  -- 7 years
      'compliance_frameworks', JSON_ARRAY('SOX', 'Financial_Records'),
      'secure_delete', true,
      'encryption_required', true,
      'legal_hold_check', true
    ),
    'personal_data', JSON_OBJECT(
      'retention_period_days', 2190,  -- 6 years  
      'compliance_frameworks', JSON_ARRAY('GDPR', 'CCPA'),
      'right_to_erasure', true,
      'secure_delete', true,
      'data_subject_rights', true
    ),
    'business_records', JSON_OBJECT(
      'retention_period_days', 1825,  -- 5 years
      'compliance_frameworks', JSON_ARRAY('Business_Records'),
      'secure_delete', false,
      'audit_trail', true
    )
  ),

  -- Automated execution configuration
  AUTOMATION_CONFIG = JSON_OBJECT(
    'execution_schedule', 'daily',
    'execution_time', '02:00',
    'batch_size', 1000,
    'enable_notifications', true,
    'enable_audit_logging', true,
    'respect_legal_holds', true,
    'enable_cost_optimization', true
  ),

  -- Compliance and governance settings
  GOVERNANCE_CONFIG = JSON_OBJECT(
    'policy_owner', 'data_governance_team',
    'approval_status', 'approved',
    'last_review_date', CURRENT_DATE,
    'next_review_date', CURRENT_DATE + INTERVAL '1 year',
    'compliance_officer', '[email protected]'
  )
);

-- Advanced data classification with automated detection
WITH automated_data_classification AS (
  SELECT 
    _id,
    customer_id,
    interaction_type,
    interaction_data,
    created_at,

    -- Automated PII detection
    CASE 
      WHEN interaction_data ? 'email' OR 
           interaction_data ? 'phone' OR
           interaction_data ? 'ssn' OR
           interaction_data ? 'address' THEN 'personal_data'
      WHEN interaction_data ? 'payment_info' OR
           interaction_data ? 'credit_card' OR
           interaction_data ? 'bank_account' THEN 'confidential'
      WHEN interaction_type IN ('support', 'complaint', 'service_inquiry') THEN 'business_records'
      ELSE 'internal'
    END as auto_classification,

    -- GDPR applicability detection
    CASE 
      WHEN interaction_data->>'customer_region' IN ('EU', 'EEA') OR
           interaction_data ? 'gdpr_consent' THEN true
      ELSE false
    END as gdpr_applicable,

    -- Financial data detection
    CASE 
      WHEN interaction_type IN ('payment', 'billing', 'refund') OR
           interaction_data ? 'transaction_id' OR
           interaction_data ? 'invoice_number' THEN true
      ELSE false
    END as financial_data,

    -- Health data detection (if applicable)
    CASE 
      WHEN interaction_data ? 'medical_info' OR
           interaction_data ? 'health_record' OR
           interaction_type = 'health_inquiry' THEN true
      ELSE false
    END as health_data,

    -- Calculate data sensitivity score
    (
      CASE WHEN interaction_data ? 'email' THEN 1 ELSE 0 END +
      CASE WHEN interaction_data ? 'phone' THEN 1 ELSE 0 END +
      CASE WHEN interaction_data ? 'address' THEN 1 ELSE 0 END +
      CASE WHEN interaction_data ? 'payment_info' THEN 2 ELSE 0 END +
      CASE WHEN interaction_data ? 'ssn' THEN 3 ELSE 0 END +
      CASE WHEN interaction_data ? 'health_record' THEN 2 ELSE 0 END
    ) as sensitivity_score

  FROM customer_interactions
  WHERE data_governance->>'classification' IS NULL  -- Unclassified data
),

enhanced_classification AS (
  SELECT 
    adc.*,

    -- Final classification determination
    CASE 
      WHEN health_data THEN 'restricted'
      WHEN financial_data AND sensitivity_score >= 3 THEN 'restricted'
      WHEN financial_data THEN 'confidential'
      WHEN gdpr_applicable AND sensitivity_score >= 2 THEN 'personal_data'
      WHEN sensitivity_score >= 3 THEN 'confidential'
      WHEN sensitivity_score >= 1 THEN 'personal_data'
      ELSE auto_classification
    END as final_classification,

    -- Compliance framework assignment
    ARRAY_REMOVE(ARRAY[
      CASE WHEN gdpr_applicable THEN 'GDPR' END,
      CASE WHEN financial_data THEN 'SOX' END,
      CASE WHEN health_data THEN 'HIPAA' END,
      CASE WHEN auto_classification = 'personal_data' THEN 'CCPA' END,
      CASE WHEN auto_classification = 'business_records' THEN 'Business_Records' END
    ], NULL) as compliance_frameworks,

    -- Retention period calculation
    CASE 
      WHEN health_data THEN 2190  -- 6 years for health data
      WHEN financial_data THEN 2555  -- 7 years for financial data
      WHEN gdpr_applicable THEN 2190  -- 6 years for GDPR data
      WHEN auto_classification = 'confidential' THEN 2555  -- 7 years
      WHEN auto_classification = 'business_records' THEN 1825  -- 5 years
      ELSE 1095  -- 3 years default
    END as retention_period_days,

    -- Special handling flags
    JSON_BUILD_OBJECT(
      'gdpr_applicable', gdpr_applicable,
      'right_to_erasure', gdpr_applicable,
      'financial_audit_support', financial_data,
      'health_privacy_protected', health_data,
      'secure_delete_required', sensitivity_score >= 2,
      'encryption_required', sensitivity_score >= 2 OR financial_data OR health_data
    ) as special_handling

  FROM automated_data_classification adc
)

-- Update documents with automated classification
UPDATE customer_interactions 
SET 
  data_governance = JSON_SET(
    COALESCE(data_governance, '{}'),
    '$.classification', ec.final_classification,
    '$.compliance_frameworks', ec.compliance_frameworks,
    '$.retention_period_days', ec.retention_period_days,
    '$.special_handling', ec.special_handling,
    '$.classification_timestamp', CURRENT_TIMESTAMP,
    '$.classification_method', 'automated',
    '$.sensitivity_score', ec.sensitivity_score,

    -- Calculate retention expiry
    '$.retention_expiry', CURRENT_TIMESTAMP + MAKE_INTERVAL(days => ec.retention_period_days),

    -- Set lifecycle stage
    '$.lifecycle_stage', 'active',
    '$.last_classification_update', CURRENT_TIMESTAMP
  )
FROM enhanced_classification ec
WHERE customer_interactions._id = ec._id;

-- Advanced retention policy execution with comprehensive compliance checks
WITH retention_candidates AS (
  SELECT 
    _id,
    customer_id,
    interaction_type,
    data_governance,
    created_at,

    -- Calculate days since creation
    EXTRACT(DAYS FROM CURRENT_TIMESTAMP - created_at) as age_in_days,

    -- Check retention eligibility
    CASE 
      WHEN data_governance->>'retention_expiry' IS NOT NULL AND
           CAST(data_governance->>'retention_expiry' AS TIMESTAMP) < CURRENT_TIMESTAMP THEN 'expired'
      WHEN EXTRACT(DAYS FROM CURRENT_TIMESTAMP - created_at) >= 
           CAST(data_governance->>'retention_period_days' AS INTEGER) THEN 'expired'
      WHEN EXTRACT(DAYS FROM CURRENT_TIMESTAMP - created_at) >= 
           (CAST(data_governance->>'retention_period_days' AS INTEGER) - 30) THEN 'expiring_soon'
      ELSE 'active'
    END as retention_status,

    -- Check for legal holds
    CASE 
      WHEN EXISTS (
        SELECT 1 FROM legal_holds lh 
        WHERE lh.customer_id = ci.customer_id 
        AND lh.status = 'active'
        AND lh.data_types && ARRAY(SELECT jsonb_array_elements_text(data_governance->'compliance_frameworks'))
      ) THEN 'legal_hold_active'
      ELSE 'no_legal_hold'
    END as legal_hold_status,

    -- Check GDPR right to erasure requests
    CASE 
      WHEN data_governance->>'gdpr_applicable' = 'true' AND
           EXISTS (
             SELECT 1 FROM gdpr_requests gr 
             WHERE gr.customer_id = ci.customer_id 
             AND gr.request_type = 'erasure'
             AND gr.status = 'approved'
           ) THEN 'gdpr_erasure_required'
      ELSE 'no_gdpr_action_required'
    END as gdpr_status,

    -- Calculate processing priority
    CASE 
      WHEN data_governance->>'gdpr_applicable' = 'true' AND
           EXISTS (
             SELECT 1 FROM gdpr_requests gr 
             WHERE gr.customer_id = ci.customer_id 
             AND gr.request_type = 'erasure'
             AND gr.status = 'approved'
           ) THEN 1  -- Highest priority for GDPR erasure
      WHEN EXTRACT(DAYS FROM CURRENT_TIMESTAMP - created_at) >= 
           CAST(data_governance->>'retention_period_days' AS INTEGER) + 90 THEN 2  -- Overdue retention
      WHEN data_governance->'special_handling'->>'secure_delete_required' = 'true' AND
           EXTRACT(DAYS FROM CURRENT_TIMESTAMP - created_at) >= 
           CAST(data_governance->>'retention_period_days' AS INTEGER) THEN 3  -- Secure delete required
      WHEN EXTRACT(DAYS FROM CURRENT_TIMESTAMP - created_at) >= 
           CAST(data_governance->>'retention_period_days' AS INTEGER) THEN 4  -- Standard retention
      ELSE 5  -- No action required
    END as processing_priority

  FROM customer_interactions ci
  WHERE data_governance IS NOT NULL
    AND data_governance->>'classification' IS NOT NULL
),

legal_hold_validation AS (
  SELECT 
    rc.*,

    -- Detailed legal hold information
    COALESCE(
      (
        SELECT JSON_AGG(
          JSON_BUILD_OBJECT(
            'hold_id', lh.hold_id,
            'hold_type', lh.hold_type,
            'initiated_by', lh.initiated_by,
            'reason', lh.reason,
            'expected_duration', lh.expected_duration
          )
        )
        FROM legal_holds lh 
        WHERE lh.customer_id = rc.customer_id 
        AND lh.status = 'active'
        AND lh.data_types && (rc.data_governance->>'compliance_frameworks')::jsonb
      ),
      '[]'::json
    ) as active_legal_holds,

    -- Compliance validation
    CASE 
      WHEN rc.legal_hold_status = 'legal_hold_active' THEN 'blocked_legal_hold'
      WHEN rc.gdpr_status = 'gdpr_erasure_required' THEN 'gdpr_immediate_action'
      WHEN rc.retention_status = 'expired' THEN 'retention_action_required'
      WHEN rc.retention_status = 'expiring_soon' THEN 'prepare_for_retention'
      ELSE 'no_action_required'
    END as required_action,

    -- Audit and compliance tracking
    JSON_BUILD_OBJECT(
      'compliance_check_timestamp', CURRENT_TIMESTAMP,
      'retention_policy_applied', 'customer_data_retention',
      'legal_review_required', rc.legal_hold_status = 'legal_hold_active',
      'gdpr_compliance_check', rc.data_governance->>'gdpr_applicable' = 'true',
      'financial_audit_support', rc.data_governance->'special_handling'->>'financial_audit_support' = 'true'
    ) as compliance_audit_trail

  FROM retention_candidates rc
  WHERE rc.processing_priority <= 4  -- Only process items requiring action
),

archival_preparation AS (
  SELECT 
    lhv.*,

    -- Determine archival strategy
    CASE 
      WHEN required_action = 'gdpr_immediate_action' THEN 'immediate_secure_deletion'
      WHEN required_action = 'retention_action_required' AND 
           data_governance->'special_handling'->>'secure_delete_required' = 'true' THEN 'archive_then_secure_delete'
      WHEN required_action = 'retention_action_required' THEN 'archive_standard'
      WHEN required_action = 'prepare_for_retention' THEN 'prepare_archival'
      ELSE 'no_archival_action'
    END as archival_strategy,

    -- Calculate archival timeline
    CASE 
      WHEN required_action = 'gdpr_immediate_action' THEN CURRENT_TIMESTAMP + INTERVAL '3 days'  -- GDPR 72-hour requirement
      WHEN required_action = 'retention_action_required' THEN CURRENT_TIMESTAMP + INTERVAL '30 days'
      WHEN required_action = 'prepare_for_retention' THEN 
        (data_governance->>'retention_expiry')::timestamp + INTERVAL '7 days'
      ELSE NULL
    END as scheduled_archival_date,

    -- Compliance requirements for archival
    JSON_BUILD_OBJECT(
      'audit_trail_required', data_governance->'special_handling'->>'financial_audit_support' = 'true',
      'encryption_required', data_governance->'special_handling'->>'encryption_required' = 'true',
      'secure_deletion_required', data_governance->'special_handling'->>'secure_delete_required' = 'true',
      'gdpr_compliance_required', data_governance->>'gdpr_applicable' = 'true',
      'legal_hold_override_blocked', legal_hold_status = 'legal_hold_active',
      'compliance_frameworks_affected', data_governance->>'compliance_frameworks'
    ) as archival_compliance_requirements

  FROM legal_hold_validation lhv
  WHERE required_action != 'no_action_required'
    AND required_action != 'blocked_legal_hold'
)

-- Create archival execution plan
INSERT INTO data_archival_queue (
  document_id,
  customer_id,
  collection_name,
  archival_strategy,
  scheduled_execution_date,
  processing_priority,
  compliance_requirements,
  legal_holds,
  audit_trail,
  created_at
)
SELECT 
  ap._id,
  ap.customer_id,
  'customer_interactions',
  ap.archival_strategy,
  ap.scheduled_archival_date,
  ap.processing_priority,
  ap.archival_compliance_requirements,
  ap.active_legal_holds,
  ap.compliance_audit_trail,
  CURRENT_TIMESTAMP
FROM archival_preparation ap
WHERE ap.archival_strategy != 'no_archival_action'
ORDER BY ap.processing_priority, ap.scheduled_archival_date;

-- Execute automated archival based on queue
WITH archival_execution_batch AS (
  SELECT 
    daq.*,
    ci.interaction_type,
    ci.interaction_data,
    ci.data_governance,

    -- Generate archival metadata
    JSON_BUILD_OBJECT(
      'archival_id', GENERATE_UUID(),
      'original_collection', 'customer_interactions',
      'archival_timestamp', CURRENT_TIMESTAMP,
      'archival_method', 'automated_retention_policy',
      'archival_strategy', daq.archival_strategy,
      'compliance_frameworks', daq.compliance_requirements->>'compliance_frameworks_affected',
      'retention_policy_applied', 'customer_data_retention',
      'archival_batch_id', GENERATE_UUID()
    ) as archival_metadata

  FROM data_archival_queue daq
  JOIN customer_interactions ci ON daq.document_id = ci._id
  WHERE daq.scheduled_execution_date <= CURRENT_TIMESTAMP
    AND daq.processing_status = 'pending'
    AND daq.archival_strategy IN ('archive_standard', 'archive_then_secure_delete')
  ORDER BY daq.processing_priority, daq.scheduled_execution_date
  LIMIT 1000  -- Process in batches
),

archival_insertions AS (
  -- Insert into archive collection
  INSERT INTO archived_customer_interactions (
    original_id,
    customer_id,
    interaction_type,
    interaction_data,
    original_created_at,
    archival_metadata,
    data_governance,
    compliance_audit_trail,
    scheduled_deletion
  )
  SELECT 
    aeb.document_id,
    aeb.customer_id,
    aeb.interaction_type,
    aeb.interaction_data,
    aeb.created_at,
    aeb.archival_metadata,
    aeb.data_governance,
    aeb.audit_trail,

    -- Calculate deletion date for secure delete items
    CASE 
      WHEN aeb.archival_strategy = 'archive_then_secure_delete' THEN
        CURRENT_TIMESTAMP + INTERVAL '30 days'  -- 30-day grace period
      ELSE NULL
    END
  FROM archival_execution_batch aeb
  RETURNING original_id, archival_metadata->>'archival_id' as archival_id
),

source_deletions AS (
  -- Remove from original collection after successful archival
  DELETE FROM customer_interactions 
  WHERE _id IN (
    SELECT aeb.document_id 
    FROM archival_execution_batch aeb
  )
  RETURNING _id, customer_id
),

queue_updates AS (
  -- Update archival queue status
  UPDATE data_archival_queue 
  SET 
    processing_status = 'completed',
    executed_at = CURRENT_TIMESTAMP,
    execution_method = 'automated_batch',
    archival_confirmation = true
  WHERE document_id IN (
    SELECT aeb.document_id 
    FROM archival_execution_batch aeb
  )
  RETURNING document_id, processing_priority
)

-- Generate archival execution summary
SELECT 
  COUNT(*) as documents_archived,
  COUNT(DISTINCT aeb.customer_id) as customers_affected,

  -- Archival strategy breakdown
  COUNT(*) FILTER (WHERE aeb.archival_strategy = 'archive_standard') as standard_archival_count,
  COUNT(*) FILTER (WHERE aeb.archival_strategy = 'archive_then_secure_delete') as secure_archival_count,

  -- Compliance framework impact
  JSON_AGG(DISTINCT aeb.compliance_requirements->>'compliance_frameworks_affected') as frameworks_affected,

  -- Processing metrics
  AVG(aeb.processing_priority) as avg_processing_priority,
  MIN(aeb.scheduled_execution_date) as earliest_scheduled_date,
  MAX(aeb.scheduled_execution_date) as latest_scheduled_date,

  -- Audit and governance summary
  JSON_BUILD_OBJECT(
    'execution_timestamp', CURRENT_TIMESTAMP,
    'execution_method', 'automated_sql_batch',
    'retention_policy_applied', 'customer_data_retention',
    'compliance_verified', true,
    'legal_holds_respected', true,
    'audit_trail_complete', true
  ) as execution_summary

FROM archival_execution_batch aeb;

-- Real-time governance monitoring and compliance dashboard
WITH governance_metrics AS (
  SELECT 
    -- Data classification status
    COUNT(*) as total_documents,
    COUNT(*) FILTER (WHERE data_governance->>'classification' IS NOT NULL) as classified_documents,
    ROUND(
      (COUNT(*) FILTER (WHERE data_governance->>'classification' IS NOT NULL) * 100.0 / NULLIF(COUNT(*), 0)),
      2
    ) as classification_percentage,

    -- Classification breakdown
    COUNT(*) FILTER (WHERE data_governance->>'classification' = 'public') as public_documents,
    COUNT(*) FILTER (WHERE data_governance->>'classification' = 'internal') as internal_documents,
    COUNT(*) FILTER (WHERE data_governance->>'classification' = 'confidential') as confidential_documents,
    COUNT(*) FILTER (WHERE data_governance->>'classification' = 'restricted') as restricted_documents,
    COUNT(*) FILTER (WHERE data_governance->>'classification' = 'personal_data') as personal_data_documents,

    -- Retention status
    COUNT(*) FILTER (
      WHERE data_governance->>'retention_expiry' < CURRENT_TIMESTAMP::text
    ) as expired_retention_count,
    COUNT(*) FILTER (
      WHERE data_governance->>'retention_expiry' < (CURRENT_TIMESTAMP + INTERVAL '30 days')::text
      AND data_governance->>'retention_expiry' > CURRENT_TIMESTAMP::text
    ) as expiring_soon_count,

    -- Compliance framework coverage
    COUNT(DISTINCT customer_id) FILTER (
      WHERE data_governance->>'gdpr_applicable' = 'true'
    ) as gdpr_subject_customers,
    COUNT(*) FILTER (
      WHERE data_governance->'compliance_frameworks' ? 'SOX'
    ) as sox_covered_documents,
    COUNT(*) FILTER (
      WHERE data_governance->'compliance_frameworks' ? 'HIPAA'
    ) as hipaa_covered_documents

  FROM customer_interactions
),

legal_hold_metrics AS (
  SELECT 
    COUNT(DISTINCT customer_id) as customers_under_legal_hold,
    COUNT(*) as active_legal_holds,
    JSON_AGG(DISTINCT hold_type) as hold_types,
    AVG(EXTRACT(DAYS FROM CURRENT_TIMESTAMP - created_date)) as avg_hold_duration_days,
    COUNT(*) FILTER (WHERE status = 'pending_review') as holds_pending_review

  FROM legal_holds
  WHERE status = 'active'
),

archival_metrics AS (
  SELECT 
    COUNT(*) as total_archived_documents,
    COUNT(DISTINCT customer_id) as customers_with_archived_data,
    SUM(
      CASE WHEN scheduled_deletion IS NOT NULL THEN 1 ELSE 0 END
    ) as documents_scheduled_for_deletion,

    -- Archival age analysis
    AVG(EXTRACT(DAYS FROM CURRENT_TIMESTAMP - (archival_metadata->>'archival_timestamp')::timestamp)) as avg_archival_age_days,
    COUNT(*) FILTER (
      WHERE (archival_metadata->>'archival_timestamp')::timestamp > CURRENT_TIMESTAMP - INTERVAL '30 days'
    ) as recently_archived_count,

    -- Storage optimization metrics
    SUM(LENGTH(interaction_data::text)) / (1024 * 1024) as archived_data_size_mb,
    COUNT(*) FILTER (
      WHERE data_governance->'special_handling'->>'encryption_required' = 'true'
    ) as encrypted_archived_documents

  FROM archived_customer_interactions
),

compliance_alerts AS (
  SELECT 
    COUNT(*) FILTER (
      WHERE data_governance->>'retention_expiry' < CURRENT_TIMESTAMP::text
      AND NOT EXISTS (
        SELECT 1 FROM legal_holds lh 
        WHERE lh.customer_id = ci.customer_id 
        AND lh.status = 'active'
      )
    ) as overdue_retention_alerts,

    COUNT(*) FILTER (
      WHERE data_governance->>'gdpr_applicable' = 'true'
      AND EXISTS (
        SELECT 1 FROM gdpr_requests gr 
        WHERE gr.customer_id = ci.customer_id 
        AND gr.request_type = 'erasure'
        AND gr.status = 'approved'
        AND gr.created_date < CURRENT_TIMESTAMP - INTERVAL '72 hours'
      )
    ) as overdue_gdpr_erasure_alerts,

    COUNT(*) FILTER (
      WHERE data_governance->>'classification' IS NULL
      AND created_at < CURRENT_TIMESTAMP - INTERVAL '7 days'
    ) as unclassified_data_alerts

  FROM customer_interactions ci
),

cost_optimization_metrics AS (
  SELECT 
    -- Storage tier analysis
    COUNT(*) FILTER (
      WHERE EXTRACT(DAYS FROM CURRENT_TIMESTAMP - created_at) <= 30
    ) as hot_storage_documents,
    COUNT(*) FILTER (
      WHERE EXTRACT(DAYS FROM CURRENT_TIMESTAMP - created_at) BETWEEN 31 AND 90
    ) as warm_storage_documents,
    COUNT(*) FILTER (
      WHERE EXTRACT(DAYS FROM CURRENT_TIMESTAMP - created_at) BETWEEN 91 AND 365
    ) as cold_storage_documents,
    COUNT(*) FILTER (
      WHERE EXTRACT(DAYS FROM CURRENT_TIMESTAMP - created_at) > 365
    ) as frozen_storage_candidates,

    -- Cost projections (estimated)
    ROUND(
      (COUNT(*) FILTER (WHERE EXTRACT(DAYS FROM CURRENT_TIMESTAMP - created_at) <= 30) * 0.023 +
       COUNT(*) FILTER (WHERE EXTRACT(DAYS FROM CURRENT_TIMESTAMP - created_at) BETWEEN 31 AND 90) * 0.012 +
       COUNT(*) FILTER (WHERE EXTRACT(DAYS FROM CURRENT_TIMESTAMP - created_at) BETWEEN 91 AND 365) * 0.004 +
       COUNT(*) FILTER (WHERE EXTRACT(DAYS FROM CURRENT_TIMESTAMP - created_at) > 365) * 0.001) * 
      (SUM(LENGTH(interaction_data::text)) / COUNT(*)) / (1024 * 1024 * 1024),
      2
    ) as estimated_monthly_storage_cost_usd

  FROM customer_interactions
)

-- Comprehensive governance dashboard
SELECT 
  CURRENT_TIMESTAMP as dashboard_generated_at,

  -- Data governance overview
  JSON_BUILD_OBJECT(
    'total_documents', gm.total_documents,
    'classification_coverage_percent', gm.classification_percentage,
    'classification_breakdown', JSON_BUILD_OBJECT(
      'public', gm.public_documents,
      'internal', gm.internal_documents,
      'confidential', gm.confidential_documents,
      'restricted', gm.restricted_documents,
      'personal_data', gm.personal_data_documents
    ),
    'unclassified_documents', gm.total_documents - gm.classified_documents
  ) as data_governance_status,

  -- Retention management status
  JSON_BUILD_OBJECT(
    'expired_retention_count', gm.expired_retention_count,
    'expiring_soon_count', gm.expiring_soon_count,
    'retention_compliance_rate', ROUND(
      ((gm.total_documents - gm.expired_retention_count) * 100.0 / NULLIF(gm.total_documents, 0)),
      2
    )
  ) as retention_status,

  -- Compliance framework coverage
  JSON_BUILD_OBJECT(
    'gdpr_subject_customers', gm.gdpr_subject_customers,
    'sox_covered_documents', gm.sox_covered_documents,
    'hipaa_covered_documents', gm.hipaa_covered_documents,
    'legal_holds_active', lhm.active_legal_holds,
    'customers_under_legal_hold', lhm.customers_under_legal_hold
  ) as compliance_coverage,

  -- Archival and lifecycle metrics
  JSON_BUILD_OBJECT(
    'total_archived_documents', am.total_archived_documents,
    'customers_with_archived_data', am.customers_with_archived_data,
    'documents_scheduled_for_deletion', am.documents_scheduled_for_deletion,
    'recently_archived_count', am.recently_archived_count,
    'archived_data_size_mb', ROUND(am.archived_data_size_mb, 2)
  ) as archival_metrics,

  -- Compliance alerts and action items
  JSON_BUILD_OBJECT(
    'overdue_retention_alerts', ca.overdue_retention_alerts,
    'overdue_gdpr_erasure_alerts', ca.overdue_gdpr_erasure_alerts,
    'unclassified_data_alerts', ca.unclassified_data_alerts,
    'total_active_alerts', ca.overdue_retention_alerts + ca.overdue_gdpr_erasure_alerts + ca.unclassified_data_alerts
  ) as compliance_alerts,

  -- Cost optimization insights
  JSON_BUILD_OBJECT(
    'storage_tier_distribution', JSON_BUILD_OBJECT(
      'hot_storage', com.hot_storage_documents,
      'warm_storage', com.warm_storage_documents,
      'cold_storage', com.cold_storage_documents,
      'frozen_candidates', com.frozen_storage_candidates
    ),
    'estimated_monthly_cost_usd', com.estimated_monthly_storage_cost_usd,
    'optimization_opportunity_percent', ROUND(
      (com.frozen_storage_candidates * 100.0 / NULLIF(
        com.hot_storage_documents + com.warm_storage_documents + 
        com.cold_storage_documents + com.frozen_storage_candidates, 0
      )),
      2
    )
  ) as cost_optimization,

  -- Recommendations and action items
  JSON_BUILD_ARRAY(
    CASE WHEN gm.classification_percentage < 95 THEN 
      'Improve data classification coverage - currently at ' || gm.classification_percentage || '%'
    END,
    CASE WHEN gm.expired_retention_count > 0 THEN 
      'Process ' || gm.expired_retention_count || ' documents with expired retention periods'
    END,
    CASE WHEN ca.overdue_gdpr_erasure_alerts > 0 THEN 
      'URGENT: Complete ' || ca.overdue_gdpr_erasure_alerts || ' overdue GDPR erasure requests'
    END,
    CASE WHEN com.frozen_storage_candidates > com.hot_storage_documents * 0.1 THEN
      'Optimize storage costs by archiving ' || com.frozen_storage_candidates || ' old documents'
    END
  ) as action_recommendations

FROM governance_metrics gm
CROSS JOIN legal_hold_metrics lhm  
CROSS JOIN archival_metrics am
CROSS JOIN compliance_alerts ca
CROSS JOIN cost_optimization_metrics com;

-- QueryLeaf provides comprehensive data lifecycle management capabilities:
-- 1. Automated data classification with PII and sensitivity detection
-- 2. Policy-driven retention management with compliance framework support
-- 3. Advanced legal hold integration with automated compliance tracking
-- 4. GDPR, CCPA, SOX, and HIPAA compliance automation
-- 5. Intelligent archiving with cost optimization and storage tiering
-- 6. Real-time governance monitoring and compliance dashboards (see the aggregation sketch after this listing)
-- 7. Automated audit trails and compliance reporting
-- 8. SQL-familiar syntax for complex data lifecycle operations
-- 9. Integration with MongoDB's native TTL and archiving capabilities
-- 10. Executive-level governance insights and optimization recommendations
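The governance metrics above can also be computed directly against MongoDB with the native aggregation framework. The following is a minimal sketch rather than the QueryLeaf translation itself: it assumes the field names used in this article (data_governance.classification, data_governance.retention_expiry) and an illustrative database and collection name.

// Minimal governance-metrics sketch using MongoDB's native aggregation framework.
// Database and collection names are illustrative; field names follow the examples above.
const { MongoClient } = require('mongodb');

async function governanceSnapshot(uri) {
  const client = new MongoClient(uri);
  try {
    await client.connect();
    const interactions = client.db('governance_demo').collection('customer_interactions');

    const [snapshot] = await interactions.aggregate([
      {
        $facet: {
          // Document counts per assigned classification
          classificationBreakdown: [
            { $group: { _id: '$data_governance.classification', count: { $sum: 1 } } }
          ],
          // Documents whose per-document retention expiry has already passed
          expiredRetention: [
            { $match: { 'data_governance.retention_expiry': { $lt: new Date() } } },
            { $count: 'count' }
          ],
          // Documents that were never classified
          unclassified: [
            { $match: { 'data_governance.classification': { $exists: false } } },
            { $count: 'count' }
          ]
        }
      }
    ]).toArray();

    return snapshot;
  } finally {
    await client.close();
  }
}

A single $facet stage keeps the classification breakdown, expired-retention count, and unclassified count in one round trip, which mirrors the CROSS JOIN of metric CTEs in the SQL dashboard above.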

Best Practices for Enterprise Data Governance

Compliance and Regulatory Alignment

Essential principles for effective MongoDB data lifecycle management in regulated environments:

  1. Data Classification: Implement automated data classification based on content analysis, sensitivity scoring, and regulatory requirements (a minimal classification and retention sketch follows this list)
  2. Retention Policies: Design comprehensive retention policies that align with business requirements and regulatory mandates
  3. Legal Hold Management: Establish automated legal hold processes that override retention policies when litigation or investigations are active
  4. Audit Trails: Maintain comprehensive audit trails for all data lifecycle events to support compliance reporting and investigations
  5. Access Controls: Implement role-based access controls for data governance operations with proper segregation of duties
  6. Compliance Monitoring: Deploy real-time monitoring for compliance violations and automated alerting for critical governance events
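As a concrete illustration of points 1 and 2, here is a minimal sketch of automated classification and per-document retention built from MongoDB primitives. The contains_pii flag and collection name are assumptions, the 3-year period is illustrative, and $dateAdd requires MongoDB 5.0+; this is not QueryLeaf's implementation, just one way the pattern can be expressed.

// Minimal sketch: automated classification plus per-document retention expiry,
// enforced by a TTL index. contains_pii is an assumed upstream signal.
async function applyRetentionPolicy(db) {
  const interactions = db.collection('customer_interactions');

  // 1. Classify unclassified documents and compute an explicit expiry date
  //    (3 years here; real policies would branch per framework as shown earlier).
  await interactions.updateMany(
    { 'data_governance.classification': { $exists: false } },
    [
      {
        $set: {
          'data_governance.classification': {
            $cond: [{ $eq: ['$contains_pii', true] }, 'personal_data', 'internal']
          },
          'data_governance.retention_expiry': {
            $dateAdd: { startDate: '$$NOW', unit: 'year', amount: 3 }
          },
          'data_governance.classification_method': 'automated'
        }
      }
    ]
  );

  // 2. A TTL index with expireAfterSeconds: 0 removes each document once its own
  //    retention_expiry date passes; a legal-hold workflow must clear or push out
  //    this field to prevent deletion.
  await interactions.createIndex(
    { 'data_governance.retention_expiry': 1 },
    { expireAfterSeconds: 0 }
  );
}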

Automation and Operational Excellence

Optimize data lifecycle automation for enterprise scale and reliability:

  1. Automated Execution: Implement automated retention policy execution with intelligent scheduling and performance optimization (see the batch sketch after this list)
  2. Cost Optimization: Deploy intelligent storage tiering and cost optimization strategies that balance compliance with operational efficiency
  3. Risk Management: Establish risk-based prioritization for data governance operations with automated escalation procedures
  4. Performance Impact: Monitor and minimize performance impact of lifecycle operations on production systems
  5. Disaster Recovery: Ensure data governance operations are integrated with disaster recovery and business continuity planning
  6. Continuous Improvement: Implement feedback loops and metrics collection to continuously optimize governance processes
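For point 1, the sketch below shows one way an automated archival batch might be structured with the Node.js driver: select a bounded batch of expired documents, copy them into an archive collection with $merge, then delete them from the source. The collection names and the legal_hold flag are assumptions, and scheduling (cron, Agenda, or similar) is deliberately left out.

// Minimal archival-batch sketch under the assumptions described above.
async function runArchivalBatch(db, batchSize = 1000) {
  const interactions = db.collection('customer_interactions');

  // Select a bounded batch of expired documents that are not under legal hold.
  const batch = await interactions
    .find({
      'data_governance.retention_expiry': { $lt: new Date() },
      'data_governance.legal_hold': { $ne: true } // assumed flag maintained by a legal-hold workflow
    })
    .project({ _id: 1 })
    .limit(batchSize)
    .toArray();
  const batchIds = batch.map(doc => doc._id);
  if (batchIds.length === 0) return 0;

  // Copy the batch into the archive collection; $merge on _id makes re-runs idempotent.
  await interactions.aggregate([
    { $match: { _id: { $in: batchIds } } },
    {
      $addFields: {
        archival_metadata: {
          archived_at: '$$NOW',
          archival_method: 'automated_retention_policy'
        }
      }
    },
    {
      $merge: {
        into: 'archived_customer_interactions',
        on: '_id',
        whenMatched: 'keepExisting',
        whenNotMatched: 'insert'
      }
    }
  ]).toArray(); // $merge produces no output documents; iterating runs the pipeline

  // Remove the archived batch from the source collection.
  const result = await interactions.deleteMany({ _id: { $in: batchIds } });
  return result.deletedCount;
}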

Conclusion

MongoDB data archiving and lifecycle management provides enterprise-grade capabilities for automated retention policies, compliance-aware data governance, and intelligent cost optimization, eliminating the complexity of traditional database archiving while keeping data regulatory-compliant and operationally efficient. Native integration with TTL collections, automated tiering, and comprehensive audit trails enables sophisticated data governance frameworks that scale with business growth.

Key MongoDB Data Lifecycle Management benefits include:

  • Automated Retention: Policy-driven retention with native TTL support and intelligent archiving strategies
  • Compliance Automation: Built-in support for GDPR, CCPA, SOX, HIPAA, and other regulatory frameworks
  • Cost Optimization: Intelligent storage tiering with automated cost management and optimization recommendations
  • Audit and Governance: Comprehensive audit trails and compliance reporting for enterprise governance requirements
  • Legal Hold Integration: Automated legal hold management with retention policy overrides and compliance tracking
  • SQL Accessibility: Familiar SQL-style data lifecycle operations through QueryLeaf for accessible enterprise governance

Whether you're managing customer data, financial records, healthcare information, or any sensitive enterprise data requiring governance and compliance, MongoDB data lifecycle management with QueryLeaf's familiar SQL interface provides the foundation for comprehensive, automated, and compliant data governance.

QueryLeaf Integration: QueryLeaf automatically manages MongoDB data lifecycle operations while providing SQL-familiar syntax for retention policies, compliance automation, and governance reporting. Advanced archiving strategies, cost optimization, and regulatory compliance features are seamlessly handled through familiar SQL patterns, making enterprise data governance both powerful and accessible to SQL-oriented teams.

The integration of MongoDB's robust data lifecycle capabilities with SQL-style governance operations makes it an ideal platform for applications requiring both comprehensive data governance and familiar database management patterns, ensuring your data lifecycle management remains compliant, efficient, and cost-effective as data volumes and regulatory requirements continue to evolve.

MongoDB Capped Collections: High-Performance Logging and Circular Buffer Management for Enterprise Data Streams

Modern applications generate continuous streams of time-series data, logs, events, and real-time messages that require efficient storage, retrieval, and automatic management without manual intervention. Traditional relational databases struggle with high-volume streaming data scenarios, requiring complex archival procedures, partition management, and manual cleanup processes that add operational complexity and performance overhead to data pipeline architectures.

MongoDB capped collections provide native circular buffer functionality with guaranteed insertion order, automatic size management, and optimized storage patterns designed for high-throughput streaming applications. Unlike traditional approaches that require external log rotation systems or complex partitioning strategies, capped collections automatically manage storage limits while maintaining insertion order and providing efficient tail-able cursor capabilities for real-time data consumption.
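Before walking through the traditional approach and the full manager class later in this article, here is a minimal sketch of the two core operations with the Node.js driver: creating a capped collection and tailing it with a tailable, awaitData cursor. The collection name and size limits are illustrative.

// Minimal capped-collection sketch with the Node.js driver.
const { MongoClient } = require('mongodb');

async function demoCappedCollection(uri) {
  const client = new MongoClient(uri);
  await client.connect();
  const db = client.db('streaming_demo');

  // Create a circular buffer: the oldest documents are overwritten automatically
  // once the collection reaches 100 MB or 100,000 documents.
  const events = await db.createCollection('events', {
    capped: true,
    size: 100 * 1024 * 1024,
    max: 100000
  });

  // Writers simply insert; no rotation or cleanup logic is needed.
  await events.insertOne({ ts: new Date(), type: 'user_signup', payload: { plan: 'trial' } });

  // Readers can tail the collection in insertion order, similar to `tail -f`.
  const cursor = events.find({}, { tailable: true, awaitData: true });
  for await (const doc of cursor) {
    console.log('new event:', doc.type, doc.ts);
    break; // stop after the first document for this demo
  }

  await client.close();
}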

The Traditional High-Volume Logging Challenge

Conventional relational database approaches to high-volume logging and streaming data face significant operational limitations:

-- Traditional PostgreSQL high-volume logging - complex partition management and cleanup overhead

-- Application log management with manual partitioning and rotation
CREATE TABLE application_logs (
    log_id BIGSERIAL PRIMARY KEY,
    log_timestamp TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    log_level VARCHAR(20) NOT NULL,
    application VARCHAR(100) NOT NULL,
    component VARCHAR(100) NOT NULL,

    -- Log content and metadata
    log_message TEXT NOT NULL,
    log_data JSONB,
    user_id INTEGER,
    session_id VARCHAR(100),
    request_id VARCHAR(100),

    -- Performance tracking
    execution_time_ms INTEGER,
    memory_usage_mb DECIMAL(10,2),
    cpu_usage_percent DECIMAL(5,2),

    -- Context information
    server_hostname VARCHAR(200),
    process_id INTEGER,
    thread_id INTEGER,
    environment VARCHAR(50) DEFAULT 'production',

    -- Correlation and tracing
    trace_id VARCHAR(100),
    parent_span_id VARCHAR(100),
    operation_name VARCHAR(200),

    CONSTRAINT valid_log_level CHECK (log_level IN ('DEBUG', 'INFO', 'WARN', 'ERROR', 'FATAL')),
    CONSTRAINT valid_environment CHECK (environment IN ('development', 'testing', 'staging', 'production'))
) PARTITION BY RANGE (log_timestamp);

-- Create partitions for log data (manual partition management)
CREATE TABLE application_logs_2025_01 PARTITION OF application_logs
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

CREATE TABLE application_logs_2025_02 PARTITION OF application_logs  
    FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');

CREATE TABLE application_logs_2025_03 PARTITION OF application_logs
    FOR VALUES FROM ('2025-03-01') TO ('2025-04-01');

-- Performance indexes for log queries (per partition)
CREATE INDEX idx_app_logs_2025_01_timestamp ON application_logs_2025_01 (log_timestamp DESC);
CREATE INDEX idx_app_logs_2025_01_level_app ON application_logs_2025_01 (log_level, application, log_timestamp DESC);
CREATE INDEX idx_app_logs_2025_01_user_session ON application_logs_2025_01 (user_id, session_id, log_timestamp DESC);
CREATE INDEX idx_app_logs_2025_01_trace ON application_logs_2025_01 (trace_id);

-- Real-time event stream with manual buffer management
CREATE TABLE event_stream_buffer (
    event_id BIGSERIAL PRIMARY KEY,
    event_timestamp TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    event_type VARCHAR(100) NOT NULL,
    event_source VARCHAR(100) NOT NULL,

    -- Event payload
    event_data JSONB NOT NULL,
    event_version VARCHAR(20) DEFAULT '1.0',
    event_schema_version INTEGER DEFAULT 1,

    -- Stream metadata
    stream_name VARCHAR(200) NOT NULL,
    partition_key VARCHAR(200),
    sequence_number BIGINT,

    -- Processing status
    processed BOOLEAN DEFAULT FALSE,
    processing_attempts INTEGER DEFAULT 0,
    last_processed TIMESTAMP,
    processing_error TEXT,

    -- Buffer management
    buffer_position INTEGER,
    retention_priority INTEGER DEFAULT 5, -- 1 highest, 10 lowest

    -- Performance metadata
    event_size_bytes INTEGER GENERATED ALWAYS AS (length(event_data::text)) STORED,
    ingestion_latency_ms INTEGER
);

-- Complex buffer management procedure with manual overflow handling
CREATE OR REPLACE FUNCTION manage_event_stream_buffer()
RETURNS INTEGER AS $$
DECLARE
    buffer_max_size INTEGER := 1000000; -- 1 million events
    buffer_max_age INTERVAL := '7 days';
    cleanup_batch_size INTEGER := 10000;
    current_buffer_size INTEGER;
    events_to_remove INTEGER := 0;
    removed_events INTEGER := 0;
    cleanup_cursor CURSOR FOR
        SELECT event_id, event_timestamp, event_size_bytes
        FROM event_stream_buffer
        WHERE (event_timestamp < CURRENT_TIMESTAMP - buffer_max_age
               OR (processed = TRUE AND processing_attempts >= 3))
        ORDER BY retention_priority DESC, event_timestamp ASC
        LIMIT cleanup_batch_size;

    event_record RECORD;
    total_size_removed BIGINT := 0;

BEGIN
    RAISE NOTICE 'Starting event stream buffer management...';

    -- Check current buffer size
    SELECT COUNT(*), COALESCE(SUM(event_size_bytes), 0)
    INTO current_buffer_size, total_size_removed
    FROM event_stream_buffer;

    RAISE NOTICE 'Current buffer: % events, % bytes', current_buffer_size, total_size_removed;

    -- Reset the counter so it only tracks bytes reclaimed during this run
    total_size_removed := 0;

    -- Calculate events to remove if over capacity
    IF current_buffer_size > buffer_max_size THEN
        events_to_remove := current_buffer_size - buffer_max_size + (buffer_max_size * 0.1)::INTEGER;
        RAISE NOTICE 'Buffer over capacity, removing % events', events_to_remove;
    END IF;

    -- Remove old and processed events
    FOR event_record IN cleanup_cursor LOOP
        BEGIN
            -- Archive event before deletion (if required)
            INSERT INTO event_stream_archive (
                original_event_id, event_timestamp, event_type, event_source,
                event_data, stream_name, archived_at, archive_reason
            ) VALUES (
                event_record.event_id, event_record.event_timestamp, 
                (SELECT event_type FROM event_stream_buffer WHERE event_id = event_record.event_id),
                (SELECT event_source FROM event_stream_buffer WHERE event_id = event_record.event_id),
                (SELECT event_data FROM event_stream_buffer WHERE event_id = event_record.event_id),
                (SELECT stream_name FROM event_stream_buffer WHERE event_id = event_record.event_id),
                CURRENT_TIMESTAMP, 'buffer_management'
            );

            -- Remove event from buffer
            DELETE FROM event_stream_buffer WHERE event_id = event_record.event_id;

            removed_events := removed_events + 1;
            total_size_removed := total_size_removed + event_record.event_size_bytes;

            -- Exit if we've removed enough events
            EXIT WHEN events_to_remove > 0 AND removed_events >= events_to_remove;

        EXCEPTION WHEN OTHERS THEN
            RAISE WARNING 'Error processing event % during buffer cleanup: %', 
                event_record.event_id, SQLERRM;
        END;
    END LOOP;

    -- Update buffer positions for remaining events
    WITH position_update AS (
        SELECT event_id, 
               ROW_NUMBER() OVER (ORDER BY event_timestamp ASC) as new_position
        FROM event_stream_buffer
    )
    UPDATE event_stream_buffer 
    SET buffer_position = pu.new_position
    FROM position_update pu
    WHERE event_stream_buffer.event_id = pu.event_id;

    -- Log buffer management results
    INSERT INTO buffer_management_log (
        management_timestamp, events_removed, bytes_reclaimed,
        buffer_size_after, management_duration_ms
    ) VALUES (
        CURRENT_TIMESTAMP, removed_events, total_size_removed,
        (SELECT COUNT(*) FROM event_stream_buffer),
        EXTRACT(EPOCH FROM (clock_timestamp() - CURRENT_TIMESTAMP)) * 1000
    );

    RAISE NOTICE 'Buffer management completed: % events removed, % bytes reclaimed', 
        removed_events, total_size_removed;

    RETURN removed_events;
END;
$$ LANGUAGE plpgsql;

-- Scheduled buffer management (requires external cron job)
CREATE TABLE buffer_management_schedule (
    schedule_name VARCHAR(100) PRIMARY KEY,
    management_function VARCHAR(200) NOT NULL,
    schedule_cron VARCHAR(100) NOT NULL,
    last_execution TIMESTAMP,
    next_execution TIMESTAMP,

    -- Configuration
    enabled BOOLEAN DEFAULT TRUE,
    max_execution_time INTERVAL DEFAULT '30 minutes',
    buffer_size_threshold INTEGER,

    -- Performance tracking
    average_execution_time INTERVAL,
    average_events_processed INTEGER,
    consecutive_failures INTEGER DEFAULT 0,
    last_error_message TEXT
);

INSERT INTO buffer_management_schedule (schedule_name, management_function, schedule_cron) VALUES
('event_buffer_cleanup', 'manage_event_stream_buffer()', '*/15 * * * *'), -- Every 15 minutes
('log_partition_cleanup', 'cleanup_old_log_partitions()', '0 2 * * 0'),   -- Weekly at 2 AM
('archive_processed_events', 'archive_old_processed_events()', '0 1 * * *'); -- Daily at 1 AM

-- Manual partition management for log tables
CREATE OR REPLACE FUNCTION create_monthly_log_partitions(months_ahead INTEGER DEFAULT 3)
RETURNS INTEGER AS $$
DECLARE
    partition_count INTEGER := 0;
    partition_date DATE;
    partition_name TEXT;
    partition_start DATE;
    partition_end DATE;
    month_counter INTEGER := 0;

BEGIN
    -- Create partitions for upcoming months
    WHILE month_counter <= months_ahead LOOP
        partition_date := DATE_TRUNC('month', CURRENT_DATE) + (month_counter || ' months')::INTERVAL;
        partition_start := partition_date;
        partition_end := partition_start + INTERVAL '1 month';

        partition_name := 'application_logs_' || TO_CHAR(partition_date, 'YYYY_MM');

        -- Check if partition already exists
        IF NOT EXISTS (
            SELECT 1 FROM pg_tables 
            WHERE tablename = partition_name 
            AND schemaname = 'public'
        ) THEN
            -- Create partition
            EXECUTE format(
                'CREATE TABLE %I PARTITION OF application_logs FOR VALUES FROM (%L) TO (%L)',
                partition_name, partition_start, partition_end
            );

            -- Create indexes on new partition
            EXECUTE format(
                'CREATE INDEX %I ON %I (log_timestamp DESC)',
                'idx_' || partition_name || '_timestamp', partition_name
            );

            EXECUTE format(
                'CREATE INDEX %I ON %I (log_level, application, log_timestamp DESC)',
                'idx_' || partition_name || '_level_app', partition_name
            );

            partition_count := partition_count + 1;

            RAISE NOTICE 'Created partition: % for period % to %', 
                partition_name, partition_start, partition_end;
        END IF;

        month_counter := month_counter + 1;
    END LOOP;

    RETURN partition_count;
END;
$$ LANGUAGE plpgsql;

-- Complex log rotation and cleanup
CREATE OR REPLACE FUNCTION cleanup_old_log_partitions(retention_months INTEGER DEFAULT 6)
RETURNS INTEGER AS $$
DECLARE
    partition_record RECORD;
    dropped_partitions INTEGER := 0;
    retention_threshold DATE;
    partition_cursor CURSOR FOR
        SELECT schemaname, tablename,
               SUBSTRING(tablename FROM 'application_logs_([0-9]{4}_[0-9]{2})$') as period_str
        FROM pg_tables 
        WHERE tablename LIKE 'application_logs_2%'
        AND schemaname = 'public';

BEGIN
    retention_threshold := DATE_TRUNC('month', CURRENT_DATE) - (retention_months || ' months')::INTERVAL;

    RAISE NOTICE 'Cleaning up log partitions older than %', retention_threshold;

    FOR partition_record IN partition_cursor LOOP
        DECLARE
            partition_date DATE;
        BEGIN
            -- Parse partition date from table name
            partition_date := TO_DATE(partition_record.period_str, 'YYYY_MM');

            -- Check if partition is old enough to drop
            IF partition_date < retention_threshold THEN
                -- Archive partition data before dropping (if required)
                EXECUTE format(
                    'INSERT INTO application_logs_archive SELECT * FROM %I.%I',
                    partition_record.schemaname, partition_record.tablename
                );

                -- Drop the partition
                EXECUTE format('DROP TABLE %I.%I', 
                    partition_record.schemaname, partition_record.tablename);

                dropped_partitions := dropped_partitions + 1;

                RAISE NOTICE 'Dropped old partition: %', partition_record.tablename;
            END IF;

        EXCEPTION WHEN OTHERS THEN
            RAISE WARNING 'Error processing partition %: %', 
                partition_record.tablename, SQLERRM;
        END;
    END LOOP;

    RETURN dropped_partitions;
END;
$$ LANGUAGE plpgsql;

-- Monitor buffer and partition performance
WITH buffer_performance AS (
    SELECT 
        'event_stream_buffer' as buffer_name,
        COUNT(*) as total_events,
        SUM(event_size_bytes) as total_size_bytes,
        AVG(event_size_bytes) as avg_event_size,
        MIN(event_timestamp) as oldest_event,
        MAX(event_timestamp) as newest_event,

        -- Processing metrics
        COUNT(*) FILTER (WHERE processed = TRUE) as processed_events,
        COUNT(*) FILTER (WHERE processing_error IS NOT NULL) as error_events,
        AVG(processing_attempts) as avg_processing_attempts,

        -- Buffer efficiency
        EXTRACT(EPOCH FROM (MAX(event_timestamp) - MIN(event_timestamp))) / 3600 as timespan_hours,
        COUNT(*) / NULLIF(EXTRACT(EPOCH FROM (MAX(event_timestamp) - MIN(event_timestamp))) / 3600, 0) as events_per_hour

    FROM event_stream_buffer
),

partition_performance AS (
    SELECT 
        schemaname || '.' || tablename as partition_name,
        pg_total_relation_size(schemaname||'.'||tablename) as partition_size_bytes,

        -- Estimate row count (approximate)
        CASE 
            WHEN pg_total_relation_size(schemaname||'.'||tablename) > 0 THEN
                pg_total_relation_size(schemaname||'.'||tablename) / 1024 -- Rough estimate
            ELSE 0
        END as estimated_rows,

        SUBSTRING(tablename FROM 'application_logs_([0-9]{4}_[0-9]{2})$') as time_period

    FROM pg_tables 
    WHERE tablename LIKE 'application_logs_2%'
    AND schemaname = 'public'
)

SELECT 
    -- Buffer performance summary
    bp.buffer_name,
    bp.total_events,
    ROUND(bp.total_size_bytes / (1024 * 1024)::decimal, 2) as total_size_mb,
    ROUND(bp.avg_event_size::decimal, 2) as avg_event_size_bytes,
    bp.timespan_hours,
    ROUND(bp.events_per_hour::decimal, 2) as throughput_events_per_hour,

    -- Processing efficiency
    ROUND((bp.processed_events::decimal / NULLIF(bp.total_events, 0)::decimal) * 100, 1) as processing_success_rate,
    bp.error_events,
    ROUND(bp.avg_processing_attempts::decimal, 2) as avg_retry_attempts,

    -- Operational assessment
    CASE 
        WHEN bp.events_per_hour > 10000 THEN 'high_throughput'
        WHEN bp.events_per_hour > 1000 THEN 'medium_throughput' 
        ELSE 'low_throughput'
    END as throughput_classification,

    -- Management recommendations
    CASE 
        WHEN bp.total_events > 500000 THEN 'Buffer approaching capacity - increase cleanup frequency'
        WHEN bp.error_events > bp.total_events * 0.1 THEN 'High error rate - investigate processing issues'
        WHEN bp.avg_processing_attempts > 2 THEN 'Frequent retries - check downstream systems'
        ELSE 'Buffer operating within normal parameters'
    END as operational_recommendation

FROM buffer_performance bp

UNION ALL

SELECT 
    pp.partition_name,
    pp.estimated_rows as total_events,
    ROUND(pp.partition_size_bytes / (1024 * 1024)::decimal, 2) as total_size_mb,
    CASE WHEN pp.estimated_rows > 0 THEN 
        ROUND(pp.partition_size_bytes::decimal / pp.estimated_rows::decimal, 2) 
    ELSE 0 END as avg_event_size_bytes,
    NULL as timespan_hours,
    NULL as throughput_events_per_hour,
    NULL as processing_success_rate,
    NULL as error_events,
    NULL as avg_retry_attempts,

    -- Partition classification
    CASE 
        WHEN pp.partition_size_bytes > 1024 * 1024 * 1024 THEN 'large_partition' -- > 1GB
        WHEN pp.partition_size_bytes > 100 * 1024 * 1024 THEN 'medium_partition' -- > 100MB
        ELSE 'small_partition'
    END as throughput_classification,

    -- Partition management recommendations
    CASE 
        WHEN pp.partition_size_bytes > 5 * 1024 * 1024 * 1024 THEN 'Large partition - consider archival' -- > 5GB
        WHEN pp.time_period < TO_CHAR(CURRENT_DATE - INTERVAL '6 months', 'YYYY_MM') THEN 'Old partition - candidate for cleanup'
        ELSE 'Partition within normal size parameters'
    END as operational_recommendation

FROM partition_performance pp
ORDER BY total_size_mb DESC;

-- Traditional logging limitations:
-- 1. Complex partition management requiring manual creation and maintenance procedures  
-- 2. Resource-intensive cleanup operations affecting application performance and availability
-- 3. Manual buffer overflow handling with complex archival and rotation logic
-- 4. Limited scalability for high-volume streaming data scenarios requiring constant maintenance
-- 5. Operational overhead of monitoring partition sizes, buffer utilization, and cleanup scheduling
-- 6. Complex indexing strategies required for efficient time-series queries across partitions
-- 7. Risk of data loss during partition management operations and buffer overflow conditions
-- 8. Difficult integration with real-time streaming applications requiring tail-able cursors
-- 9. Performance degradation as partition counts increase requiring complex query optimization
-- 10. Manual coordination of cleanup schedules across multiple data retention policies

MongoDB capped collections provide native circular buffer functionality with automatic size management and optimized performance:

// MongoDB Capped Collections - Native circular buffer management for high-performance streaming data
const { MongoClient, ObjectId } = require('mongodb');

// Enterprise-grade MongoDB Capped Collections Manager for High-Performance Data Streams
class MongoCappedCollectionManager {
  constructor(client, config = {}) {
    this.client = client;
    this.db = client.db(config.database || 'streaming_platform');

    this.config = {
      // Capped collection configuration
      enableTailableCursors: config.enableTailableCursors !== false,
      enableOplogIntegration: config.enableOplogIntegration || false,
      enableMetricsCollection: config.enableMetricsCollection !== false,

      // Performance optimization
      enableIndexOptimization: config.enableIndexOptimization !== false,
      enableCompressionOptimization: config.enableCompressionOptimization || false,
      enableShardingSupport: config.enableShardingSupport || false,

      // Monitoring and alerts
      enablePerformanceMonitoring: config.enablePerformanceMonitoring !== false,
      enableCapacityAlerts: config.enableCapacityAlerts !== false,
      alertThresholdPercent: config.alertThresholdPercent || 85,

      // Advanced features
      enableDataArchiving: config.enableDataArchiving || false,
      enableReplicationOptimization: config.enableReplicationOptimization || false,
      enableBulkInsertOptimization: config.enableBulkInsertOptimization !== false
    };

    // Collection management state
    this.cappedCollections = new Map();
    this.tailableCursors = new Map();
    this.performanceMetrics = new Map();
    this.capacityMonitors = new Map();

    this.initializeManager();
  }

  async initializeManager() {
    console.log('Initializing MongoDB Capped Collections Manager for high-performance streaming...');

    try {
      // Setup capped collections for different streaming scenarios
      await this.setupApplicationLogsCappedCollection();
      await this.setupEventStreamCappedCollection();
      await this.setupRealTimeMetricsCappedCollection();
      await this.setupAuditTrailCappedCollection();
      await this.setupPerformanceMonitoringCollection();

      // Initialize performance monitoring
      if (this.config.enablePerformanceMonitoring) {
        await this.initializePerformanceMonitoring();
      }

      // Setup capacity monitoring
      if (this.config.enableCapacityAlerts) {
        await this.initializeCapacityMonitoring();
      }

      console.log('Capped Collections Manager initialized successfully');

    } catch (error) {
      console.error('Error initializing capped collections manager:', error);
      throw error;
    }
  }

  async setupApplicationLogsCappedCollection() {
    console.log('Setting up application logs capped collection...');

    try {
      const collectionName = 'application_logs';
      const cappedOptions = {
        capped: true,
        size: 1024 * 1024 * 1024, // 1GB size limit
        max: 1000000,              // 1 million document limit

        // Storage optimization
        storageEngine: {
          wiredTiger: {
            configString: 'block_compressor=snappy'
          }
        }
      };

      // Create capped collection with optimized configuration
      await this.db.createCollection(collectionName, cappedOptions);
      const collection = this.db.collection(collectionName);

      // Create optimal indexes for log queries (minimal indexing for capped collections)
      await collection.createIndexes([
        { key: { logLevel: 1, timestamp: 1 }, background: true },
        { key: { application: 1, component: 1 }, background: true },
        { key: { traceId: 1 }, background: true, sparse: true }
      ]);

      // Store collection configuration
      this.cappedCollections.set(collectionName, {
        collection: collection,
        cappedOptions: cappedOptions,
        insertionOrder: true,
        tailableSupported: true,
        useCase: 'application_logging',
        performanceProfile: 'high_throughput',

        // Monitoring configuration
        monitoring: {
          trackInsertRate: true,
          trackSizeUtilization: true,
          trackQueryPerformance: true
        }
      });

      console.log(`Application logs capped collection created: ${cappedOptions.size} bytes, ${cappedOptions.max} documents max`);

    } catch (error) {
      if (error.code === 48) {
        // Collection already exists and is capped
        console.log('Application logs capped collection already exists');
        const collection = this.db.collection('application_logs');
        this.cappedCollections.set('application_logs', {
          collection: collection,
          existing: true,
          useCase: 'application_logging'
        });
      } else {
        console.error('Error creating application logs capped collection:', error);
        throw error;
      }
    }
  }

  async setupEventStreamCappedCollection() {
    console.log('Setting up event stream capped collection...');

    try {
      const collectionName = 'event_stream';
      const cappedOptions = {
        capped: true,
        size: 2 * 1024 * 1024 * 1024, // 2GB size limit  
        max: 5000000,                  // 5 million document limit

        // Optimized for streaming workloads
        writeConcern: { w: 1, j: false }, // Fast writes for streaming
      };

      await this.db.createCollection(collectionName, cappedOptions);
      const collection = this.db.collection(collectionName);

      // Minimal indexing optimized for insertion order and tailable cursors
      await collection.createIndexes([
        { key: { eventType: 1, timestamp: 1 }, background: true },
        { key: { streamName: 1 }, background: true },
        { key: { correlationId: 1 }, background: true, sparse: true }
      ]);

      this.cappedCollections.set(collectionName, {
        collection: collection,
        cappedOptions: cappedOptions,
        insertionOrder: true,
        tailableSupported: true,
        useCase: 'event_streaming',
        performanceProfile: 'ultra_high_throughput',

        // Advanced streaming features
        streaming: {
          enableTailableCursors: true,
          enableChangeStreams: true,
          bufferOptimized: true,
          realTimeConsumption: true
        }
      });

      console.log(`Event stream capped collection created: ${cappedOptions.size} bytes capacity`);

    } catch (error) {
      if (error.code === 48) {
        console.log('Event stream capped collection already exists');
        const collection = this.db.collection('event_stream');
        this.cappedCollections.set('event_stream', {
          collection: collection,
          existing: true,
          useCase: 'event_streaming'
        });
      } else {
        console.error('Error creating event stream capped collection:', error);
        throw error;
      }
    }
  }

  async setupRealTimeMetricsCappedCollection() {
    console.log('Setting up real-time metrics capped collection...');

    try {
      const collectionName = 'realtime_metrics';
      const cappedOptions = {
        capped: true,
        size: 512 * 1024 * 1024, // 512MB size limit
        max: 2000000,             // 2 million document limit
      };

      await this.db.createCollection(collectionName, cappedOptions);
      const collection = this.db.collection(collectionName);

      // Optimized indexes for metrics queries
      await collection.createIndexes([
        { key: { metricType: 1, timestamp: 1 }, background: true },
        { key: { source: 1, timestamp: -1 }, background: true },
        { key: { aggregationLevel: 1 }, background: true }
      ]);

      this.cappedCollections.set(collectionName, {
        collection: collection,
        cappedOptions: cappedOptions,
        insertionOrder: true,
        tailableSupported: true,
        useCase: 'metrics_streaming',
        performanceProfile: 'time_series_optimized',

        // Metrics-specific configuration
        metrics: {
          enableAggregation: true,
          timeSeriesOptimized: true,
          enableRealTimeAlerts: true
        }
      });

      console.log(`Real-time metrics capped collection created: ${cappedOptions.size} bytes capacity`);

    } catch (error) {
      if (error.code === 48) {
        console.log('Real-time metrics capped collection already exists');
        const collection = this.db.collection('realtime_metrics');
        this.cappedCollections.set('realtime_metrics', {
          collection: collection,
          existing: true,
          useCase: 'metrics_streaming'
        });
      } else {
        console.error('Error creating real-time metrics capped collection:', error);
        throw error;
      }
    }
  }

  async setupAuditTrailCappedCollection() {
    console.log('Setting up audit trail capped collection...');

    try {
      const collectionName = 'audit_trail';
      const cappedOptions = {
        capped: true,
        size: 256 * 1024 * 1024, // 256MB size limit
        max: 500000,              // 500k document limit

        // Enhanced durability for audit data
        writeConcern: { w: 'majority', j: true }
      };

      await this.db.createCollection(collectionName, cappedOptions);
      const collection = this.db.collection(collectionName);

      // Audit-optimized indexes
      await collection.createIndexes([
        { key: { auditType: 1, timestamp: 1 }, background: true },
        { key: { userId: 1, timestamp: -1 }, background: true },
        { key: { resourceId: 1 }, background: true, sparse: true }
      ]);

      this.cappedCollections.set(collectionName, {
        collection: collection,
        cappedOptions: cappedOptions,
        insertionOrder: true,
        tailableSupported: true,
        useCase: 'audit_logging',
        performanceProfile: 'compliance_optimized',

        // Audit-specific features
        audit: {
          immutableInsertOrder: true,
          tamperEvident: true,
          complianceMode: true
        }
      });

      console.log(`Audit trail capped collection created: ${cappedOptions.size} bytes capacity`);

    } catch (error) {
      if (error.code === 48) {
        console.log('Audit trail capped collection already exists');
        const collection = this.db.collection('audit_trail');
        this.cappedCollections.set('audit_trail', {
          collection: collection,
          existing: true,
          useCase: 'audit_logging'
        });
      } else {
        console.error('Error creating audit trail capped collection:', error);
        throw error;
      }
    }
  }

  async logApplicationEvent(logData) {
    console.log('Logging application event to capped collection...');

    try {
      const logsCollection = this.cappedCollections.get('application_logs').collection;

      const logEntry = {
        _id: new ObjectId(),
        timestamp: new Date(),
        logLevel: logData.level || 'INFO',
        application: logData.application,
        component: logData.component,

        // Log content
        message: logData.message,
        logData: logData.data || {},

        // Context information
        userId: logData.userId,
        sessionId: logData.sessionId,
        requestId: logData.requestId,

        // Performance tracking
        executionTime: logData.executionTime || null,
        memoryUsage: logData.memoryUsage || null,
        cpuUsage: logData.cpuUsage || null,

        // Server context
        hostname: logData.hostname || require('os').hostname(),
        processId: process.pid,
        environment: logData.environment || 'production',

        // Distributed tracing
        traceId: logData.traceId,
        spanId: logData.spanId,
        operation: logData.operation,

        // Capped collection metadata
        insertionOrder: true,
        streamingOptimized: true
      };

      // High-performance insert optimized for capped collections
      const result = await logsCollection.insertOne(logEntry, {
        writeConcern: { w: 1, j: false } // Fast writes for logging
      });

      // Update performance metrics
      await this.updateCollectionMetrics('application_logs', 'insert', logEntry);

      console.log(`Application log inserted: ${result.insertedId}`);

      return {
        logId: result.insertedId,
        timestamp: logEntry.timestamp,
        cappedCollection: true,
        insertionOrder: logEntry.insertionOrder
      };

    } catch (error) {
      console.error('Error logging application event:', error);
      throw error;
    }
  }

  async streamEvent(eventData) {
    console.log('Streaming event to capped collection...');

    try {
      const eventCollection = this.cappedCollections.get('event_stream').collection;

      const streamEvent = {
        _id: new ObjectId(),
        timestamp: new Date(),
        eventType: eventData.type,
        eventSource: eventData.source,

        // Event payload
        eventData: eventData.payload || {},
        eventVersion: eventData.version || '1.0',
        schemaVersion: eventData.schemaVersion || 1,

        // Stream metadata
        streamName: eventData.streamName,
        partitionKey: eventData.partitionKey,
        sequenceNumber: Date.now(), // Millisecond timestamp used as an approximate, non-strict sequence

        // Processing metadata
        processed: false,
        processingAttempts: 0,

        // Correlation and tracing
        correlationId: eventData.correlationId,
        causationId: eventData.causationId,

        // Performance optimization
        eventSizeBytes: JSON.stringify(eventData.payload || {}).length,
        ingestionLatency: eventData.ingestionLatency || null,

        // Streaming optimization
        tailableReady: true,
        bufferOptimized: true
      };

      // Ultra-high-performance insert for streaming
      const result = await eventCollection.insertOne(streamEvent, {
        writeConcern: { w: 1, j: false }
      });

      // Update streaming metrics
      await this.updateCollectionMetrics('event_stream', 'stream', streamEvent);

      console.log(`Stream event inserted: ${result.insertedId}`);

      return {
        eventId: result.insertedId,
        sequenceNumber: streamEvent.sequenceNumber,
        streamName: streamEvent.streamName,
        cappedOptimized: true
      };

    } catch (error) {
      console.error('Error streaming event:', error);
      throw error;
    }
  }

  async recordMetric(metricData) {
    console.log('Recording real-time metric to capped collection...');

    try {
      const metricsCollection = this.cappedCollections.get('realtime_metrics').collection;

      const metric = {
        _id: new ObjectId(),
        timestamp: new Date(),
        metricType: metricData.type,
        metricName: metricData.name,

        // Metric values
        value: metricData.value,
        unit: metricData.unit || 'count',
        tags: metricData.tags || {},

        // Source information
        source: metricData.source,
        sourceType: metricData.sourceType || 'application',

        // Aggregation metadata
        aggregationLevel: metricData.aggregationLevel || 'raw',
        aggregationWindow: metricData.aggregationWindow || null,

        // Time series optimization
        timeSeriesOptimized: true,
        bucketTimestamp: new Date(Math.floor(Date.now() / (60 * 1000)) * 60 * 1000), // 1-minute buckets

        // Performance metadata
        collectionTimestamp: Date.now(),
        processingLatency: metricData.processingLatency || null
      };

      // Time-series optimized insert
      const result = await metricsCollection.insertOne(metric, {
        writeConcern: { w: 1, j: false }
      });

      // Update metrics collection performance
      await this.updateCollectionMetrics('realtime_metrics', 'metric', metric);

      console.log(`Real-time metric recorded: ${result.insertedId}`);

      return {
        metricId: result.insertedId,
        metricType: metric.metricType,
        timestamp: metric.timestamp,
        timeSeriesOptimized: true
      };

    } catch (error) {
      console.error('Error recording metric:', error);
      throw error;
    }
  }

  async createTailableCursor(collectionName, options = {}) {
    console.log(`Creating tailable cursor for collection: ${collectionName}`);

    try {
      const collectionConfig = this.cappedCollections.get(collectionName);
      if (!collectionConfig) {
        throw new Error(`Collection ${collectionName} not found in capped collections`);
      }

      if (!collectionConfig.tailableSupported) {
        throw new Error(`Collection ${collectionName} does not support tailable cursors`);
      }

      const collection = collectionConfig.collection;

      // Configure tailable cursor options
      const tailableOptions = {
        tailable: true,
        awaitData: true,
        noCursorTimeout: true,
        maxTimeMS: options.maxTimeMS || 1000,
        batchSize: options.batchSize || 100,

        // Starting position
        sort: { $natural: 1 }, // Natural insertion order
        ...(options.filter || {})
      };

      // Create cursor starting from specified position or latest
      let cursor;
      if (options.fromTimestamp) {
        cursor = collection.find({ 
          timestamp: { $gte: options.fromTimestamp },
          ...(options.additionalFilter || {})
        }, tailableOptions);
      } else if (options.fromLatest) {
        // Start from the end of the collection
        const lastDoc = await collection.findOne({}, { sort: { $natural: -1 } });
        if (lastDoc) {
          cursor = collection.find({ 
            _id: { $gt: lastDoc._id },
            ...(options.additionalFilter || {})
          }, tailableOptions);
        } else {
          cursor = collection.find(options.additionalFilter || {}, tailableOptions);
        }
      } else {
        cursor = collection.find(options.additionalFilter || {}, tailableOptions);
      }

      // Store cursor for management
      const cursorId = `${collectionName}_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`;
      this.tailableCursors.set(cursorId, {
        cursor: cursor,
        collectionName: collectionName,
        options: tailableOptions,
        createdAt: new Date(),
        active: true,

        // Performance tracking
        documentsRead: 0,
        lastActivity: new Date()
      });

      console.log(`Tailable cursor created: ${cursorId} for collection ${collectionName}`);

      return {
        cursorId: cursorId,
        cursor: cursor,
        collectionName: collectionName,
        tailableEnabled: true,
        awaitData: tailableOptions.awaitData
      };

    } catch (error) {
      console.error(`Error creating tailable cursor for ${collectionName}:`, error);
      throw error;
    }
  }

  async streamFromTailableCursor(cursorId, eventHandler, errorHandler) {
    console.log(`Starting streaming from tailable cursor: ${cursorId}`);

    try {
      const cursorInfo = this.tailableCursors.get(cursorId);
      if (!cursorInfo || !cursorInfo.active) {
        throw new Error(`Tailable cursor ${cursorId} not found or inactive`);
      }

      const cursor = cursorInfo.cursor;
      let streaming = true;

      // Process documents as they arrive
      while (streaming && cursorInfo.active) {
        try {
          const hasNext = await cursor.hasNext();

          if (hasNext) {
            const document = await cursor.next();

            // Update cursor activity
            cursorInfo.documentsRead++;
            cursorInfo.lastActivity = new Date();

            // Call event handler
            if (eventHandler) {
              const continueStreaming = await eventHandler(document, {
                cursorId: cursorId,
                collectionName: cursorInfo.collectionName,
                documentsRead: cursorInfo.documentsRead
              });

              if (continueStreaming === false) {
                streaming = false;
              }
            }

          } else {
            // No new documents within the await window; back off briefly before polling again
            await new Promise(resolve => setTimeout(resolve, 100));
          }

        } catch (cursorError) {
          console.error(`Error in tailable cursor streaming:`, cursorError);

          if (errorHandler) {
            const shouldContinue = await errorHandler(cursorError, {
              cursorId: cursorId,
              collectionName: cursorInfo.collectionName
            });

            if (!shouldContinue) {
              streaming = false;
            }
          } else {
            streaming = false;
          }
        }
      }

      console.log(`Streaming completed for cursor: ${cursorId}`);

    } catch (error) {
      console.error(`Error streaming from tailable cursor ${cursorId}:`, error);
      throw error;
    }
  }

  async bulkInsertToStream(collectionName, documents, options = {}) {
    console.log(`Performing bulk insert to capped collection: ${collectionName}`);

    try {
      const collectionConfig = this.cappedCollections.get(collectionName);
      if (!collectionConfig) {
        throw new Error(`Collection ${collectionName} not found in capped collections`);
      }

      const collection = collectionConfig.collection;

      // Prepare documents with capped collection optimization
      const optimizedDocuments = documents.map(doc => ({
        _id: new ObjectId(),
        timestamp: doc.timestamp || new Date(),
        ...doc,

        // Capped collection metadata
        insertionOrder: true,
        bulkInserted: true,
        batchId: options.batchId || new ObjectId().toString()
      }));

      // Perform optimized bulk insert
      const bulkOptions = {
        ordered: options.ordered !== false,
        writeConcern: { w: 1, j: false }, // Optimized for throughput
        bypassDocumentValidation: options.bypassValidation || false
      };

      const result = await collection.insertMany(optimizedDocuments, bulkOptions);

      // Update bulk performance metrics
      await this.updateCollectionMetrics(collectionName, 'bulk_insert', {
        documentsInserted: optimizedDocuments.length,
        batchSize: optimizedDocuments.length,
        bulkOperation: true
      });

      console.log(`Bulk insert completed: ${result.insertedCount} documents inserted to ${collectionName}`);

      return {
        insertedCount: result.insertedCount,
        insertedIds: Object.values(result.insertedIds),
        batchId: options.batchId,
        cappedOptimized: true,
        insertionOrder: true
      };

    } catch (error) {
      console.error(`Error performing bulk insert to ${collectionName}:`, error);
      throw error;
    }
  }

  async getCollectionStats(collectionName) {
    console.log(`Retrieving statistics for capped collection: ${collectionName}`);

    try {
      const collectionConfig = this.cappedCollections.get(collectionName);
      if (!collectionConfig) {
        throw new Error(`Collection ${collectionName} not found`);
      }

      const collection = collectionConfig.collection;

      // Get MongoDB collection statistics
      const stats = await this.db.command({ collStats: collectionName });

      // Get collection configuration
      const cappedOptions = collectionConfig.cappedOptions;

      // Calculate utilization metrics
      const sizeUtilization = (stats.size / cappedOptions.size) * 100;
      const countUtilization = cappedOptions.max ? (stats.count / cappedOptions.max) * 100 : 0;

      // Get recent activity metrics
      const performanceMetrics = this.performanceMetrics.get(collectionName) || {};

      const collectionStats = {
        collectionName: collectionName,
        cappedCollection: stats.capped,
        useCase: collectionConfig.useCase,
        performanceProfile: collectionConfig.performanceProfile,

        // Size and capacity metrics
        currentSize: stats.size,
        maxSize: cappedOptions.size,
        sizeUtilization: Math.round(sizeUtilization * 100) / 100,

        currentCount: stats.count,
        maxCount: cappedOptions.max || null,
        countUtilization: Math.round(countUtilization * 100) / 100,

        // Storage details
        avgDocumentSize: stats.avgObjSize,
        storageSize: stats.storageSize,
        totalIndexSize: stats.totalIndexSize,
        indexSizes: stats.indexSizes,

        // Performance indicators
        insertRate: performanceMetrics.insertRate || 0,
        queryRate: performanceMetrics.queryRate || 0,
        lastInsertTime: performanceMetrics.lastInsertTime || null,

        // Capped collection specific
        insertionOrder: collectionConfig.insertionOrder,
        tailableSupported: collectionConfig.tailableSupported,

        // Operational status
        healthStatus: this.assessCollectionHealth(sizeUtilization, countUtilization),
        recommendations: this.generateRecommendations(collectionName, sizeUtilization, performanceMetrics)
      };

      console.log(`Statistics retrieved for ${collectionName}: ${collectionStats.currentCount} documents, ${collectionStats.sizeUtilization}% capacity`);

      return collectionStats;

    } catch (error) {
      console.error(`Error retrieving statistics for ${collectionName}:`, error);
      throw error;
    }
  }

  // Utility methods for capped collection management

  async updateCollectionMetrics(collectionName, operation, metadata) {
    if (!this.config.enableMetricsCollection) return;

    const now = new Date();
    const metrics = this.performanceMetrics.get(collectionName) || {
      insertCount: 0,
      insertRate: 0,
      queryCount: 0,
      queryRate: 0,
      lastInsertTime: null,
      lastQueryTime: null,
      operationHistory: []
    };

    // Update operation counts and rates
    if (operation === 'insert' || operation === 'stream' || operation === 'bulk_insert') {
      metrics.insertCount += metadata.documentsInserted || 1;
      metrics.lastInsertTime = now;

      // Calculate insert rate as write operations observed over the last minute
      const oneMinuteAgo = new Date(now.getTime() - 60000);
      const recentInserts = metrics.operationHistory.filter(
        op => ['insert', 'stream', 'bulk_insert'].includes(op.type) && op.timestamp > oneMinuteAgo
      ).length;
      metrics.insertRate = recentInserts;
    }

    // Record operation in history
    metrics.operationHistory.push({
      type: operation,
      timestamp: now,
      metadata: metadata
    });

    // Keep only last 1000 operations for performance
    if (metrics.operationHistory.length > 1000) {
      metrics.operationHistory = metrics.operationHistory.slice(-1000);
    }

    this.performanceMetrics.set(collectionName, metrics);
  }

  assessCollectionHealth(sizeUtilization, countUtilization) {
    const maxUtilization = Math.max(sizeUtilization, countUtilization);

    if (maxUtilization >= 95) return 'critical';
    if (maxUtilization >= 85) return 'warning';
    if (maxUtilization >= 70) return 'caution';
    return 'healthy';
  }

  generateRecommendations(collectionName, sizeUtilization, performanceMetrics) {
    const recommendations = [];

    if (sizeUtilization > 85) {
      recommendations.push('Consider increasing capped collection size limit');
    }

    if (performanceMetrics.insertRate > 10000) {
      recommendations.push('High insert rate detected - consider bulk insert optimization');
    }

    if (sizeUtilization < 30 && performanceMetrics.insertRate < 100) {
      recommendations.push('Collection may be oversized for current workload');
    }

    return recommendations;
  }

  async closeTailableCursor(cursorId) {
    console.log(`Closing tailable cursor: ${cursorId}`);

    try {
      const cursorInfo = this.tailableCursors.get(cursorId);
      if (cursorInfo) {
        cursorInfo.active = false;
        await cursorInfo.cursor.close();
        this.tailableCursors.delete(cursorId);
        console.log(`Tailable cursor closed: ${cursorId}`);
      }
    } catch (error) {
      console.error(`Error closing tailable cursor ${cursorId}:`, error);
    }
  }

  async cleanup() {
    console.log('Cleaning up Capped Collections Manager...');

    // Close all tailable cursors
    for (const [cursorId, cursorInfo] of this.tailableCursors) {
      try {
        cursorInfo.active = false;
        await cursorInfo.cursor.close();
      } catch (error) {
        console.error(`Error closing cursor ${cursorId}:`, error);
      }
    }

    // Clear all management state
    this.cappedCollections.clear();
    this.tailableCursors.clear();
    this.performanceMetrics.clear();
    this.capacityMonitors.clear();

    console.log('Capped Collections Manager cleanup completed');
  }
}

// Example usage demonstrating high-performance streaming with capped collections
async function demonstrateHighPerformanceStreaming() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();

  const cappedManager = new MongoCappedCollectionManager(client, {
    database: 'high_performance_streaming',
    enableTailableCursors: true,
    enableMetricsCollection: true,
    enablePerformanceMonitoring: true
  });

  try {
    // Demonstrate high-volume application logging
    console.log('Demonstrating high-performance application logging...');
    const logPromises = [];
    for (let i = 0; i < 1000; i++) {
      logPromises.push(cappedManager.logApplicationEvent({
        level: ['INFO', 'WARN', 'ERROR'][Math.floor(Math.random() * 3)],
        application: 'web-api',
        component: 'user-service',
        message: `Processing user request ${i}`,
        data: {
          userId: `user_${Math.floor(Math.random() * 1000)}`,
          operation: 'profile_update',
          executionTime: Math.floor(Math.random() * 100) + 10
        },
        traceId: `trace_${i}`,
        requestId: `req_${Date.now()}_${i}`
      }));
    }
    await Promise.all(logPromises);
    console.log('High-volume logging completed');

    // Demonstrate event streaming with tailable cursor
    console.log('Demonstrating real-time event streaming...');
    const tailableCursor = await cappedManager.createTailableCursor('event_stream', {
      fromLatest: true,
      batchSize: 50
    });

    // Start streaming events in background
    const streamingPromise = cappedManager.streamFromTailableCursor(
      tailableCursor.cursorId,
      async (document, context) => {
        console.log(`Streamed event: ${document.eventType} from ${document.eventSource}`);
        return context.documentsRead < 100; // Stop after 100 events
      },
      async (error, context) => {
        console.error(`Streaming error:`, error.message);
        return false; // Stop on error
      }
    );

    // Generate stream events
    const eventPromises = [];
    for (let i = 0; i < 100; i++) {
      eventPromises.push(cappedManager.streamEvent({
        type: ['page_view', 'user_action', 'system_event'][Math.floor(Math.random() * 3)],
        source: 'web_application',
        streamName: 'user_activity',
        payload: {
          userId: `user_${Math.floor(Math.random() * 100)}`,
          action: 'click',
          page: '/dashboard',
          timestamp: new Date()
        },
        correlationId: `correlation_${i}`
      }));

      // Add small delay to demonstrate real-time streaming
      if (i % 10 === 0) {
        await new Promise(resolve => setTimeout(resolve, 10));
      }
    }

    await Promise.all(eventPromises);
    await streamingPromise;

    // Demonstrate bulk metrics insertion
    console.log('Demonstrating bulk metrics recording...');
    const metrics = [];
    for (let i = 0; i < 500; i++) {
      metrics.push({
        type: 'performance',
        name: 'response_time',
        value: Math.floor(Math.random() * 1000) + 50,
        unit: 'milliseconds',
        source: 'api-gateway',
        tags: {
          endpoint: '/api/users',
          method: 'GET',
          status_code: 200
        }
      });
    }

    await cappedManager.bulkInsertToStream('realtime_metrics', metrics, {
      batchId: 'metrics_batch_' + Date.now()
    });

    // Get collection statistics
    const logsStats = await cappedManager.getCollectionStats('application_logs');
    const eventsStats = await cappedManager.getCollectionStats('event_stream');
    const metricsStats = await cappedManager.getCollectionStats('realtime_metrics');

    console.log('High-Performance Streaming Results:');
    console.log('Application Logs Stats:', {
      count: logsStats.currentCount,
      sizeUtilization: logsStats.sizeUtilization,
      healthStatus: logsStats.healthStatus
    });
    console.log('Event Stream Stats:', {
      count: eventsStats.currentCount,
      sizeUtilization: eventsStats.sizeUtilization,
      healthStatus: eventsStats.healthStatus
    });
    console.log('Metrics Stats:', {
      count: metricsStats.currentCount,
      sizeUtilization: metricsStats.sizeUtilization,
      healthStatus: metricsStats.healthStatus
    });

    return {
      logsStats,
      eventsStats,
      metricsStats,
      tailableCursorDemo: true,
      bulkInsertDemo: true
    };

  } catch (error) {
    console.error('Error demonstrating high-performance streaming:', error);
    throw error;
  } finally {
    await cappedManager.cleanup();
    await client.close();
  }
}

// Benefits of MongoDB Capped Collections:
// - Native circular buffer functionality eliminates manual buffer overflow management
// - Guaranteed insertion order maintains chronological data integrity for time-series applications  
// - Automatic size management prevents storage bloat without external cleanup procedures
// - Tailable cursors enable real-time streaming applications with minimal latency
// - Optimized storage patterns provide superior performance for high-volume append-only workloads
// - Zero-maintenance operation reduces operational overhead compared to traditional logging systems
// - Built-in FIFO behavior ensures oldest data is automatically removed when capacity limits are reached
// - Integration with MongoDB's replication and sharding for distributed streaming architectures

module.exports = {
  MongoCappedCollectionManager,
  demonstrateHighPerformanceStreaming
};

SQL-Style Capped Collection Management with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB capped collections and circular buffer management:

-- QueryLeaf capped collections with SQL-familiar circular buffer management syntax

-- Configure capped collection settings and performance optimization
SET capped_collection_monitoring = true;
SET enable_tailable_cursors = true; 
SET enable_performance_metrics = true;
SET default_capped_size_mb = 1024; -- 1GB default
SET default_capped_max_documents = 1000000;
SET enable_bulk_insert_optimization = true;

-- Create capped collections with circular buffer functionality
WITH capped_collection_definitions AS (
  SELECT 
    collection_name,
    capped_size_bytes,
    max_document_count,
    use_case,
    performance_profile,

    -- Collection optimization settings
    JSON_BUILD_OBJECT(
      'capped', true,
      'size', capped_size_bytes,
      'max', max_document_count,
      'storageEngine', JSON_BUILD_OBJECT(
        'wiredTiger', JSON_BUILD_OBJECT(
          'configString', 'block_compressor=snappy'
        )
      )
    ) as creation_options,

    -- Index configuration for capped collections
    CASE use_case
      WHEN 'application_logging' THEN ARRAY[
        JSON_BUILD_OBJECT('key', JSON_BUILD_OBJECT('logLevel', 1, 'timestamp', 1)),
        JSON_BUILD_OBJECT('key', JSON_BUILD_OBJECT('application', 1, 'component', 1)),
        JSON_BUILD_OBJECT('key', JSON_BUILD_OBJECT('traceId', 1), 'sparse', true)
      ]
      WHEN 'event_streaming' THEN ARRAY[
        JSON_BUILD_OBJECT('key', JSON_BUILD_OBJECT('eventType', 1, 'timestamp', 1)),
        JSON_BUILD_OBJECT('key', JSON_BUILD_OBJECT('streamName', 1)),
        JSON_BUILD_OBJECT('key', JSON_BUILD_OBJECT('correlationId', 1), 'sparse', true)
      ]
      WHEN 'metrics_collection' THEN ARRAY[
        JSON_BUILD_OBJECT('key', JSON_BUILD_OBJECT('metricType', 1, 'timestamp', 1)),
        JSON_BUILD_OBJECT('key', JSON_BUILD_OBJECT('source', 1, 'timestamp', -1)),
        JSON_BUILD_OBJECT('key', JSON_BUILD_OBJECT('aggregationLevel', 1))
      ]
      ELSE ARRAY[
        JSON_BUILD_OBJECT('key', JSON_BUILD_OBJECT('timestamp', 1))
      ]
    END as index_configuration

  FROM (VALUES
    ('application_logs_capped', 1024 * 1024 * 1024, 1000000, 'application_logging', 'high_throughput'),
    ('event_stream_capped', 2048 * 1024 * 1024, 5000000, 'event_streaming', 'ultra_high_throughput'),
    ('realtime_metrics_capped', 512 * 1024 * 1024, 2000000, 'metrics_collection', 'time_series_optimized'),
    ('audit_trail_capped', 256 * 1024 * 1024, 500000, 'audit_logging', 'compliance_optimized'),
    ('system_events_capped', 128 * 1024 * 1024, 250000, 'system_monitoring', 'operational_tracking')
  ) AS collections(collection_name, capped_size_bytes, max_document_count, use_case, performance_profile)
),

-- High-performance application logging with capped collections
application_logs_streaming AS (
  INSERT INTO application_logs_capped
  SELECT 
    GENERATE_UUID() as log_id,
    CURRENT_TIMESTAMP - (random() * INTERVAL '1 hour') as timestamp,

    -- Log classification and severity
    (ARRAY['DEBUG', 'INFO', 'WARN', 'ERROR', 'FATAL'])
      [1 + floor(random() * 5)] as log_level,
    (ARRAY['web-api', 'auth-service', 'data-processor', 'notification-service'])
      [1 + floor(random() * 4)] as application,
    (ARRAY['controller', 'service', 'repository', 'middleware'])
      [1 + floor(random() * 4)] as component,

    -- Log content and context
    'Processing request for user operation ' || generate_series(1, 10000) as message,
    JSON_BUILD_OBJECT(
      'userId', 'user_' || (1 + floor(random() * 1000)),
      'operation', (ARRAY['create', 'read', 'update', 'delete', 'search'])[1 + floor(random() * 5)],
      'executionTime', floor(random() * 500) + 10,
      'memoryUsage', ROUND((random() * 100 + 50)::decimal, 2),
      'requestSize', floor(random() * 10000) + 100
    ) as log_data,

    -- Request correlation and tracing
    'req_' || EXTRACT(EPOCH FROM CURRENT_TIMESTAMP) || '_' || generate_series(1, 10000) as request_id,
    'session_' || (1 + floor(random() * 1000)) as session_id,
    'trace_' || generate_series(1, 10000) as trace_id,
    'span_' || generate_series(1, 10000) as span_id,

    -- Server and environment context
    ('server_' || (1 + floor(random() * 10))) as hostname,
    (1000 + floor(random() * 9000)) as process_id,
    'production' as environment,

    -- Capped collection metadata
    true as insertion_order_guaranteed,
    true as circular_buffer_managed,
    'high_throughput' as performance_optimized
  RETURNING log_id, timestamp, log_level, application
),

-- Real-time event streaming with automatic buffer management
event_stream_operations AS (
  INSERT INTO event_stream_capped
  SELECT 
    GENERATE_UUID() as event_id,
    CURRENT_TIMESTAMP - (random() * INTERVAL '30 minutes') as timestamp,

    -- Event classification
    (ARRAY['page_view', 'user_action', 'system_event', 'api_call', 'data_change'])
      [1 + floor(random() * 5)] as event_type,
    (ARRAY['web_app', 'mobile_app', 'api_gateway', 'background_service'])
      [1 + floor(random() * 4)] as event_source,

    -- Event payload and metadata
    JSON_BUILD_OBJECT(
      'userId', 'user_' || (1 + floor(random() * 500)),
      'action', (ARRAY['click', 'view', 'submit', 'navigate', 'search'])[1 + floor(random() * 5)],
      'page', (ARRAY['/dashboard', '/profile', '/settings', '/reports', '/admin'])[1 + floor(random() * 5)],
      'duration', floor(random() * 5000) + 100,
      'userAgent', 'Mozilla/5.0 (Enterprise Browser)',
      'ipAddress', '192.168.' || (1 + floor(random() * 254)) || '.' || (1 + floor(random() * 254))
    ) as event_data,

    -- Streaming metadata
    (ARRAY['user_activity', 'system_monitoring', 'api_analytics', 'security_events'])
      [1 + floor(random() * 4)] as stream_name,
    'partition_' || (1 + floor(random() * 10)) as partition_key,
    EXTRACT(EPOCH FROM CURRENT_TIMESTAMP) * 1000000 + generate_series(1, 50000) as sequence_number,

    -- Processing and correlation
    false as processed,
    0 as processing_attempts,
    'correlation_' || generate_series(1, 50000) as correlation_id,

    -- Performance optimization metadata
    JSON_LENGTH(event_data::text) as event_size_bytes,
    floor(random() * 50) + 5 as ingestion_latency_ms,

    -- Capped collection optimization
    true as tailable_cursor_ready,
    true as buffer_optimized,
    true as insertion_order_maintained
  RETURNING event_id, event_type, stream_name, sequence_number
),

-- High-frequency metrics collection with time-series optimization  
metrics_collection_operations AS (
  INSERT INTO realtime_metrics_capped
  SELECT 
    GENERATE_UUID() as metric_id,
    CURRENT_TIMESTAMP - (random() * INTERVAL '15 minutes') as timestamp,

    -- Metric classification
    (ARRAY['performance', 'business', 'system', 'security', 'custom'])
      [1 + floor(random() * 5)] as metric_type,
    (ARRAY['response_time', 'throughput', 'error_rate', 'cpu_usage', 'memory_usage', 'disk_io', 'network_latency'])
      [1 + floor(random() * 7)] as metric_name,

    -- Metric values and units
    CASE 
      WHEN metric_name IN ('response_time', 'network_latency') THEN random() * 1000 + 10
      WHEN metric_name = 'cpu_usage' THEN random() * 100
      WHEN metric_name = 'memory_usage' THEN random() * 16 + 2  -- GB
      WHEN metric_name = 'error_rate' THEN random() * 5
      WHEN metric_name = 'throughput' THEN random() * 10000 + 100
      ELSE random() * 1000
    END as value,

    CASE 
      WHEN metric_name IN ('response_time', 'network_latency') THEN 'milliseconds'
      WHEN metric_name IN ('cpu_usage', 'error_rate') THEN 'percent'
      WHEN metric_name = 'memory_usage' THEN 'gigabytes'
      WHEN metric_name = 'throughput' THEN 'requests_per_second'
      ELSE 'count'
    END as unit,

    -- Source and tagging
    (ARRAY['api-gateway', 'web-server', 'database', 'cache', 'queue'])
      [1 + floor(random() * 5)] as source,
    'application' as source_type,

    JSON_BUILD_OBJECT(
      'environment', 'production',
      'region', (ARRAY['us-east-1', 'us-west-2', 'eu-west-1'])[1 + floor(random() * 3)],
      'service', (ARRAY['auth', 'users', 'orders', 'notifications'])[1 + floor(random() * 4)],
      'instance', 'instance_' || (1 + floor(random() * 20))
    ) as tags,

    -- Time series optimization
    'raw' as aggregation_level,
    NULL as aggregation_window,

    -- Bucketing for time-series efficiency (1-minute buckets)
    DATE_TRUNC('minute', CURRENT_TIMESTAMP) as bucket_timestamp,

    -- Performance metadata
    true as time_series_optimized,
    EXTRACT(EPOCH FROM CURRENT_TIMESTAMP) * 1000 as collection_timestamp_ms,
    floor(random() * 10) + 1 as processing_latency_ms
  RETURNING metric_id, metric_type, metric_name, value, source
),

-- Monitor capped collection performance and utilization
capped_collection_monitoring AS (
  SELECT 
    collection_name,
    use_case,
    performance_profile,

    -- Collection capacity analysis
    capped_size_bytes as max_size_bytes,
    max_document_count as max_documents,

    -- Simulated current utilization (in production would query actual stats)
    CASE collection_name
      WHEN 'application_logs_capped' THEN floor(random() * 800000) + 100000  -- 100k-900k docs
      WHEN 'event_stream_capped' THEN floor(random() * 4000000) + 500000   -- 500k-4.5M docs  
      WHEN 'realtime_metrics_capped' THEN floor(random() * 1500000) + 200000 -- 200k-1.7M docs
      WHEN 'audit_trail_capped' THEN floor(random() * 300000) + 50000       -- 50k-350k docs
      ELSE floor(random() * 100000) + 10000
    END as current_document_count,

    -- Estimated current size (simplified calculation)
    CASE collection_name  
      WHEN 'application_logs_capped' THEN floor(random() * 800000000) + 100000000  -- 100MB-800MB
      WHEN 'event_stream_capped' THEN floor(random() * 1600000000) + 200000000    -- 200MB-1.6GB
      WHEN 'realtime_metrics_capped' THEN floor(random() * 400000000) + 50000000  -- 50MB-400MB
      WHEN 'audit_trail_capped' THEN floor(random() * 200000000) + 25000000       -- 25MB-200MB
      ELSE floor(random() * 50000000) + 10000000
    END as current_size_bytes,

    -- Performance simulation
    CASE performance_profile
      WHEN 'ultra_high_throughput' THEN floor(random() * 50000) + 10000  -- 10k-60k inserts/sec
      WHEN 'high_throughput' THEN floor(random() * 20000) + 5000         -- 5k-25k inserts/sec
      WHEN 'time_series_optimized' THEN floor(random() * 15000) + 3000   -- 3k-18k inserts/sec
      WHEN 'compliance_optimized' THEN floor(random() * 5000) + 1000     -- 1k-6k inserts/sec
      ELSE floor(random() * 2000) + 500                                  -- 500-2.5k inserts/sec
    END as estimated_insert_rate_per_sec

  FROM capped_collection_definitions
),

-- Calculate utilization metrics and health assessment
capped_utilization_analysis AS (
  SELECT 
    ccm.collection_name,
    ccm.use_case,
    ccm.performance_profile,

    -- Capacity utilization
    ccm.current_document_count,
    ccm.max_documents,
    ROUND((ccm.current_document_count::decimal / ccm.max_documents::decimal) * 100, 1) as document_utilization_percent,

    ccm.current_size_bytes,
    ccm.max_size_bytes,
    ROUND((ccm.current_size_bytes::decimal / ccm.max_size_bytes::decimal) * 100, 1) as size_utilization_percent,

    -- Performance metrics
    ccm.estimated_insert_rate_per_sec,
    ROUND(ccm.current_size_bytes::decimal / ccm.current_document_count::decimal, 2) as avg_document_size_bytes,

    -- Storage efficiency
    ROUND(ccm.current_size_bytes / (1024 * 1024)::decimal, 2) as current_size_mb,
    ROUND(ccm.max_size_bytes / (1024 * 1024)::decimal, 2) as max_size_mb,

    -- Operational assessment
    CASE 
      WHEN GREATEST(
        (ccm.current_document_count::decimal / ccm.max_documents::decimal) * 100,
        (ccm.current_size_bytes::decimal / ccm.max_size_bytes::decimal) * 100
      ) >= 95 THEN 'critical'
      WHEN GREATEST(
        (ccm.current_document_count::decimal / ccm.max_documents::decimal) * 100,
        (ccm.current_size_bytes::decimal / ccm.max_size_bytes::decimal) * 100
      ) >= 85 THEN 'warning'
      WHEN GREATEST(
        (ccm.current_document_count::decimal / ccm.max_documents::decimal) * 100,
        (ccm.current_size_bytes::decimal / ccm.max_size_bytes::decimal) * 100
      ) >= 70 THEN 'caution'
      ELSE 'healthy'
    END as health_status,

    -- Throughput assessment
    CASE 
      WHEN ccm.estimated_insert_rate_per_sec > 25000 THEN 'ultra_high'
      WHEN ccm.estimated_insert_rate_per_sec > 10000 THEN 'high'
      WHEN ccm.estimated_insert_rate_per_sec > 5000 THEN 'medium'
      WHEN ccm.estimated_insert_rate_per_sec > 1000 THEN 'moderate'
      ELSE 'low'
    END as throughput_classification

  FROM capped_collection_monitoring ccm
),

-- Generate optimization recommendations
capped_optimization_recommendations AS (
  SELECT 
    cua.collection_name,
    cua.health_status,
    cua.throughput_classification,
    cua.document_utilization_percent,
    cua.size_utilization_percent,

    -- Capacity recommendations
    CASE 
      WHEN cua.size_utilization_percent > 90 THEN 'Increase capped collection size immediately'
      WHEN cua.document_utilization_percent > 90 THEN 'Increase document count limit immediately'
      WHEN cua.size_utilization_percent > 80 THEN 'Monitor closely and consider size increase'
      WHEN cua.size_utilization_percent < 30 AND cua.throughput_classification = 'low' THEN 'Consider reducing collection size for efficiency'
      ELSE 'Capacity within optimal range'
    END as capacity_recommendation,

    -- Performance recommendations
    CASE 
      WHEN cua.throughput_classification = 'ultra_high' THEN 'Optimize for maximum throughput with bulk inserts'
      WHEN cua.throughput_classification = 'high' THEN 'Enable write optimization and consider sharding'
      WHEN cua.throughput_classification = 'medium' THEN 'Standard configuration appropriate'
      WHEN cua.throughput_classification = 'low' THEN 'Consider consolidating with other collections'
      ELSE 'Review usage patterns'
    END as performance_recommendation,

    -- Operational recommendations
    CASE 
      WHEN cua.health_status = 'critical' THEN 'Immediate intervention required'
      WHEN cua.health_status = 'warning' THEN 'Plan capacity expansion within 24 hours'
      WHEN cua.health_status = 'caution' THEN 'Monitor usage trends and prepare for expansion'
      ELSE 'Continue monitoring with current configuration'
    END as operational_recommendation,

    -- Efficiency metrics
    ROUND(cua.estimated_insert_rate_per_sec::decimal / (cua.size_utilization_percent / 100::decimal), 2) as efficiency_ratio,

    -- Projected timeline to capacity
    CASE 
      WHEN cua.estimated_insert_rate_per_sec > 0 AND cua.size_utilization_percent < 95 THEN
        ROUND(
          (cua.max_documents - cua.current_document_count)::decimal / 
          (cua.estimated_insert_rate_per_sec::decimal * 3600), 
          1
        )
      ELSE NULL
    END as hours_to_document_capacity,

    -- Circular buffer efficiency
    CASE 
      WHEN cua.size_utilization_percent > 90 THEN 'Active circular buffer management'
      WHEN cua.size_utilization_percent > 70 THEN 'Approaching circular buffer activation' 
      ELSE 'Pre-circular buffer phase'
    END as circular_buffer_status

  FROM capped_utilization_analysis cua
)

-- Comprehensive capped collections management dashboard
SELECT 
  cor.collection_name,
  cor.use_case,
  cor.throughput_classification,
  cor.health_status,

  -- Current state
  cua.current_document_count as documents,
  cua.document_utilization_percent || '%' as doc_utilization,
  cua.current_size_mb || ' MB' as current_size,
  cua.size_utilization_percent || '%' as size_utilization,

  -- Performance metrics
  cua.estimated_insert_rate_per_sec as inserts_per_second,
  ROUND(cua.avg_document_size_bytes / 1024, 2) || ' KB' as avg_doc_size,
  cor.efficiency_ratio as efficiency_score,

  -- Capacity management
  cor.circular_buffer_status,
  COALESCE(cor.hours_to_document_capacity || ' hours', 'N/A') as time_to_capacity,

  -- Operational guidance
  cor.capacity_recommendation,
  cor.performance_recommendation,
  cor.operational_recommendation,

  -- Capped collection benefits
  JSON_BUILD_OBJECT(
    'guaranteed_insertion_order', true,
    'automatic_size_management', true,
    'circular_buffer_behavior', true,
    'tailable_cursor_support', true,
    'high_performance_writes', true,
    'zero_maintenance_required', true
  ) as capped_collection_features,

  -- Next actions
  CASE cor.health_status
    WHEN 'critical' THEN 'Execute capacity expansion immediately'
    WHEN 'warning' THEN 'Schedule capacity planning meeting'
    WHEN 'caution' THEN 'Increase monitoring frequency'
    ELSE 'Continue standard monitoring'
  END as immediate_actions,

  -- Optimization opportunities
  CASE 
    WHEN cor.throughput_classification = 'ultra_high' AND cua.size_utilization_percent < 50 THEN 
      'Optimize collection size for current throughput'
    WHEN cor.efficiency_ratio > 1000 THEN 
      'Excellent efficiency - consider as template for other collections'
    WHEN cor.efficiency_ratio < 100 THEN
      'Review configuration for efficiency improvements'
    ELSE 'Configuration optimized for current workload'
  END as optimization_opportunities

FROM capped_optimization_recommendations cor
JOIN capped_utilization_analysis cua ON cor.collection_name = cua.collection_name
ORDER BY 
  CASE cor.health_status
    WHEN 'critical' THEN 1
    WHEN 'warning' THEN 2  
    WHEN 'caution' THEN 3
    ELSE 4
  END,
  cua.size_utilization_percent DESC;

-- QueryLeaf provides comprehensive MongoDB capped collection capabilities:
-- 1. Native circular buffer functionality with SQL-familiar collection management syntax
-- 2. Automatic size and document count management without manual cleanup procedures
-- 3. High-performance streaming applications with tailable cursor and real-time processing support
-- 4. Time-series optimized storage patterns for metrics, logs, and event data
-- 5. Enterprise-grade monitoring with capacity utilization and performance analytics
-- 6. Guaranteed insertion order maintenance for chronological data integrity
-- 7. Integration with MongoDB's replication and sharding for distributed streaming architectures
-- 8. SQL-style capped collection operations for familiar database management workflows
-- 9. Advanced performance optimization with bulk insert and streaming operation support
-- 10. Zero-maintenance circular buffer management with automatic FIFO behavior and overflow handling

Best Practices for MongoDB Capped Collections Implementation

High-Performance Streaming Architecture

Essential practices for implementing capped collections effectively in production environments:

  1. Size Planning Strategy: Plan capped collection sizes based on data velocity, retention requirements, and query patterns for optimal performance (a minimal creation and tailing sketch follows this list)
  2. Index Optimization: Use minimal, strategic indexing that supports query patterns without impacting insert performance
  3. Tailable Cursor Management: Implement robust tailable cursor patterns for real-time data consumption with proper error handling
  4. Monitoring and Alerting: Establish comprehensive monitoring for collection capacity, insertion rates, and performance metrics
  5. Integration Patterns: Design application integration that leverages natural insertion order and circular buffer behavior
  6. Performance Baselines: Establish performance baselines for insert rates, query response times, and storage utilization
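
A minimal sketch of the first three practices using the Node.js driver is shown below; the database name, collection name, index keys, and size limits are illustrative assumptions rather than recommended values.

// Minimal sketch: create a capped collection sized for expected retention,
// add one strategic index, and tail it with a tailable/awaitData cursor.
// All names and limits below are illustrative assumptions.
const { MongoClient } = require('mongodb');

async function cappedCollectionSketch() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const db = client.db('streaming_demo');

  // 1. Size planning: cap the collection by bytes and, optionally, document count
  const logs = await db.createCollection('app_logs_capped', {
    capped: true,
    size: 256 * 1024 * 1024, // 256 MB circular buffer
    max: 500000              // optional hard document limit
  });

  // 2. Index optimization: a single selective index that supports the main read
  //    pattern without slowing the append-only write path
  await logs.createIndex({ level: 1, timestamp: 1 });

  // 3. Tailable cursor: consume documents in insertion order as they arrive
  const cursor = logs.find({}, {
    tailable: true,
    awaitData: true,
    maxAwaitTimeMS: 1000 // server waits up to 1s per getMore when no new data exists
  });

  // Loop ends when the cursor is closed or invalidated
  while (await cursor.hasNext()) {
    const doc = await cursor.next();
    console.log('tailed log entry:', doc._id);
  }

  await client.close();
}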

Production Deployment and Scalability

Optimize capped collections for enterprise-scale streaming requirements:

  1. Capacity Management: Implement proactive capacity monitoring with automated alerting before reaching collection limits (a monitoring sketch follows this list)
  2. Replication Strategy: Configure capped collections across replica sets with considerations for network bandwidth and lag
  3. Sharding Considerations: Understand sharding limitations and alternatives for capped collections in distributed deployments
  4. Backup Integration: Design backup strategies that account for circular buffer behavior and data rotation patterns
  5. Operational Procedures: Create standardized procedures for capped collection management, capacity expansion, and performance tuning
  6. Disaster Recovery: Plan for capped collection recovery scenarios with considerations for data loss tolerance and restoration priorities
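
The following is a minimal capacity-monitoring sketch built on the collStats command; the 85% alert threshold and the console-based alert are placeholder assumptions to adapt to your own alerting stack.

// Minimal capacity-monitoring sketch for a capped collection.
// The alert threshold and alerting mechanism are illustrative assumptions.
async function checkCappedCapacity(db, collectionName, alertThresholdPercent = 85) {
  const stats = await db.command({ collStats: collectionName });

  if (!stats.capped) {
    throw new Error(`${collectionName} is not a capped collection`);
  }

  // Utilization against the configured byte limit and optional document limit
  const sizeUtilization = (stats.size / stats.maxSize) * 100;
  const countUtilization = stats.max ? (stats.count / stats.max) * 100 : 0;
  const utilization = Math.max(sizeUtilization, countUtilization);

  if (utilization >= alertThresholdPercent) {
    // In production this would notify an on-call channel; here we only log
    console.warn(`Capped collection ${collectionName} at ${utilization.toFixed(1)}% capacity`);
  }

  return { sizeUtilization, countUtilization, utilization };
}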

Conclusion

MongoDB capped collections provide enterprise-grade circular buffer functionality that eliminates manual buffer management complexity while delivering superior performance for high-volume streaming applications. The native FIFO behavior combined with guaranteed insertion order and tailable cursor support makes capped collections ideal for logging, event streaming, metrics collection, and real-time data processing scenarios.

Key MongoDB Capped Collection benefits include:

  • Circular Buffer Management: Automatic size management with FIFO behavior eliminates manual cleanup and rotation procedures
  • Guaranteed Insertion Order: Natural insertion order maintains chronological integrity for time-series and logging applications
  • High-Performance Writes: Optimized storage patterns provide maximum throughput for append-heavy workloads
  • Real-Time Streaming: Tailable cursors enable efficient real-time data consumption with minimal latency
  • Zero Maintenance: No manual intervention required for buffer overflow management or data rotation
  • SQL Accessibility: Familiar capped collection management through SQL-style syntax and operations

Whether you're building logging systems, event streaming platforms, metrics collection infrastructure, or real-time monitoring applications, MongoDB capped collections with QueryLeaf's familiar SQL interface provide the foundation for scalable, efficient, and maintainable streaming data architectures.

QueryLeaf Integration: QueryLeaf automatically optimizes MongoDB capped collections while providing SQL-familiar syntax for circular buffer management, streaming operations, and performance monitoring. Advanced capped collection patterns, tailable cursor management, and high-throughput optimization techniques are seamlessly accessible through familiar SQL constructs, making sophisticated streaming data management both powerful and approachable for SQL-oriented development teams.

The combination of MongoDB's native circular buffer capabilities with SQL-style streaming operations makes MongoDB an ideal platform for applications that require both high-performance data ingestion and familiar operational patterns, ensuring your streaming architectures can handle enterprise-scale data volumes while remaining simple to operate.

MongoDB Bulk Operations and Batch Processing: High-Performance Data Operations and Enterprise-Scale Processing Optimization

Modern applications frequently need to process large volumes of data efficiently through bulk operations and batch processing that can handle millions of documents while maintaining performance, consistency, and system stability. Traditional approaches to large-scale data operations often rely on individual record processing, inefficient batching strategies, or complex application-level coordination, which leads to poor performance, resource contention, and scalability limitations.

MongoDB provides sophisticated bulk operation capabilities that enable high-performance batch processing, efficient data migrations, and optimized large-scale data operations with minimal overhead and maximum throughput. Unlike traditional databases that require complex stored procedures or external batch processing frameworks, MongoDB's native bulk operations offer streamlined, scalable, and efficient data processing with built-in error handling, ordering guarantees, and performance optimization.
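
As a point of reference before examining the traditional approach, a single bulkWrite call in the Node.js driver can mix inserts, updates, and deletes in one round trip and report partial failures together. The sketch below illustrates this; the collection and field names are illustrative assumptions.

// Minimal sketch of a mixed, unordered bulk operation with the Node.js driver.
// Collection and field names are illustrative assumptions.
async function bulkWriteSketch(db) {
  const result = await db.collection('products').bulkWrite([
    { insertOne: { document: { sku: 'A-100', price: 19.99, stock: 25 } } },
    { updateOne: {
        filter: { sku: 'B-200' },
        update: { $inc: { stock: -1 } },
        upsert: true
    } },
    { deleteMany: { filter: { discontinued: true } } }
  ], {
    ordered: false // continue past individual failures and report them together
  });

  console.log(`Inserted: ${result.insertedCount}, modified: ${result.modifiedCount}, deleted: ${result.deletedCount}`);
  return result;
}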

The Traditional Batch Processing Challenge

Conventional approaches to large-scale data operations suffer from significant performance and scalability limitations:

-- Traditional PostgreSQL batch processing - inefficient and resource-intensive approaches

-- Single-record processing with significant overhead and poor performance
CREATE TABLE products_import (
    import_id BIGSERIAL PRIMARY KEY,
    product_id UUID DEFAULT gen_random_uuid(),
    product_name VARCHAR(200) NOT NULL,
    category VARCHAR(100),
    price DECIMAL(10,2) NOT NULL,
    stock_quantity INTEGER NOT NULL DEFAULT 0,
    supplier_id UUID,
    description TEXT,

    -- Import tracking and status management
    import_batch_id VARCHAR(100),
    import_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    import_status VARCHAR(50) DEFAULT 'pending',
    processing_attempts INTEGER DEFAULT 0,

    -- Validation and error tracking
    validation_errors TEXT[],
    processing_error TEXT,
    needs_review BOOLEAN DEFAULT FALSE,

    -- Performance tracking
    processing_start_time TIMESTAMP,
    processing_end_time TIMESTAMP,
    processing_duration_ms INTEGER
);

-- Inefficient single-record insert approach (extremely slow for large datasets)
DO $$
DECLARE
    product_record RECORD;
    processing_start TIMESTAMP;
    processing_end TIMESTAMP;
    error_count INTEGER := 0;
    success_count INTEGER := 0;
    batch_size INTEGER := 1000;
    current_batch INTEGER := 0;
    total_records INTEGER;
BEGIN
    -- Get total record count for progress tracking
    SELECT COUNT(*) INTO total_records FROM raw_product_data;
    RAISE NOTICE 'Processing % total records', total_records;

    -- Process each record individually (inefficient approach)
    FOR product_record IN 
        SELECT * FROM raw_product_data 
        ORDER BY import_order ASC
    LOOP
        processing_start := CURRENT_TIMESTAMP;

        BEGIN
            -- Individual record validation (repeated overhead)
            IF product_record.product_name IS NULL OR LENGTH(product_record.product_name) = 0 THEN
                RAISE EXCEPTION 'Invalid product name';
            END IF;

            IF product_record.price <= 0 THEN
                RAISE EXCEPTION 'Invalid price: %', product_record.price;
            END IF;

            -- Single record insert (high overhead per operation)
            INSERT INTO products_import (
                product_name,
                category,
                price,
                stock_quantity,
                supplier_id,
                description,
                import_batch_id,
                import_status,
                processing_start_time
            ) VALUES (
                product_record.product_name,
                product_record.category,
                product_record.price,
                product_record.stock_quantity,
                product_record.supplier_id,
                product_record.description,
                'batch_' || EXTRACT(EPOCH FROM CURRENT_TIMESTAMP),
                'processing',
                processing_start
            );

            processing_end := CURRENT_TIMESTAMP;

            -- Update processing time (additional overhead)
            UPDATE products_import 
            SET processing_end_time = processing_end,
                processing_duration_ms = EXTRACT(MILLISECONDS FROM processing_end - processing_start),
                import_status = 'completed'
            WHERE product_id = (SELECT product_id FROM products_import 
                              WHERE product_name = product_record.product_name 
                              ORDER BY import_timestamp DESC LIMIT 1);

            success_count := success_count + 1;

        EXCEPTION WHEN OTHERS THEN
            error_count := error_count + 1;

            -- Error logging with additional overhead
            INSERT INTO import_errors (
                import_batch_id,
                error_record_data,
                error_message,
                error_timestamp
            ) VALUES (
                'batch_' || EXTRACT(EPOCH FROM CURRENT_TIMESTAMP),
                row_to_json(product_record),
                SQLERRM,
                CURRENT_TIMESTAMP
            );
        END;

        -- Progress reporting overhead (every record)
        current_batch := current_batch + 1;
        IF current_batch % batch_size = 0 THEN
            RAISE NOTICE 'Processed % of % records (% success, % errors)', 
                current_batch, total_records, success_count, error_count;
        END IF;
    END LOOP;

    RAISE NOTICE 'Processing complete: % success, % errors', success_count, error_count;

END $$;

-- Batch processing with limited effectiveness and complex management
CREATE OR REPLACE FUNCTION process_product_batch(
    batch_id VARCHAR,
    batch_size INTEGER DEFAULT 1000,
    max_batches INTEGER DEFAULT 100
) 
RETURNS TABLE(
    batch_number INTEGER,
    records_processed INTEGER,
    records_success INTEGER,
    records_failed INTEGER,
    processing_time_ms INTEGER,
    total_processing_time_ms BIGINT
) AS $$
DECLARE
    current_batch INTEGER := 1;
    batch_start_time TIMESTAMP;
    batch_end_time TIMESTAMP;
    batch_processing_time INTEGER;
    total_start_time TIMESTAMP := CURRENT_TIMESTAMP;
    records_in_batch INTEGER;
    success_in_batch INTEGER;
    errors_in_batch INTEGER;

BEGIN
    -- Create batch processing table (overhead)
    CREATE TEMP TABLE IF NOT EXISTS current_batch_data AS
    SELECT * FROM raw_product_data WHERE 1=0;

    WHILE current_batch <= max_batches LOOP
        batch_start_time := CURRENT_TIMESTAMP;

        -- Clear previous batch data
        TRUNCATE current_batch_data;

        -- Load batch data (complex offset/limit approach)
        INSERT INTO current_batch_data
        SELECT *
        FROM raw_product_data
        WHERE processed = FALSE
        ORDER BY import_priority DESC, created_at ASC
        LIMIT batch_size;

        -- Check if batch has data
        SELECT COUNT(*) INTO records_in_batch FROM current_batch_data;
        EXIT WHEN records_in_batch = 0;

        success_in_batch := 0;
        errors_in_batch := 0;

        -- Process batch with individual operations (still inefficient)
        DECLARE
            batch_record RECORD;
        BEGIN
            FOR batch_record IN SELECT * FROM current_batch_data LOOP
                BEGIN
                    -- Validation logic (repeated for every record)
                    PERFORM validate_product_data(
                        batch_record.product_name,
                        batch_record.category,
                        batch_record.price,
                        batch_record.stock_quantity
                    );

                    -- Individual insert (suboptimal)
                    INSERT INTO products_import (
                        product_name,
                        category, 
                        price,
                        stock_quantity,
                        supplier_id,
                        description,
                        import_batch_id,
                        import_status
                    ) VALUES (
                        batch_record.product_name,
                        batch_record.category,
                        batch_record.price,
                        batch_record.stock_quantity,
                        batch_record.supplier_id,
                        batch_record.description,
                        batch_id,
                        'completed'
                    );

                    success_in_batch := success_in_batch + 1;

                EXCEPTION WHEN OTHERS THEN
                    errors_in_batch := errors_in_batch + 1;

                    -- Log error (additional overhead)
                    INSERT INTO batch_processing_errors (
                        batch_id,
                        batch_number,
                        record_data,
                        error_message,
                        error_timestamp
                    ) VALUES (
                        batch_id,
                        current_batch,
                        row_to_json(batch_record),
                        SQLERRM,
                        CURRENT_TIMESTAMP
                    );
                END;
            END LOOP;

        END;

        -- Mark records as processed (additional update overhead)
        UPDATE raw_product_data
        SET processed = TRUE,
            processed_batch = current_batch,
            processed_timestamp = CURRENT_TIMESTAMP
        WHERE id IN (SELECT id FROM current_batch_data);

        batch_end_time := CURRENT_TIMESTAMP;
        batch_processing_time := EXTRACT(MILLISECONDS FROM batch_end_time - batch_start_time);

        -- Return batch results
        batch_number := current_batch;
        records_processed := records_in_batch;
        records_success := success_in_batch;
        records_failed := errors_in_batch;
        processing_time_ms := batch_processing_time;
        total_processing_time_ms := EXTRACT(MILLISECONDS FROM batch_end_time - total_start_time);

        RETURN NEXT;

        current_batch := current_batch + 1;
    END LOOP;

    -- Cleanup
    DROP TABLE IF EXISTS current_batch_data;

END;
$$ LANGUAGE plpgsql;

-- Execute batch processing with limited control and monitoring
SELECT 
    bp.*,
    ROUND(bp.records_processed::NUMERIC / (bp.processing_time_ms / 1000.0), 2) as records_per_second,
    ROUND(bp.records_success::NUMERIC / bp.records_processed * 100, 2) as success_rate_percent
FROM process_product_batch('import_batch_2025', 5000, 50) bp
ORDER BY bp.batch_number;

-- Traditional approach limitations:
-- 1. Individual record processing with high per-operation overhead
-- 2. Limited batch optimization and inefficient resource utilization
-- 3. Complex error handling with poor performance during error conditions
-- 4. No built-in ordering guarantees or transaction-level consistency
-- 5. Difficult to monitor and control processing performance
-- 6. Limited scalability for very large datasets (millions of records)
-- 7. Complex progress tracking and status management overhead
-- 8. No automatic retry or recovery mechanisms for failed batches
-- 9. Inefficient memory usage and connection resource management
-- 10. Poor integration with modern distributed processing patterns

-- Complex bulk update attempt with limited effectiveness
WITH bulk_price_updates AS (
    SELECT 
        product_id,
        category,
        current_price,

        -- Calculate new prices based on complex business logic
        CASE category
            WHEN 'electronics' THEN current_price * 1.15  -- 15% increase
            WHEN 'clothing' THEN 
                CASE 
                    WHEN current_price > 100 THEN current_price * 1.10  -- 10% for high-end
                    ELSE current_price * 1.20  -- 20% for regular
                END
            WHEN 'books' THEN 
                CASE
                    WHEN stock_quantity > 50 THEN current_price * 0.95  -- 5% discount for overstocked
                    WHEN stock_quantity < 5 THEN current_price * 1.25   -- 25% increase for rare
                    ELSE current_price * 1.05  -- 5% standard increase
                END
            ELSE current_price * 1.08  -- 8% default increase
        END as new_price,

        -- Audit trail information
        'bulk_price_update_2025' as update_reason,
        CURRENT_TIMESTAMP as update_timestamp

    FROM products
    WHERE active = TRUE
    AND last_price_update < CURRENT_TIMESTAMP - INTERVAL '6 months'
),

update_validation AS (
    SELECT 
        bpu.*,

        -- Validation checks
        CASE 
            WHEN bpu.new_price <= 0 THEN 'invalid_price_zero_negative'
            WHEN bpu.new_price > bpu.current_price * 3 THEN 'price_increase_too_large'
            WHEN bpu.new_price < bpu.current_price * 0.5 THEN 'price_decrease_too_large'
            ELSE 'valid'
        END as validation_status,

        -- Price change analysis
        bpu.new_price - bpu.current_price as price_change,
        ROUND(((bpu.new_price - bpu.current_price) / bpu.current_price * 100)::NUMERIC, 2) as price_change_percent

    FROM bulk_price_updates bpu
),

validated_updates AS (
    SELECT *
    FROM update_validation
    WHERE validation_status = 'valid'
),

failed_updates AS (
    SELECT *
    FROM update_validation  
    WHERE validation_status != 'valid'
)

-- Execute bulk update (still limited by SQL constraints)
UPDATE products
SET 
    current_price = vu.new_price,
    previous_price = products.current_price,
    last_price_update = vu.update_timestamp,
    price_change_reason = vu.update_reason,
    price_change_amount = vu.price_change,
    price_change_percent = vu.price_change_percent,
    updated_at = CURRENT_TIMESTAMP
FROM validated_updates vu
WHERE products.product_id = vu.product_id;

-- Log failed updates for review
INSERT INTO price_update_errors (
    product_id,
    attempted_price,
    current_price,
    validation_error,
    error_timestamp,
    requires_manual_review
)
SELECT 
    fu.product_id,
    fu.new_price,
    fu.current_price,
    fu.validation_status,
    CURRENT_TIMESTAMP,
    TRUE
FROM failed_updates fu;

-- Limitations of traditional bulk processing:
-- 1. Limited by SQL's capabilities for complex bulk operations
-- 2. No native support for partial success handling in single operations
-- 3. Complex validation and error handling logic
-- 4. Poor performance optimization for very large datasets
-- 5. Difficult to monitor progress of long-running bulk operations
-- 6. No built-in retry mechanisms for transient failures
-- 7. Limited flexibility in operation ordering and dependency management
-- 8. Complex memory management for large batch operations
-- 9. No automatic optimization based on data distribution or system load
-- 10. Difficult integration with distributed systems and microservices

MongoDB provides sophisticated bulk operation capabilities with comprehensive optimization and error handling:
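
Before diving into the full manager class, it helps to see the primitive everything below builds on: the driver's collection.bulkWrite() method, which submits a mix of insert, update, and delete operations in a small number of round trips. A minimal sketch follows; the connection string, collection, and field names are illustrative.

// Minimal bulkWrite() sketch - connection details and fields are illustrative
const { MongoClient } = require('mongodb');

async function minimalBulkWriteExample() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const products = client.db('bulk_operations_system').collection('products');

  // Mixed operations submitted together; ordered: false lets the server
  // continue past individual write errors instead of stopping at the first one
  const result = await products.bulkWrite([
    { insertOne: { document: { sku: 'SKU-1001', price: 19.99, stock: 25 } } },
    { updateOne: { filter: { sku: 'SKU-2002' }, update: { $inc: { stock: -1 } }, upsert: true } },
    { deleteMany: { filter: { status: 'discontinued', stock: 0 } } }
  ], { ordered: false });

  console.log(result.insertedCount, result.modifiedCount, result.upsertedCount, result.deletedCount);
  await client.close();
}

The manager class below wraps this same call with batching, validation, retries, progress tracking, and metrics collection: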

// MongoDB Advanced Bulk Operations and High-Performance Batch Processing System
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('bulk_operations_system');

// Comprehensive MongoDB Bulk Operations Manager
class AdvancedBulkOperationsManager {
  constructor(db, config = {}) {
    this.db = db;
    this.collections = {
      products: db.collection('products'),
      orders: db.collection('orders'),
      customers: db.collection('customers'),
      inventory: db.collection('inventory'),
      bulkOperationLog: db.collection('bulk_operation_log'),
      bulkOperationMetrics: db.collection('bulk_operation_metrics'),
      processingQueue: db.collection('processing_queue')
    };

    // Advanced bulk operations configuration
    this.config = {
      // Batch size optimization
      defaultBatchSize: config.defaultBatchSize || 1000,
      maxBatchSize: config.maxBatchSize || 10000,
      adaptiveBatchSizing: config.adaptiveBatchSizing !== false,

      // Performance optimization
      enableOrderedOperations: config.enableOrderedOperations !== false,
      enableParallelProcessing: config.enableParallelProcessing !== false,
      maxConcurrentBatches: config.maxConcurrentBatches || 5,

      // Error handling and recovery
      enableErrorRecovery: config.enableErrorRecovery !== false,
      maxRetries: config.maxRetries || 3,
      retryDelayMs: config.retryDelayMs || 1000,

      // Monitoring and metrics
      enableMetricsCollection: config.enableMetricsCollection !== false,
      enableProgressTracking: config.enableProgressTracking !== false,
      metricsReportingInterval: config.metricsReportingInterval || 10000,

      // Memory and resource management
      enableMemoryOptimization: config.enableMemoryOptimization !== false,
      maxMemoryUsageMB: config.maxMemoryUsageMB || 1024,
      enableGarbageCollection: config.enableGarbageCollection !== false
    };

    // Operational state management
    this.operationStats = {
      totalOperations: 0,
      successfulOperations: 0,
      failedOperations: 0,
      totalBatches: 0,
      avgBatchProcessingTime: 0,
      totalProcessingTime: 0
    };

    this.activeOperations = new Map();
    this.operationQueue = [];
    this.performanceMetrics = new Map();

    console.log('Advanced Bulk Operations Manager initialized');
  }

  async initializeBulkOperationsSystem() {
    console.log('Initializing comprehensive bulk operations system...');

    try {
      // Setup indexes for performance optimization
      await this.setupPerformanceIndexes();

      // Initialize metrics collection
      await this.initializeMetricsSystem();

      // Setup operation queue for large-scale processing
      await this.initializeProcessingQueue();

      // Configure memory and resource monitoring
      await this.setupResourceMonitoring();

      console.log('Bulk operations system initialized successfully');

    } catch (error) {
      console.error('Error initializing bulk operations system:', error);
      throw error;
    }
  }

  async performAdvancedBulkInsert(collectionName, documents, options = {}) {
    const operation = {
      operationId: `bulk_insert_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`,
      operationType: 'bulk_insert',
      collectionName: collectionName,
      documentsCount: documents.length,
      startTime: new Date(),
      status: 'processing'
    };

    console.log(`Starting bulk insert operation: ${operation.operationId}`);
    console.log(`Inserting ${documents.length} documents into ${collectionName}`);

    try {
      // Register operation for tracking
      this.activeOperations.set(operation.operationId, operation);

      // Validate and prepare documents
      const validatedDocuments = await this.validateAndPrepareDocuments(documents, 'insert');

      // Determine optimal batch configuration
      const batchConfig = await this.optimizeBatchConfiguration(validatedDocuments, options);

      // Execute bulk insert with advanced error handling
      const result = await this.executeBulkInsert(
        this.collections[collectionName], 
        validatedDocuments, 
        batchConfig,
        operation
      );

      // Update operation status
      operation.endTime = new Date();
      operation.status = 'completed';
      operation.result = result;
      operation.processingTime = operation.endTime - operation.startTime;

      // Log operation results
      await this.logBulkOperation(operation);

      // Update performance metrics
      await this.updateOperationMetrics(operation);

      console.log(`Bulk insert completed: ${operation.operationId}`);
      console.log(`Inserted ${result.insertedCount} documents successfully`);

      return result;

    } catch (error) {
      console.error(`Bulk insert failed: ${operation.operationId}`, error);

      // Handle operation failure
      operation.endTime = new Date();
      operation.status = 'failed';
      operation.error = {
        message: error.message,
        stack: error.stack
      };

      await this.handleOperationError(operation, error);
      throw error;

    } finally {
      // Cleanup operation tracking
      this.activeOperations.delete(operation.operationId);
    }
  }

  async executeBulkInsert(collection, documents, batchConfig, operation) {
    const results = {
      insertedCount: 0,
      insertedIds: [],
      errors: [],
      batches: [],
      totalBatches: Math.ceil(documents.length / batchConfig.batchSize)
    };

    console.log(`Executing bulk insert with ${results.totalBatches} batches of size ${batchConfig.batchSize}`);

    // Process documents in optimized batches
    for (let i = 0; i < documents.length; i += batchConfig.batchSize) {
      const batchStart = Date.now();
      const batch = documents.slice(i, i + batchConfig.batchSize);
      const batchNumber = Math.floor(i / batchConfig.batchSize) + 1;

      try {
        console.log(`Processing batch ${batchNumber}/${results.totalBatches} (${batch.length} documents)`);

        // Create bulk write operations for batch
        const bulkOps = batch.map(doc => ({
          insertOne: {
            document: {
              ...doc,
              _bulkOperationId: operation.operationId,
              _batchNumber: batchNumber,
              _insertedAt: new Date()
            }
          }
        }));

        // Execute bulk write with proper options
        const batchResult = await collection.bulkWrite(bulkOps, {
          ordered: batchConfig.ordered,
          bypassDocumentValidation: false,
          ...batchConfig.bulkWriteOptions
        });

        // Process batch results
        const batchProcessingTime = Date.now() - batchStart;
        const batchInfo = {
          batchNumber: batchNumber,
          documentsCount: batch.length,
          insertedCount: batchResult.insertedCount,
          processingTime: batchProcessingTime,
          insertedIds: Object.values(batchResult.insertedIds || {}),
          throughput: batch.length / (batchProcessingTime / 1000)
        };

        results.batches.push(batchInfo);
        results.insertedCount += batchResult.insertedCount;
        results.insertedIds.push(...batchInfo.insertedIds);

        // Update operation progress
        operation.progress = {
          batchesCompleted: batchNumber,
          totalBatches: results.totalBatches,
          documentsProcessed: i + batch.length,
          totalDocuments: documents.length,
          completionPercent: Math.round(((i + batch.length) / documents.length) * 100)
        };

        // Report progress periodically
        if (batchNumber % 10 === 0 || batchNumber === results.totalBatches) {
          console.log(`Progress: ${operation.progress.completionPercent}% (${operation.progress.documentsProcessed}/${operation.progress.totalDocuments})`);
        }

        // Adaptive batch size optimization based on performance
        if (this.config.adaptiveBatchSizing) {
          batchConfig = await this.adaptBatchSize(batchConfig, batchInfo);
        }

        // Memory pressure management
        if (this.config.enableMemoryOptimization) {
          await this.manageMemoryPressure();
        }

      } catch (batchError) {
        console.error(`Batch ${batchNumber} failed:`, batchError);

        // Handle batch-level errors
        const batchErrorInfo = {
          batchNumber: batchNumber,
          documentsCount: batch.length,
          error: {
            message: batchError.message,
            code: batchError.code,
            details: batchError.writeErrors || []
          },
          processingTime: Date.now() - batchStart
        };

        results.errors.push(batchErrorInfo);
        results.batches.push(batchErrorInfo);

        // Determine if operation should continue
        if (batchConfig.ordered && !batchConfig.continueOnError) {
          throw new Error(`Bulk insert failed at batch ${batchNumber}: ${batchError.message}`);
        }

        // Retry failed batch if enabled
        if (this.config.enableErrorRecovery) {
          await this.retryFailedBatch(collection, batch, batchConfig, batchNumber, operation);
        }
      }
    }

    // Calculate final metrics
    results.totalProcessingTime = Date.now() - operation.startTime.getTime();
    results.avgBatchProcessingTime = results.batches
      .filter(b => b.processingTime)
      .reduce((sum, b) => sum + b.processingTime, 0) / results.batches.length;
    results.overallThroughput = results.insertedCount / (results.totalProcessingTime / 1000);
    results.successRate = (results.insertedCount / documents.length) * 100;

    return results;
  }

  async performAdvancedBulkUpdate(collectionName, updates, options = {}) {
    const operation = {
      operationId: `bulk_update_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`,
      operationType: 'bulk_update',
      collectionName: collectionName,
      updatesCount: updates.length,
      startTime: new Date(),
      status: 'processing'
    };

    console.log(`Starting bulk update operation: ${operation.operationId}`);
    console.log(`Updating ${updates.length} documents in ${collectionName}`);

    try {
      // Register operation for tracking
      this.activeOperations.set(operation.operationId, operation);

      // Validate and prepare update operations
      const validatedUpdates = await this.validateAndPrepareUpdates(updates);

      // Optimize batch configuration for updates
      const batchConfig = await this.optimizeBatchConfiguration(validatedUpdates, options);

      // Execute bulk update operations
      const result = await this.executeBulkUpdate(
        this.collections[collectionName],
        validatedUpdates,
        batchConfig,
        operation
      );

      // Complete operation tracking
      operation.endTime = new Date();
      operation.status = 'completed';
      operation.result = result;
      operation.processingTime = operation.endTime - operation.startTime;

      // Log and report results
      await this.logBulkOperation(operation);
      await this.updateOperationMetrics(operation);

      console.log(`Bulk update completed: ${operation.operationId}`);
      console.log(`Updated ${result.modifiedCount} documents successfully`);

      return result;

    } catch (error) {
      console.error(`Bulk update failed: ${operation.operationId}`, error);

      operation.endTime = new Date();
      operation.status = 'failed';
      operation.error = {
        message: error.message,
        stack: error.stack
      };

      await this.handleOperationError(operation, error);
      throw error;

    } finally {
      this.activeOperations.delete(operation.operationId);
    }
  }

  async executeBulkUpdate(collection, updates, batchConfig, operation) {
    const results = {
      matchedCount: 0,
      modifiedCount: 0,
      upsertedCount: 0,
      upsertedIds: [],
      errors: [],
      batches: [],
      totalBatches: Math.ceil(updates.length / batchConfig.batchSize)
    };

    console.log(`Executing bulk update with ${results.totalBatches} batches`);

    // Process updates in optimized batches
    for (let i = 0; i < updates.length; i += batchConfig.batchSize) {
      const batchStart = Date.now();
      const batch = updates.slice(i, i + batchConfig.batchSize);
      const batchNumber = Math.floor(i / batchConfig.batchSize) + 1;

      try {
        console.log(`Processing update batch ${batchNumber}/${results.totalBatches} (${batch.length} operations)`);

        // Create bulk write operations
        const bulkOps = batch.map(update => {
          const updateOp = {
            filter: update.filter,
            update: {
              ...update.update,
              $set: {
                ...update.update.$set,
                _bulkOperationId: operation.operationId,
                _batchNumber: batchNumber,
                _lastUpdated: new Date()
              }
            }
          };

          if (update.upsert) {
            return {
              updateOne: {
                ...updateOp,
                upsert: true
              }
            };
          } else if (update.multi) {
            return {
              updateMany: updateOp
            };
          } else {
            return {
              updateOne: updateOp
            };
          }
        });

        // Execute bulk write
        const batchResult = await collection.bulkWrite(bulkOps, {
          ordered: batchConfig.ordered,
          bypassDocumentValidation: false,
          ...batchConfig.bulkWriteOptions
        });

        // Process batch results
        const batchProcessingTime = Date.now() - batchStart;
        const batchInfo = {
          batchNumber: batchNumber,
          operationsCount: batch.length,
          matchedCount: batchResult.matchedCount || 0,
          modifiedCount: batchResult.modifiedCount || 0,
          upsertedCount: batchResult.upsertedCount || 0,
          processingTime: batchProcessingTime,
          throughput: batch.length / (batchProcessingTime / 1000)
        };

        results.batches.push(batchInfo);
        results.matchedCount += batchInfo.matchedCount;
        results.modifiedCount += batchInfo.modifiedCount;
        results.upsertedCount += batchInfo.upsertedCount;

        if (batchResult.upsertedIds) {
          results.upsertedIds.push(...Object.values(batchResult.upsertedIds));
        }

        // Update progress tracking
        operation.progress = {
          batchesCompleted: batchNumber,
          totalBatches: results.totalBatches,
          operationsProcessed: i + batch.length,
          totalOperations: updates.length,
          completionPercent: Math.round(((i + batch.length) / updates.length) * 100)
        };

        // Progress reporting
        if (batchNumber % 5 === 0 || batchNumber === results.totalBatches) {
          console.log(`Update progress: ${operation.progress.completionPercent}% (${operation.progress.operationsProcessed}/${operation.progress.totalOperations})`);
        }

      } catch (batchError) {
        console.error(`Update batch ${batchNumber} failed:`, batchError);

        const batchErrorInfo = {
          batchNumber: batchNumber,
          operationsCount: batch.length,
          error: {
            message: batchError.message,
            code: batchError.code,
            writeErrors: batchError.writeErrors || []
          },
          processingTime: Date.now() - batchStart
        };

        results.errors.push(batchErrorInfo);
        results.batches.push(batchErrorInfo);

        if (batchConfig.ordered && !batchConfig.continueOnError) {
          throw new Error(`Bulk update failed at batch ${batchNumber}: ${batchError.message}`);
        }
      }
    }

    // Calculate final metrics
    results.totalProcessingTime = Date.now() - operation.startTime.getTime();
    results.avgBatchProcessingTime = results.batches
      .filter(b => b.processingTime)
      .reduce((sum, b) => sum + b.processingTime, 0) / results.batches.length;
    results.overallThroughput = results.modifiedCount / (results.totalProcessingTime / 1000);
    results.successRate = (results.modifiedCount / updates.length) * 100;

    return results;
  }

  async performAdvancedBulkDelete(collectionName, filters, options = {}) {
    const operation = {
      operationId: `bulk_delete_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`,
      operationType: 'bulk_delete',
      collectionName: collectionName,
      filtersCount: filters.length,
      startTime: new Date(),
      status: 'processing'
    };

    console.log(`Starting bulk delete operation: ${operation.operationId}`);
    console.log(`Deleting documents with ${filters.length} filter conditions in ${collectionName}`);

    try {
      this.activeOperations.set(operation.operationId, operation);

      // Validate and prepare delete operations
      const validatedFilters = await this.validateAndPrepareDeletes(filters);

      // Optimize batch configuration for deletes
      const batchConfig = await this.optimizeBatchConfiguration(validatedFilters, options);

      // Execute bulk delete operations
      const result = await this.executeBulkDelete(
        this.collections[collectionName],
        validatedFilters,
        batchConfig,
        operation
      );

      // Complete operation
      operation.endTime = new Date();
      operation.status = 'completed';
      operation.result = result;
      operation.processingTime = operation.endTime - operation.startTime;

      await this.logBulkOperation(operation);
      await this.updateOperationMetrics(operation);

      console.log(`Bulk delete completed: ${operation.operationId}`);
      console.log(`Deleted ${result.deletedCount} documents successfully`);

      return result;

    } catch (error) {
      console.error(`Bulk delete failed: ${operation.operationId}`, error);

      operation.endTime = new Date();
      operation.status = 'failed';
      operation.error = {
        message: error.message,
        stack: error.stack
      };

      await this.handleOperationError(operation, error);
      throw error;

    } finally {
      this.activeOperations.delete(operation.operationId);
    }
  }

  async executeBulkDelete(collection, filters, batchConfig, operation) {
    const results = {
      deletedCount: 0,
      errors: [],
      batches: [],
      totalBatches: Math.ceil(filters.length / batchConfig.batchSize)
    };

    console.log(`Executing bulk delete with ${results.totalBatches} batches`);

    for (let i = 0; i < filters.length; i += batchConfig.batchSize) {
      const batchStart = Date.now();
      const batch = filters.slice(i, i + batchConfig.batchSize);
      const batchNumber = Math.floor(i / batchConfig.batchSize) + 1;

      try {
        console.log(`Processing delete batch ${batchNumber}/${results.totalBatches} (${batch.length} operations)`);

        // Create bulk delete operations
        const bulkOps = batch.map(filter => ({
          deleteMany: {
            filter: filter
          }
        }));

        // Execute bulk write
        const batchResult = await collection.bulkWrite(bulkOps, {
          ordered: batchConfig.ordered,
          ...batchConfig.bulkWriteOptions
        });

        const batchProcessingTime = Date.now() - batchStart;
        const batchInfo = {
          batchNumber: batchNumber,
          operationsCount: batch.length,
          deletedCount: batchResult.deletedCount || 0,
          processingTime: batchProcessingTime,
          throughput: (batchResult.deletedCount || 0) / (batchProcessingTime / 1000)
        };

        results.batches.push(batchInfo);
        results.deletedCount += batchInfo.deletedCount;

        // Update progress
        operation.progress = {
          batchesCompleted: batchNumber,
          totalBatches: results.totalBatches,
          operationsProcessed: i + batch.length,
          totalOperations: filters.length,
          completionPercent: Math.round(((i + batch.length) / filters.length) * 100)
        };

      } catch (batchError) {
        console.error(`Delete batch ${batchNumber} failed:`, batchError);

        const batchErrorInfo = {
          batchNumber: batchNumber,
          operationsCount: batch.length,
          error: {
            message: batchError.message,
            code: batchError.code
          },
          processingTime: Date.now() - batchStart
        };

        results.errors.push(batchErrorInfo);
        results.batches.push(batchErrorInfo);
      }
    }

    results.totalProcessingTime = Date.now() - operation.startTime.getTime();
    results.overallThroughput = results.deletedCount / (results.totalProcessingTime / 1000);

    return results;
  }

  async validateAndPrepareDocuments(documents, operationType) {
    console.log(`Validating and preparing ${documents.length} documents for ${operationType}`);

    const validatedDocuments = [];
    const validationErrors = [];

    for (let i = 0; i < documents.length; i++) {
      const doc = documents[i];

      try {
        // Basic validation
        if (!doc || typeof doc !== 'object') {
          throw new Error('Document must be a valid object');
        }

        // Add operation metadata
        const preparedDoc = {
          ...doc,
          _operationType: operationType,
          _operationTimestamp: new Date(),
          _validatedAt: new Date()
        };

        // Type-specific validation
        if (operationType === 'insert') {
          // Ensure no _id conflicts for inserts
          if (preparedDoc._id) {
            // Keep existing _id but validate it's unique
          }
        }

        validatedDocuments.push(preparedDoc);

      } catch (error) {
        validationErrors.push({
          index: i,
          document: doc,
          error: error.message
        });
      }
    }

    if (validationErrors.length > 0) {
      console.warn(`Found ${validationErrors.length} validation errors out of ${documents.length} documents`);

      // Log validation errors
      await this.collections.bulkOperationLog.insertOne({
        operationType: 'validation',
        validationErrors: validationErrors,
        timestamp: new Date()
      });
    }

    console.log(`Validation complete: ${validatedDocuments.length} valid documents`);
    return validatedDocuments;
  }

  async optimizeBatchConfiguration(data, options) {
    const dataSize = data.length;
    let optimalBatchSize = this.config.defaultBatchSize;

    // Adaptive batch size based on data volume
    if (this.config.adaptiveBatchSizing) {
      if (dataSize > 100000) {
        optimalBatchSize = Math.min(this.config.maxBatchSize, 5000);
      } else if (dataSize > 10000) {
        optimalBatchSize = 2000;
      } else if (dataSize > 1000) {
        optimalBatchSize = 1000;
      } else {
        optimalBatchSize = Math.max(100, dataSize);
      }
    }

    // Consider memory constraints
    if (this.config.enableMemoryOptimization) {
      const estimatedMemoryPerDoc = 1; // KB estimate
      const totalMemoryMB = (dataSize * estimatedMemoryPerDoc) / 1024;

      if (totalMemoryMB > this.config.maxMemoryUsageMB) {
        const memoryAdjustedBatchSize = Math.floor(
          (this.config.maxMemoryUsageMB * 1024) / estimatedMemoryPerDoc
        );
        optimalBatchSize = Math.min(optimalBatchSize, memoryAdjustedBatchSize);
      }
    }

    const batchConfig = {
      batchSize: optimalBatchSize,
      ordered: options.ordered !== false,
      continueOnError: options.continueOnError === true,
      bulkWriteOptions: {
        writeConcern: options.writeConcern || { w: 'majority' },
        ...(options.bulkWriteOptions || {})
      }
    };

    console.log(`Optimized batch configuration: size=${batchConfig.batchSize}, ordered=${batchConfig.ordered}`);
    return batchConfig;
  }

  async logBulkOperation(operation) {
    try {
      await this.collections.bulkOperationLog.insertOne({
        operationId: operation.operationId,
        operationType: operation.operationType,
        collectionName: operation.collectionName,
        status: operation.status,
        startTime: operation.startTime,
        endTime: operation.endTime,
        processingTime: operation.processingTime,
        result: operation.result,
        error: operation.error,
        progress: operation.progress,
        createdAt: new Date()
      });
    } catch (error) {
      console.warn('Error logging bulk operation:', error.message);
    }
  }

  async updateOperationMetrics(operation) {
    try {
      // Update global statistics
      this.operationStats.totalOperations++;
      if (operation.status === 'completed') {
        this.operationStats.successfulOperations++;
      } else {
        this.operationStats.failedOperations++;
      }

      if (operation.result && operation.result.batches) {
        this.operationStats.totalBatches += operation.result.batches.length;

        const avgBatchTime = operation.result.avgBatchProcessingTime;
        if (avgBatchTime) {
          this.operationStats.avgBatchProcessingTime = 
            (this.operationStats.avgBatchProcessingTime + avgBatchTime) / 2;
        }
      }

      // Store detailed metrics
      await this.collections.bulkOperationMetrics.insertOne({
        operationId: operation.operationId,
        operationType: operation.operationType,
        collectionName: operation.collectionName,
        metrics: {
          processingTime: operation.processingTime,
          throughput: operation.result ? operation.result.overallThroughput : null,
          successRate: operation.result ? operation.result.successRate : null,
          batchCount: operation.result ? operation.result.batches.length : null,
          avgBatchTime: operation.result ? operation.result.avgBatchProcessingTime : null
        },
        timestamp: new Date()
      });

    } catch (error) {
      console.warn('Error updating operation metrics:', error.message);
    }
  }

  async generateBulkOperationsReport() {
    console.log('Generating bulk operations performance report...');

    try {
      const report = {
        timestamp: new Date(),
        globalStats: { ...this.operationStats },
        activeOperations: this.activeOperations.size,

        // Recent operations analysis
        recentOperations: await this.collections.bulkOperationLog.find({
          startTime: { $gte: new Date(Date.now() - 24 * 60 * 60 * 1000) }
        }).sort({ startTime: -1 }).limit(50).toArray(),

        // Performance metrics
        performanceMetrics: await this.collections.bulkOperationMetrics.aggregate([
          {
            $match: {
              timestamp: { $gte: new Date(Date.now() - 24 * 60 * 60 * 1000) }
            }
          },
          {
            $group: {
              _id: '$operationType',
              count: { $sum: 1 },
              avgProcessingTime: { $avg: '$metrics.processingTime' },
              avgThroughput: { $avg: '$metrics.throughput' },
              avgSuccessRate: { $avg: '$metrics.successRate' },
              totalBatches: { $sum: '$metrics.batchCount' }
            }
          }
        ]).toArray()
      };

      // Calculate health indicators
      report.healthIndicators = {
        successRate: this.operationStats.totalOperations > 0 ? 
          (this.operationStats.successfulOperations / this.operationStats.totalOperations * 100).toFixed(2) : 0,
        avgProcessingTime: this.operationStats.avgBatchProcessingTime,
        systemLoad: this.activeOperations.size,
        status: this.activeOperations.size > 10 ? 'high_load' : 
                this.operationStats.failedOperations > this.operationStats.successfulOperations ? 'degraded' : 'healthy'
      };

      return report;

    } catch (error) {
      console.error('Error generating bulk operations report:', error);
      return {
        timestamp: new Date(),
        error: error.message,
        globalStats: this.operationStats
      };
    }
  }

  // Additional helper methods for comprehensive bulk operations management
  async setupPerformanceIndexes() {
    console.log('Setting up performance indexes for bulk operations...');

    // Index for operation logging and metrics
    await this.collections.bulkOperationLog.createIndex(
      { operationId: 1, startTime: -1 },
      { background: true }
    );

    await this.collections.bulkOperationMetrics.createIndex(
      { operationType: 1, timestamp: -1 },
      { background: true }
    );
  }

  async adaptBatchSize(currentConfig, batchInfo) {
    // Adaptive batch size optimization based on performance
    if (batchInfo.throughput < 100) { // documents per second
      currentConfig.batchSize = Math.max(100, Math.floor(currentConfig.batchSize * 0.8));
    } else if (batchInfo.throughput > 1000) {
      currentConfig.batchSize = Math.min(this.config.maxBatchSize, Math.floor(currentConfig.batchSize * 1.2));
    }

    return currentConfig;
  }

  async manageMemoryPressure() {
    if (this.config.enableGarbageCollection) {
      if (global.gc) {
        global.gc();
      }
    }
  }
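
  // --- Minimal placeholder helpers (added as assumptions, not part of the
  // original design) so the hooks referenced above resolve at runtime.
  // Production implementations would be substantially more involved. ---

  async initializeMetricsSystem() {
    // Ensure the metrics collection can be queried efficiently by time
    await this.collections.bulkOperationMetrics.createIndex({ timestamp: -1 }, { background: true });
  }

  async initializeProcessingQueue() {
    // Placeholder queue setup; a real queue would persist and lease work items
    await this.collections.processingQueue.createIndex({ status: 1, createdAt: 1 }, { background: true });
  }

  async setupResourceMonitoring() {
    // Record a baseline heap size so later memory checks have a reference point
    this.baselineMemoryMB = process.memoryUsage().heapUsed / (1024 * 1024);
  }

  async validateAndPrepareUpdates(updates) {
    // Keep only well-formed { filter, update } pairs
    return updates.filter(u => u && typeof u.filter === 'object' && typeof u.update === 'object');
  }

  async validateAndPrepareDeletes(filters) {
    // Reject empty filters to avoid accidental collection-wide deletes
    return filters.filter(f => f && typeof f === 'object' && Object.keys(f).length > 0);
  }

  async handleOperationError(operation, error) {
    // Persist the failed operation record for later review or replay
    await this.logBulkOperation(operation).catch(() => {});
  }

  async retryFailedBatch(collection, batch, batchConfig, batchNumber, operation) {
    // Bounded retry with a fixed delay; returns null if all attempts fail
    for (let attempt = 1; attempt <= this.config.maxRetries; attempt++) {
      try {
        await new Promise(resolve => setTimeout(resolve, this.config.retryDelayMs));
        return await collection.bulkWrite(
          batch.map(doc => ({ insertOne: { document: doc } })),
          { ordered: false }
        );
      } catch (retryError) {
        console.warn(`Retry ${attempt} for batch ${batchNumber} failed: ${retryError.message}`);
      }
    }
    return null;
  }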
}

// Benefits of MongoDB Advanced Bulk Operations:
// - Native bulk operation support with minimal overhead and maximum throughput
// - Sophisticated error handling with partial success support and retry mechanisms
// - Adaptive batch sizing and performance optimization based on data characteristics
// - Comprehensive operation tracking and monitoring with detailed metrics
// - Memory and resource management for large-scale data processing
// - Built-in transaction-level consistency and ordering guarantees
// - Flexible operation types (insert, update, delete, upsert) with advanced filtering
// - Scalable architecture supporting millions of documents efficiently
// - Integration with MongoDB's native indexing and query optimization
// - SQL-compatible bulk operations through QueryLeaf integration

module.exports = {
  AdvancedBulkOperationsManager
};

Understanding MongoDB Bulk Operations Architecture

Advanced Bulk Processing and Performance Optimization Patterns

Implement sophisticated bulk operation patterns for production MongoDB deployments:

// Enterprise-grade MongoDB bulk operations with advanced optimization
class EnterpriseBulkOperationsOrchestrator extends AdvancedBulkOperationsManager {
  constructor(db, enterpriseConfig) {
    super(db, enterpriseConfig);

    this.enterpriseConfig = {
      ...enterpriseConfig,
      enableDistributedProcessing: true,
      enableDataPartitioning: true,
      enableAutoSharding: true,
      enableComplianceTracking: true,
      enableAuditLogging: true
    };

    this.setupEnterpriseFeatures();
  }

  async implementDistributedBulkProcessing() {
    console.log('Implementing distributed bulk processing across shards...');

    // Advanced distributed processing configuration
    const distributedConfig = {
      shardAwareness: {
        enableShardKeyOptimization: true,
        balanceWorkloadAcrossShards: true,
        minimizeCrossShardOperations: true,
        optimizeForShardDistribution: true
      },

      parallelProcessing: {
        maxConcurrentShards: 8,
        adaptiveParallelism: true,
        loadBalancedDistribution: true,
        resourceAwareScheduling: true
      },

      consistencyManagement: {
        maintainTransactionalBoundaries: true,
        ensureShardConsistency: true,
        coordinateDistributedOperations: true,
        handlePartialFailures: true
      }
    };

    return await this.deployDistributedBulkProcessing(distributedConfig);
  }

  async setupEnterpriseComplianceFramework() {
    console.log('Setting up enterprise compliance framework...');

    const complianceConfig = {
      auditTrail: {
        comprehensiveOperationLogging: true,
        dataLineageTracking: true,
        complianceReporting: true,
        retentionPolicyEnforcement: true
      },

      securityControls: {
        operationAccessControl: true,
        dataEncryptionInTransit: true,
        auditLogEncryption: true,
        nonRepudiationSupport: true
      },

      governanceFramework: {
        operationApprovalWorkflows: true,
        dataClassificationEnforcement: true,
        regulatoryComplianceValidation: true,
        businessRuleValidation: true
      }
    };

    return await this.implementComplianceFramework(complianceConfig);
  }
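
  // --- Minimal placeholder implementations (assumptions added for completeness)
  // so the enterprise orchestration hooks above resolve; real deployments
  // would replace these with shard-aware and compliance-aware logic. ---

  setupEnterpriseFeatures() {
    // Placeholder: no extra wiring beyond acknowledging the enterprise config
    console.log('Enterprise features enabled:', Object.keys(this.enterpriseConfig).join(', '));
  }

  async deployDistributedBulkProcessing(distributedConfig) {
    // Sketch: store the intended settings; a full implementation would partition
    // work by shard key so each batch targets a single shard and can run in parallel
    this.distributedConfig = distributedConfig;
    return {
      status: 'configured',
      maxConcurrentShards: distributedConfig.parallelProcessing.maxConcurrentShards
    };
  }

  async implementComplianceFramework(complianceConfig) {
    // Sketch: persist the compliance settings as an auditable configuration record
    await this.collections.bulkOperationLog.insertOne({
      operationType: 'compliance_configuration',
      complianceConfig: complianceConfig,
      timestamp: new Date()
    });
    return { status: 'compliance_framework_active' };
  }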
}

SQL-Style Bulk Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB bulk operations and batch processing:

-- QueryLeaf advanced bulk operations with SQL-familiar syntax for MongoDB

-- Configure bulk operations with comprehensive performance optimization
CONFIGURE BULK_OPERATIONS
SET batch_size = 1000,
    max_batch_size = 10000,
    adaptive_batching = true,
    ordered_operations = true,
    parallel_processing = true,
    max_concurrent_batches = 5,
    error_recovery = true,
    metrics_collection = true;

-- Advanced bulk insert with intelligent batching and error handling
BEGIN BULK_OPERATION 'product_import_2025';

WITH product_validation AS (
  -- Comprehensive data validation and preparation
  SELECT 
    *,

    -- Data quality validation
    CASE 
      WHEN product_name IS NULL OR LENGTH(TRIM(product_name)) = 0 THEN 'invalid_name'
      WHEN category IS NULL OR LENGTH(TRIM(category)) = 0 THEN 'invalid_category'
      WHEN price IS NULL OR price <= 0 THEN 'invalid_price'
      WHEN stock_quantity IS NULL OR stock_quantity < 0 THEN 'invalid_stock'
      ELSE 'valid'
    END as validation_status,

    -- Data enrichment and standardization
    UPPER(TRIM(product_name)) as normalized_name,
    LOWER(TRIM(category)) as normalized_category,
    ROUND(price::NUMERIC, 2) as normalized_price,
    COALESCE(stock_quantity, 0) as normalized_stock,

    -- Business rule validation
    CASE 
      WHEN category = 'electronics' AND price > 10000 THEN 'requires_approval'
      WHEN stock_quantity > 1000 AND supplier_id IS NULL THEN 'requires_supplier'
      ELSE 'approved'
    END as business_validation,

    -- Generate unique identifiers and metadata
    gen_random_uuid() as product_id,
    CURRENT_TIMESTAMP as import_timestamp,
    'bulk_import_2025' as import_batch,
    ROW_NUMBER() OVER (ORDER BY product_name) as import_sequence

  FROM raw_product_import_data
  WHERE status = 'pending'
),

validated_products AS (
  SELECT *
  FROM product_validation
  WHERE validation_status = 'valid'
    AND business_validation = 'approved'
),

rejected_products AS (
  SELECT *
  FROM product_validation  
  WHERE validation_status != 'valid'
    OR business_validation != 'approved'
)

-- Execute high-performance bulk insert with advanced error handling
INSERT INTO products (
  product_id,
  product_name,
  category,
  price,
  stock_quantity,
  supplier_id,
  description,

  -- Metadata and tracking fields
  import_batch,
  import_timestamp,
  import_sequence,
  created_at,
  updated_at,

  -- Search and indexing optimization
  search_keywords,
  normalized_name,
  normalized_category
)
SELECT 
  vp.product_id,
  vp.normalized_name,
  vp.normalized_category,
  vp.normalized_price,
  vp.normalized_stock,
  vp.supplier_id,
  vp.description,

  -- Tracking information
  vp.import_batch,
  vp.import_timestamp,
  vp.import_sequence,
  vp.import_timestamp,
  vp.import_timestamp,

  -- Generated fields for optimization
  ARRAY_CAT(
    STRING_TO_ARRAY(LOWER(vp.normalized_name), ' '),
    STRING_TO_ARRAY(LOWER(vp.normalized_category), ' ')
  ) as search_keywords,
  vp.normalized_name,
  vp.normalized_category

FROM validated_products vp

-- Bulk insert configuration with advanced options
WITH BULK_OPTIONS (
  batch_size = 2000,
  ordered = true,
  continue_on_error = false,
  write_concern = '{ "w": "majority", "j": true }',
  bypass_document_validation = false,

  -- Performance optimization
  adaptive_batching = true,
  parallel_processing = true,
  memory_optimization = true,

  -- Error handling configuration
  retry_attempts = 3,
  retry_delay_ms = 1000,
  dead_letter_queue = true,

  -- Progress tracking
  progress_reporting = true,
  progress_interval = 1000,
  metrics_collection = true
);

-- Log rejected products for review and correction
INSERT INTO product_import_errors (
  import_batch,
  error_timestamp,
  validation_error,
  business_error,
  raw_data,
  requires_manual_review
)
SELECT 
  rp.import_batch,
  CURRENT_TIMESTAMP,
  rp.validation_status,
  rp.business_validation,
  ROW_TO_JSON(rp),
  true
FROM rejected_products rp;

COMMIT BULK_OPERATION;

-- Advanced bulk update with complex business logic and performance optimization
BEGIN BULK_OPERATION 'price_adjustment_2025';

WITH price_adjustment_analysis AS (
  -- Sophisticated price adjustment calculation
  SELECT 
    p.product_id,
    p.product_name,
    p.category,
    p.current_price,
    p.stock_quantity,
    p.last_price_update,
    p.supplier_id,

    -- Market analysis data
    ma.competitor_avg_price,
    ma.market_demand_score,
    ma.seasonal_factor,

    -- Inventory analysis
    CASE 
      WHEN p.stock_quantity = 0 THEN 'out_of_stock'
      WHEN p.stock_quantity < 10 THEN 'low_stock'
      WHEN p.stock_quantity > 100 THEN 'overstocked'
      ELSE 'normal_stock'
    END as stock_status,

    -- Calculate new price with complex business rules
    CASE p.category
      WHEN 'electronics' THEN
        CASE 
          WHEN ma.market_demand_score > 8 AND p.stock_quantity < 10 THEN p.current_price * 1.25
          WHEN ma.competitor_avg_price > p.current_price * 1.1 THEN p.current_price * 1.15
          WHEN p.stock_quantity > 100 THEN p.current_price * 0.90
          ELSE p.current_price * (1 + (ma.seasonal_factor * 0.1))
        END
      WHEN 'clothing' THEN
        CASE 
          WHEN ma.seasonal_factor > 1.2 THEN p.current_price * 1.20
          WHEN p.stock_quantity > 50 THEN p.current_price * 0.85
          WHEN ma.market_demand_score > 7 THEN p.current_price * 1.10
          ELSE p.current_price * 1.05
        END
      WHEN 'books' THEN
        CASE 
          WHEN p.stock_quantity > 200 THEN p.current_price * 0.75
          WHEN ma.market_demand_score > 9 THEN p.current_price * 1.15
          ELSE p.current_price * 1.02
        END
      ELSE p.current_price * (1 + LEAST(0.15, ma.market_demand_score * 0.02))
    END as calculated_new_price,

    -- Adjustment metadata
    'market_analysis_2025' as adjustment_reason,
    CURRENT_TIMESTAMP as adjustment_timestamp

  FROM products p
  LEFT JOIN market_analysis ma ON p.product_id = ma.product_id
  WHERE p.active = true
    AND p.last_price_update < CURRENT_TIMESTAMP - INTERVAL '3 months'
    AND ma.analysis_date >= CURRENT_DATE - INTERVAL '7 days'
),

validated_price_adjustments AS (
  SELECT 
    paa.*,

    -- Price change validation
    paa.calculated_new_price - paa.current_price as price_change,

    ROUND(
      ((paa.calculated_new_price - paa.current_price) / paa.current_price * 100)::NUMERIC, 
      2
    ) as price_change_percent,

    -- Validation rules
    CASE 
      WHEN paa.calculated_new_price <= 0 THEN 'invalid_negative_price'
      WHEN ABS(paa.calculated_new_price - paa.current_price) / paa.current_price > 0.5 THEN 'change_too_large'
      WHEN paa.calculated_new_price = paa.current_price THEN 'no_change_needed'
      ELSE 'valid'
    END as price_validation,

    -- Business impact assessment
    CASE 
      WHEN ABS(paa.calculated_new_price - paa.current_price) > 100 THEN 'high_impact'
      WHEN ABS(paa.calculated_new_price - paa.current_price) > 20 THEN 'medium_impact'
      ELSE 'low_impact'
    END as business_impact

  FROM price_adjustment_analysis paa
),

approved_adjustments AS (
  SELECT *
  FROM validated_price_adjustments
  WHERE price_validation = 'valid'
    AND (business_impact != 'high_impact' OR market_demand_score > 8)
)

-- Execute bulk update with comprehensive tracking and optimization
UPDATE products 
SET 
  current_price = aa.calculated_new_price,
  previous_price = products.current_price,
  last_price_update = aa.adjustment_timestamp,
  price_change_amount = aa.price_change,
  price_change_percent = aa.price_change_percent,
  price_adjustment_reason = aa.adjustment_reason,

  -- Update metadata
  updated_at = aa.adjustment_timestamp,
  version = products.version + 1,

  -- Search index optimization
  price_tier = CASE 
    WHEN aa.calculated_new_price < 25 THEN 'budget'
    WHEN aa.calculated_new_price < 100 THEN 'mid_range'
    WHEN aa.calculated_new_price < 500 THEN 'premium'
    ELSE 'luxury'
  END,

  -- Business intelligence fields
  last_market_analysis = aa.adjustment_timestamp,
  stock_price_ratio = aa.calculated_new_price / GREATEST(aa.stock_quantity, 1),
  competitive_position = CASE 
    WHEN aa.competitor_avg_price > 0 THEN
      CASE 
        WHEN aa.calculated_new_price < aa.competitor_avg_price * 0.9 THEN 'price_leader'
        WHEN aa.calculated_new_price > aa.competitor_avg_price * 1.1 THEN 'premium_positioned'
        ELSE 'market_aligned'
      END
    ELSE 'no_competition_data'
  END

FROM approved_adjustments aa
WHERE products.product_id = aa.product_id

-- Bulk update configuration
WITH BULK_OPTIONS (
  batch_size = 1500,
  ordered = false,  -- Allow parallel processing for updates
  continue_on_error = true,
  write_concern = '{ "w": "majority" }',

  -- Performance optimization for updates
  adaptive_batching = true,
  parallel_processing = true,
  max_concurrent_batches = 8,

  -- Update-specific optimizations
  minimize_index_updates = true,
  batch_index_updates = true,
  optimize_for_throughput = true,

  -- Progress and monitoring
  progress_reporting = true,
  progress_interval = 500,
  operation_timeout_ms = 300000  -- 5 minutes
);

-- Create price adjustment audit trail
INSERT INTO price_adjustment_audit (
  adjustment_batch,
  product_id,
  old_price,
  new_price,
  price_change,
  price_change_percent,
  adjustment_reason,
  business_impact,
  market_data_used,
  adjustment_timestamp,
  approved_by
)
SELECT 
  'bulk_adjustment_2025',
  aa.product_id,
  aa.current_price,
  aa.calculated_new_price,
  aa.price_change,
  aa.price_change_percent,
  aa.adjustment_reason,
  aa.business_impact,
  JSON_OBJECT(
    'competitor_avg_price', aa.competitor_avg_price,
    'market_demand_score', aa.market_demand_score,
    'seasonal_factor', aa.seasonal_factor,
    'stock_status', aa.stock_status
  ),
  aa.adjustment_timestamp,
  'automated_system'
FROM approved_adjustments aa;

COMMIT BULK_OPERATION;

-- Advanced bulk delete with safety checks and cascade handling
BEGIN BULK_OPERATION 'product_cleanup_2025';

WITH deletion_analysis AS (
  -- Identify products for deletion with comprehensive safety checks
  SELECT 
    p.product_id,
    p.product_name,
    p.category,
    p.stock_quantity,
    p.last_sale_date,
    p.created_at,
    p.supplier_id,

    -- Dependency analysis
    (SELECT COUNT(*) FROM order_items oi WHERE oi.product_id = p.product_id) as order_references,
    (SELECT COUNT(*) FROM shopping_cart_items sci WHERE sci.product_id = p.product_id) as cart_references,
    (SELECT COUNT(*) FROM product_reviews pr WHERE pr.product_id = p.product_id) as review_count,
    (SELECT COUNT(*) FROM wishlist_items wi WHERE wi.product_id = p.product_id) as wishlist_references,

    -- Business impact assessment
    COALESCE(p.total_sales_amount, 0) as lifetime_sales,
    COALESCE(p.total_units_sold, 0) as lifetime_units_sold,

    -- Deletion criteria evaluation
    CASE 
      WHEN p.status = 'discontinued' 
       AND p.stock_quantity = 0 
       AND (p.last_sale_date IS NULL OR p.last_sale_date < CURRENT_DATE - INTERVAL '2 years')
       THEN 'eligible_discontinued'

      WHEN p.created_at < CURRENT_DATE - INTERVAL '5 years'
       AND COALESCE(p.total_units_sold, 0) = 0
       AND p.stock_quantity = 0
       THEN 'eligible_never_sold'

      WHEN p.status = 'draft'
       AND p.created_at < CURRENT_DATE - INTERVAL '1 year'
       AND p.stock_quantity = 0
       THEN 'eligible_old_draft'

      ELSE 'not_eligible'
    END as deletion_eligibility,

    -- Safety check results
    CASE 
      WHEN (SELECT COUNT(*) FROM order_items oi WHERE oi.product_id = p.product_id) > 0 THEN 'has_order_references'
      WHEN (SELECT COUNT(*) FROM shopping_cart_items sci WHERE sci.product_id = p.product_id) > 0 THEN 'has_cart_references'
      WHEN p.stock_quantity > 0 THEN 'has_inventory'
      WHEN p.status = 'active' THEN 'still_active'
      ELSE 'safe_to_delete'
    END as safety_check

  FROM products p
  WHERE p.status IN ('discontinued', 'draft', 'inactive')
),

safe_deletions AS (
  SELECT *
  FROM deletion_analysis
  WHERE deletion_eligibility != 'not_eligible'
    AND safety_check = 'safe_to_delete'
    AND order_references = 0
    AND cart_references = 0
),

cascade_cleanup_required AS (
  SELECT 
    sd.*,
    ARRAY[
      CASE WHEN sd.review_count > 0 THEN 'product_reviews' END,
      CASE WHEN sd.wishlist_references > 0 THEN 'wishlist_items' END
    ]::TEXT[] as cascade_tables
  FROM safe_deletions sd
  WHERE sd.review_count > 0 OR sd.wishlist_references > 0
)

-- Archive products before deletion
INSERT INTO archived_products
SELECT 
  p.*,
  sd.deletion_eligibility as archive_reason,
  CURRENT_TIMESTAMP as archived_at,
  'bulk_cleanup_2025' as archive_batch
FROM products p
JOIN safe_deletions sd ON p.product_id = sd.product_id;

-- Execute cascade deletions first
DELETE FROM product_reviews 
WHERE product_id IN (
  SELECT product_id FROM cascade_cleanup_required 
  WHERE 'product_reviews' = ANY(cascade_tables)
)
WITH BULK_OPTIONS (
  batch_size = 500,
  continue_on_error = true,
  ordered = false
);

DELETE FROM wishlist_items
WHERE product_id IN (
  SELECT product_id FROM cascade_cleanup_required
  WHERE 'wishlist_items' = ANY(cascade_tables)  
)
WITH BULK_OPTIONS (
  batch_size = 500,
  continue_on_error = true,
  ordered = false
);

-- Execute main product deletion
DELETE FROM products 
WHERE product_id IN (
  SELECT product_id FROM safe_deletions
)
WITH BULK_OPTIONS (
  batch_size = 1000,
  continue_on_error = false,  -- Fail fast for main deletions
  ordered = false,

  -- Deletion-specific optimizations
  optimize_for_throughput = true,
  minimal_logging = false,  -- Keep full audit trail

  -- Safety configurations
  max_deletion_rate = 100,  -- Max deletions per second
  safety_checks = true,
  confirm_deletion_count = true
);

-- Log deletion operation results
INSERT INTO bulk_operation_audit (
  operation_type,
  operation_batch,
  collection_name,
  records_processed,
  records_affected,
  operation_timestamp,
  operation_metadata
)
SELECT 
  'bulk_delete',
  'product_cleanup_2025', 
  'products',
  (SELECT COUNT(*) FROM safe_deletions),
  @@ROWCOUNT,  -- Actual deleted count
  CURRENT_TIMESTAMP,
  JSON_OBJECT(
    'deletion_criteria', 'discontinued_and_never_sold',
    'safety_checks_passed', true,
    'cascade_cleanup_performed', true,
    'products_archived', true
  );

COMMIT BULK_OPERATION;

-- Comprehensive bulk operations monitoring and analysis
WITH bulk_operation_analytics AS (
  SELECT 
    DATE_TRUNC('hour', operation_timestamp) as time_bucket,
    operation_type,
    collection_name,

    -- Volume metrics
    COUNT(*) as operation_count,
    SUM(records_processed) as total_records_processed,
    SUM(records_affected) as total_records_affected,

    -- Performance metrics  
    AVG(processing_time_ms) as avg_processing_time_ms,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY processing_time_ms) as p95_processing_time_ms,
    AVG(throughput_records_per_second) as avg_throughput,

    -- Success metrics
    COUNT(*) FILTER (WHERE status = 'completed') as successful_operations,
    COUNT(*) FILTER (WHERE status = 'failed') as failed_operations,
    COUNT(*) FILTER (WHERE status = 'partial_success') as partial_success_operations,

    -- Resource utilization
    AVG(batch_count) as avg_batches_per_operation,
    AVG(memory_usage_mb) as avg_memory_usage_mb,
    AVG(cpu_usage_percent) as avg_cpu_usage_percent,

    -- Error analysis
    SUM(retry_attempts) as total_retry_attempts,
    COUNT(*) FILTER (WHERE error_type IS NOT NULL) as operations_with_errors

  FROM bulk_operation_log
  WHERE operation_timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
  GROUP BY DATE_TRUNC('hour', operation_timestamp), operation_type, collection_name
),

performance_trends AS (
  SELECT 
    operation_type,
    collection_name,

    -- Trend analysis
    AVG(avg_processing_time_ms) as overall_avg_processing_time,
    STDDEV(avg_processing_time_ms) as processing_time_variability,
    AVG(avg_throughput) as overall_avg_throughput,

    -- Capacity analysis
    MAX(total_records_processed) as max_records_in_hour,
    AVG(avg_memory_usage_mb) as typical_memory_usage,
    MAX(avg_memory_usage_mb) as peak_memory_usage,

    -- Reliability metrics
    ROUND(
      (SUM(successful_operations)::FLOAT / 
       NULLIF(SUM(operation_count), 0)) * 100, 
      2
    ) as success_rate_percent,

    SUM(total_retry_attempts) as total_retries,
    SUM(operations_with_errors) as error_count

  FROM bulk_operation_analytics
  GROUP BY operation_type, collection_name
)

SELECT 
  boa.time_bucket,
  boa.operation_type,
  boa.collection_name,

  -- Current period metrics
  boa.operation_count,
  boa.total_records_processed,
  boa.total_records_affected,

  -- Performance indicators
  ROUND(boa.avg_processing_time_ms::NUMERIC, 2) as avg_processing_time_ms,
  ROUND(boa.p95_processing_time_ms::NUMERIC, 2) as p95_processing_time_ms,
  ROUND(boa.avg_throughput::NUMERIC, 2) as avg_throughput_rps,

  -- Success metrics
  boa.successful_operations,
  boa.failed_operations,
  boa.partial_success_operations,
  ROUND(
    (boa.successful_operations::FLOAT / 
     NULLIF(boa.operation_count, 0)) * 100,
    2
  ) as success_rate_percent,

  -- Resource utilization
  ROUND(boa.avg_batches_per_operation::NUMERIC, 1) as avg_batches_per_operation,
  ROUND(boa.avg_memory_usage_mb::NUMERIC, 2) as avg_memory_usage_mb,
  ROUND(boa.avg_cpu_usage_percent::NUMERIC, 1) as avg_cpu_usage_percent,

  -- Performance comparison with trends
  pt.overall_avg_processing_time,
  pt.overall_avg_throughput,
  pt.success_rate_percent as historical_success_rate,

  -- Performance indicators
  CASE 
    WHEN boa.avg_processing_time_ms > pt.overall_avg_processing_time * 1.5 THEN 'degraded'
    WHEN boa.avg_processing_time_ms < pt.overall_avg_processing_time * 0.8 THEN 'improved'
    ELSE 'stable'
  END as performance_trend,

  -- Health status
  CASE 
    WHEN boa.failed_operations > boa.successful_operations THEN 'unhealthy'
    WHEN boa.avg_processing_time_ms > 60000 THEN 'slow'  -- > 1 minute
    WHEN boa.avg_throughput < 10 THEN 'low_throughput'
    WHEN (boa.successful_operations::FLOAT / NULLIF(boa.operation_count, 0)) < 0.95 THEN 'unreliable'
    ELSE 'healthy'
  END as health_status,

  -- Optimization recommendations
  ARRAY[
    CASE WHEN boa.avg_processing_time_ms > 30000 THEN 'Consider increasing batch size' END,
    CASE WHEN boa.avg_memory_usage_mb > 1024 THEN 'Monitor memory usage' END,
    CASE WHEN boa.total_retry_attempts > 0 THEN 'Investigate retry causes' END,
    CASE WHEN boa.avg_throughput < pt.overall_avg_throughput * 0.8 THEN 'Performance degradation detected' END
  ]::TEXT[] as recommendations

FROM bulk_operation_analytics boa
LEFT JOIN performance_trends pt ON 
  boa.operation_type = pt.operation_type AND 
  boa.collection_name = pt.collection_name
ORDER BY boa.time_bucket DESC, boa.operation_type, boa.collection_name;

-- Real-time bulk operations dashboard
CREATE VIEW bulk_operations_dashboard AS
WITH current_operations AS (
  SELECT 
    COUNT(*) as active_operations,
    SUM(CASE WHEN status = 'processing' THEN 1 ELSE 0 END) as processing_operations,
    SUM(CASE WHEN status = 'queued' THEN 1 ELSE 0 END) as queued_operations,
    AVG(progress_percent) as avg_progress_percent
  FROM active_bulk_operations
),

recent_performance AS (
  SELECT 
    COUNT(*) as operations_last_hour,
    AVG(processing_time_ms) as avg_processing_time_last_hour,
    AVG(throughput_records_per_second) as avg_throughput_last_hour,
    COUNT(*) FILTER (WHERE status = 'completed') as successful_operations_last_hour,
    COUNT(*) FILTER (WHERE status = 'failed') as failed_operations_last_hour
  FROM bulk_operation_log
  WHERE operation_timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
),

system_health AS (
  SELECT 
    CASE 
      WHEN co.processing_operations > 10 THEN 'high_load'
      WHEN co.queued_operations > 20 THEN 'queue_backlog'
      WHEN rp.failed_operations_last_hour > rp.successful_operations_last_hour THEN 'high_error_rate'
      WHEN rp.avg_processing_time_last_hour > 120000 THEN 'slow_performance'  -- > 2 minutes
      ELSE 'healthy'
    END as overall_status,

    co.active_operations,
    co.processing_operations,
    co.queued_operations,
    ROUND(co.avg_progress_percent::NUMERIC, 1) as avg_progress_percent,

    rp.operations_last_hour,
    ROUND(rp.avg_processing_time_last_hour::NUMERIC, 2) as avg_processing_time_ms,
    ROUND(rp.avg_throughput_last_hour::NUMERIC, 2) as avg_throughput_rps,
    rp.successful_operations_last_hour,
    rp.failed_operations_last_hour,

    CASE 
      WHEN rp.operations_last_hour > 0 THEN
        ROUND((rp.successful_operations_last_hour::FLOAT / rp.operations_last_hour * 100)::NUMERIC, 2)
      ELSE 0
    END as success_rate_last_hour

  FROM current_operations co
  CROSS JOIN recent_performance rp
)

SELECT 
  CURRENT_TIMESTAMP as dashboard_time,
  sh.overall_status,
  sh.active_operations,
  sh.processing_operations,
  sh.queued_operations,
  sh.avg_progress_percent,
  sh.operations_last_hour,
  sh.avg_processing_time_ms,
  sh.avg_throughput_rps,
  sh.successful_operations_last_hour,
  sh.failed_operations_last_hour,
  sh.success_rate_last_hour,

  -- Alert conditions
  ARRAY[
    CASE WHEN sh.processing_operations > 15 THEN 'High number of concurrent operations' END,
    CASE WHEN sh.queued_operations > 25 THEN 'Large operation queue detected' END,  
    CASE WHEN sh.success_rate_last_hour < 90 THEN 'Low success rate detected' END,
    CASE WHEN sh.avg_processing_time_ms > 180000 THEN 'Slow processing times detected' END
  ]::TEXT[] as current_alerts,

  -- Capacity indicators
  CASE 
    WHEN sh.active_operations > 20 THEN 'at_capacity'
    WHEN sh.active_operations > 10 THEN 'high_utilization'
    ELSE 'normal_capacity'
  END as capacity_status

FROM system_health sh;

-- QueryLeaf provides comprehensive MongoDB bulk operations capabilities:
-- 1. SQL-familiar syntax for complex bulk operations with advanced batching
-- 2. Intelligent performance optimization with adaptive batch sizing
-- 3. Comprehensive error handling and recovery mechanisms
-- 4. Real-time progress tracking and monitoring capabilities
-- 5. Advanced data validation and business rule enforcement
-- 6. Enterprise-grade audit trails and compliance logging
-- 7. Memory and resource management for large-scale operations
-- 8. Integration with MongoDB's native bulk operation optimizations
-- 9. Sophisticated cascade handling and dependency management
-- 10. Production-ready monitoring and alerting with health indicators

Best Practices for Production Bulk Operations

Bulk Operations Strategy Design

Essential principles for effective MongoDB bulk operations deployment:

  1. Batch Size Optimization: Configure adaptive batch sizing based on data characteristics, system resources, and performance requirements
  2. Error Handling Strategy: Implement comprehensive error recovery with retry logic, partial success handling, and dead letter queue management (a minimal retry sketch follows this list)
  3. Resource Management: Monitor memory usage, connection pooling, and system resources during large-scale bulk operations
  4. Performance Monitoring: Track throughput, latency, and success rates with real-time alerting for performance degradation
  5. Data Validation: Implement robust validation pipelines that catch errors early and minimize processing overhead
  6. Transaction Management: Design bulk operations with appropriate consistency guarantees and transaction boundaries
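
To make the error-handling principle concrete, here is a minimal sketch of partial-failure handling with an unordered bulk write: when some operations fail, the driver raises a bulk write error whose writeErrors entries identify exactly which operations failed, so only those need to be resubmitted. The collection and helper names are illustrative, and transient-error classification is intentionally omitted.

// Partial-failure retry sketch for an unordered bulk write (names illustrative)
async function bulkInsertWithRetry(collection, docs) {
  const ops = docs.map(doc => ({ insertOne: { document: doc } }));

  try {
    return await collection.bulkWrite(ops, { ordered: false });
  } catch (error) {
    // With ordered: false the server attempts every operation; writeErrors
    // lists the ones that failed, so only those are retried (once, here)
    const writeErrors = [].concat(error.writeErrors || []);
    if (writeErrors.length === 0) throw error;

    const failedOps = writeErrors.map(we => ops[we.index]);
    console.warn(`Retrying ${failedOps.length} of ${ops.length} operations`);
    return await collection.bulkWrite(failedOps, { ordered: false });
  }
}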

Enterprise Bulk Processing Optimization

Optimize bulk operations for production enterprise environments:

  1. Distributed Processing: Implement shard-aware bulk operations that optimize workload distribution across MongoDB clusters
  2. Compliance Integration: Ensure bulk operations meet audit requirements with comprehensive logging and data lineage tracking
  3. Capacity Planning: Design bulk processing systems that can scale with data volume growth and peak processing requirements
  4. Security Controls: Implement access controls, encryption, and security monitoring for bulk data operations
  5. Operational Integration: Integrate bulk operations with monitoring, alerting, and incident response workflows
  6. Cost Optimization: Monitor and optimize resource usage for efficient bulk processing operations

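As a sketch of the distributed processing point above, the following groups documents by shard key value and issues unordered bulk writes, which mongos can dispatch to shards in parallel. The orders collection, the region shard key, and the shardAwareInsert helper are assumptions made for illustration only.

// A sketch of shard-aware batching for a sharded collection, assuming the
// collection is sharded on `region` and that `db` is an already-connected Db
// instance. Unordered writes let mongos send work to shards in parallel;
// grouping by shard key keeps each call narrowly targeted.
async function shardAwareInsert(db, docs) {
  const collection = db.collection('orders'); // assumed sharded on { region: 1 }

  // Group documents by their shard key value
  const groups = new Map();
  for (const doc of docs) {
    const key = doc.region;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(doc);
  }

  // One unordered bulk write per shard key group, issued concurrently
  const results = await Promise.all(
    [...groups.values()].map(group =>
      collection.bulkWrite(
        group.map(doc => ({ insertOne: { document: doc } })),
        { ordered: false, writeConcern: { w: 'majority' } }
      )
    )
  );

  return results.reduce((sum, r) => sum + r.insertedCount, 0);
}
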
Conclusion

MongoDB bulk operations provide sophisticated capabilities for high-performance batch processing, data migrations, and large-scale data operations that eliminate the complexity and performance limitations of traditional individual record processing approaches. Native bulk write operations offer scalable, efficient, and reliable data processing with comprehensive error handling and performance optimization.

Key MongoDB bulk operations benefits include:

  • High-Performance Processing: Native bulk operations with minimal overhead and maximum throughput for millions of documents
  • Advanced Error Handling: Sophisticated error recovery with partial success support and comprehensive retry mechanisms
  • Intelligent Optimization: Adaptive batch sizing and performance optimization based on data characteristics and system resources
  • Comprehensive Monitoring: Real-time operation tracking with detailed metrics and health indicators
  • Enterprise Scalability: Production-ready bulk processing that scales efficiently with data volume and system complexity
  • SQL Accessibility: Familiar SQL-style bulk operations through QueryLeaf for accessible high-performance data processing

Whether you're performing data migrations, batch updates, large-scale imports, or complex data transformations, MongoDB bulk operations with QueryLeaf's familiar SQL interface provide the foundation for reliable, efficient, and scalable high-performance data processing.

QueryLeaf Integration: QueryLeaf automatically translates SQL-style bulk operations into MongoDB's native bulk write operations, making high-performance batch processing accessible to SQL-oriented development teams. Complex validation pipelines, error handling strategies, and performance optimizations are seamlessly handled through familiar SQL constructs, enabling sophisticated bulk data operations without requiring deep MongoDB bulk processing expertise.

The combination of MongoDB's robust bulk operation capabilities with SQL-style batch processing syntax makes it an ideal platform for applications requiring both high-performance data operations and familiar database management patterns, ensuring your bulk processing workflows can handle enterprise-scale data volumes while maintaining reliability and performance as your systems grow and evolve.

MongoDB Transactions and ACID Properties: Distributed Systems Consistency and Multi-Document Operations

Modern applications require transactional consistency across multiple operations to maintain data integrity, ensure business rule enforcement, and provide reliable state management in distributed environments. Traditional databases provide ACID transaction support, but scaling these capabilities across distributed systems introduces complexity in maintaining consistency while preserving performance and availability across multiple nodes and data centers.

MongoDB transactions provide comprehensive ACID properties with multi-document operations, distributed consistency guarantees, and session-based transaction management designed for modern distributed applications. Unlike traditional databases that struggle with distributed transactions, MongoDB's transaction implementation leverages replica sets and sharded clusters to provide enterprise-grade consistency while maintaining the flexibility and scalability of document-based data models.

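Before the detailed comparison below, here is a minimal sketch of the core pattern: a session-scoped callback in which all writes either commit together or roll back together. The bank database, accounts collection, and transferFunds helper are hypothetical; withTransaction() itself is part of the official Node.js driver and retries the callback on transient transaction errors.

// Minimal sketch of a multi-document ACID transaction with the official
// Node.js driver, assuming a replica set connection and a hypothetical
// `accounts` collection.
const { MongoClient } = require('mongodb');

async function transferFunds(uri, fromId, toId, amount) {
  const client = new MongoClient(uri);
  await client.connect();
  const accounts = client.db('bank').collection('accounts');
  const session = client.startSession();

  try {
    await session.withTransaction(async () => {
      // Both updates commit together or not at all
      const debit = await accounts.updateOne(
        { _id: fromId, balance: { $gte: amount } },
        { $inc: { balance: -amount } },
        { session }
      );
      if (debit.modifiedCount !== 1) {
        throw new Error('Insufficient funds'); // aborts the transaction
      }
      await accounts.updateOne({ _id: toId }, { $inc: { balance: amount } }, { session });
    }, {
      readConcern: { level: 'majority' },
      writeConcern: { w: 'majority' }
    });
  } finally {
    await session.endSession();
    await client.close();
  }
}

The transaction manager developed later in this article builds the same pattern out with session pooling, retry policies, and metrics.
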
The Traditional Transaction Management Challenge

Implementing consistent multi-table operations in traditional databases requires complex transaction coordination:

-- Traditional PostgreSQL transactions - complex multi-table coordination with limitations

-- Begin transaction for order processing workflow
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Order processing with inventory management
CREATE TABLE orders (
    order_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    customer_id UUID NOT NULL,
    order_number VARCHAR(50) UNIQUE NOT NULL,
    order_status VARCHAR(20) NOT NULL DEFAULT 'pending',
    order_total DECIMAL(15,2) NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,

    -- Payment information
    payment_method VARCHAR(50),
    payment_status VARCHAR(20) DEFAULT 'pending',
    payment_reference VARCHAR(100),
    payment_amount DECIMAL(15,2),

    -- Shipping information
    shipping_address JSONB,
    shipping_method VARCHAR(50),
    shipping_cost DECIMAL(10,2),
    estimated_delivery DATE,

    -- Business metadata
    sales_channel VARCHAR(50),
    promotions_applied JSONB,
    order_notes TEXT,

    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

-- Order items with inventory tracking
CREATE TABLE order_items (
    item_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    order_id UUID NOT NULL,
    product_id UUID NOT NULL,
    quantity INTEGER NOT NULL CHECK (quantity > 0),
    unit_price DECIMAL(10,2) NOT NULL,
    line_total DECIMAL(15,2) NOT NULL,

    -- Product details snapshot
    product_sku VARCHAR(100),
    product_name VARCHAR(500),
    product_variant JSONB,

    -- Inventory management
    reserved_inventory BOOLEAN DEFAULT FALSE,
    reservation_id UUID,
    inventory_location VARCHAR(100),

    -- Pricing and promotions
    original_price DECIMAL(10,2),
    discount_amount DECIMAL(10,2) DEFAULT 0,
    tax_amount DECIMAL(10,2) DEFAULT 0,

    FOREIGN KEY (order_id) REFERENCES orders(order_id) ON DELETE CASCADE,
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);

-- Inventory management table
CREATE TABLE inventory (
    inventory_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    product_id UUID NOT NULL,
    location_id VARCHAR(100) NOT NULL,
    available_quantity INTEGER NOT NULL CHECK (available_quantity >= 0),
    reserved_quantity INTEGER NOT NULL DEFAULT 0,
    total_quantity INTEGER GENERATED ALWAYS AS (available_quantity + reserved_quantity) STORED,

    -- Stock management
    reorder_point INTEGER DEFAULT 10,
    reorder_quantity INTEGER DEFAULT 50,
    last_restocked TIMESTAMP,

    -- Cost and valuation
    unit_cost DECIMAL(10,2),
    total_cost DECIMAL(15,2) GENERATED ALWAYS AS (total_quantity * unit_cost) STORED,

    -- Tracking
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    FOREIGN KEY (product_id) REFERENCES products(product_id),
    UNIQUE(product_id, location_id)
);

-- Payment transactions
CREATE TABLE payments (
    payment_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    order_id UUID NOT NULL,
    payment_method VARCHAR(50) NOT NULL,
    payment_amount DECIMAL(15,2) NOT NULL,
    payment_status VARCHAR(20) NOT NULL DEFAULT 'pending',

    -- Payment processing details
    payment_processor VARCHAR(50),
    processor_transaction_id VARCHAR(200),
    processor_response JSONB,

    -- Authorization and capture
    authorization_code VARCHAR(50),
    authorization_amount DECIMAL(15,2),
    capture_amount DECIMAL(15,2),
    refund_amount DECIMAL(15,2) DEFAULT 0,

    -- Timing
    authorized_at TIMESTAMP,
    captured_at TIMESTAMP,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    FOREIGN KEY (order_id) REFERENCES orders(order_id)
);

-- Complex transaction procedure for order processing
CREATE OR REPLACE FUNCTION process_customer_order(
    p_customer_id UUID,
    p_order_items JSONB,
    p_payment_info JSONB,
    p_shipping_info JSONB
) RETURNS TABLE(
    order_id UUID,
    order_number VARCHAR,
    total_amount DECIMAL,
    payment_status VARCHAR,
    inventory_status VARCHAR,
    success BOOLEAN,
    error_message TEXT
) AS $$
DECLARE
    v_order_id UUID;
    v_order_number VARCHAR(50);
    v_order_total DECIMAL(15,2) := 0;
    v_item JSONB;
    v_product_id UUID;
    v_quantity INTEGER;
    v_unit_price DECIMAL(10,2);
    v_available_inventory INTEGER;
    v_payment_id UUID;
    v_authorization_result JSONB;
    v_error_occurred BOOLEAN := FALSE;
    v_error_message TEXT := '';

BEGIN
    -- Generate order number and ID
    v_order_id := gen_random_uuid();
    v_order_number := 'ORD-' || to_char(CURRENT_TIMESTAMP, 'YYYYMMDD') || '-' || 
                      to_char(extract(epoch from CURRENT_TIMESTAMP)::integer % 10000, 'FM0000');

    -- Validate customer exists
    IF NOT EXISTS (SELECT 1 FROM customers WHERE customer_id = p_customer_id) THEN
        RETURN QUERY SELECT v_order_id, v_order_number, 0::DECIMAL(15,2), 'failed'::VARCHAR, 
                           'validation_failed'::VARCHAR, FALSE, 'Customer not found'::TEXT;
        RETURN;
    END IF;

    -- Start order processing block (the BEGIN ... EXCEPTION block below provides
    -- savepoint semantics; explicit SAVEPOINT is not allowed inside PL/pgSQL)

    BEGIN
        -- Validate and reserve inventory for each item
        FOR v_item IN SELECT * FROM jsonb_array_elements(p_order_items)
        LOOP
            v_product_id := (v_item->>'product_id')::UUID;
            v_quantity := (v_item->>'quantity')::INTEGER;
            v_unit_price := (v_item->>'unit_price')::DECIMAL(10,2);

            -- Check product exists and is active
            IF NOT EXISTS (
                SELECT 1 FROM products 
                WHERE product_id = v_product_id 
                AND status = 'active'
            ) THEN
                v_error_occurred := TRUE;
                v_error_message := 'Product ' || v_product_id || ' not found or inactive';
                EXIT;
            END IF;

            -- Check inventory availability with row-level locking
            SELECT available_quantity INTO v_available_inventory
            FROM inventory 
            WHERE product_id = v_product_id 
            AND location_id = 'main_warehouse'
            FOR UPDATE; -- Lock inventory record

            IF v_available_inventory < v_quantity THEN
                v_error_occurred := TRUE;
                v_error_message := 'Insufficient inventory for product ' || v_product_id || 
                                  ': requested ' || v_quantity || ', available ' || v_available_inventory;
                EXIT;
            END IF;

            -- Reserve inventory
            UPDATE inventory 
            SET available_quantity = available_quantity - v_quantity,
                reserved_quantity = reserved_quantity + v_quantity,
                updated_at = CURRENT_TIMESTAMP
            WHERE product_id = v_product_id 
            AND location_id = 'main_warehouse';

            v_order_total := v_order_total + (v_quantity * v_unit_price);
        END LOOP;

        -- If inventory validation failed, raise so the block's exception handler
        -- rolls back the reservations made so far
        IF v_error_occurred THEN
            RAISE EXCEPTION 'inventory_insufficient: %', v_error_message;
        END IF;

        -- Create order record
        INSERT INTO orders (
            order_id, customer_id, order_number, order_status, order_total,
            payment_method, shipping_address, shipping_method, shipping_cost,
            sales_channel, order_notes
        ) VALUES (
            v_order_id, p_customer_id, v_order_number, 'pending', v_order_total,
            p_payment_info->>'method',
            p_shipping_info->'address',
            p_shipping_info->>'method',
            (p_shipping_info->>'cost')::DECIMAL(10,2),
            'web',
            'Order processed via transaction system'
        );

        -- Create order items
        FOR v_item IN SELECT * FROM jsonb_array_elements(p_order_items)
        LOOP
            INSERT INTO order_items (
                order_id, product_id, quantity, unit_price, line_total,
                product_sku, product_name, reserved_inventory, inventory_location
            ) 
            SELECT 
                v_order_id,
                (v_item->>'product_id')::UUID,
                (v_item->>'quantity')::INTEGER,
                (v_item->>'unit_price')::DECIMAL(10,2),
                (v_item->>'quantity')::INTEGER * (v_item->>'unit_price')::DECIMAL(10,2),
                p.sku,
                p.name,
                TRUE,
                'main_warehouse'
            FROM products p 
            WHERE p.product_id = (v_item->>'product_id')::UUID;
        END LOOP;

        -- Process payment authorization
        INSERT INTO payments (
            payment_id, order_id, payment_method, payment_amount, payment_status,
            payment_processor, authorization_amount
        ) VALUES (
            gen_random_uuid(), v_order_id, 
            p_payment_info->>'method',
            v_order_total,
            'authorizing',
            p_payment_info->>'processor',
            v_order_total
        ) RETURNING payment_id INTO v_payment_id;

        -- Simulate payment processing (in real system would call external API)
        -- This creates a critical point where external system coordination is required
        IF (p_payment_info->>'test_mode')::BOOLEAN = TRUE THEN
            -- Simulate successful authorization for testing
            UPDATE payments 
            SET payment_status = 'authorized',
                authorization_code = 'TEST_AUTH_' || extract(epoch from CURRENT_TIMESTAMP)::text,
                authorized_at = CURRENT_TIMESTAMP,
                processor_response = jsonb_build_object(
                    'status', 'approved',
                    'auth_code', 'TEST_AUTH_CODE',
                    'processor_ref', 'TEST_REF_' || v_payment_id,
                    'processed_at', CURRENT_TIMESTAMP
                )
            WHERE payment_id = v_payment_id;

            -- Update order status
            UPDATE orders 
            SET payment_status = 'authorized',
                order_status = 'confirmed',
                updated_at = CURRENT_TIMESTAMP
            WHERE order_id = v_order_id;

        ELSE
            -- In production, this would require external payment processor integration
            -- which introduces distributed transaction complexity and potential failures
            v_error_occurred := TRUE;
            v_error_message := 'Payment processing not available in non-test mode';
        END IF;

        -- Final validation before returning success
        IF v_error_occurred THEN
            RAISE EXCEPTION 'payment_failed: %', v_error_message;
        END IF;

        -- Success case (the enclosing transaction commits when the caller commits;
        -- COMMIT is not allowed inside a function)

        RETURN QUERY SELECT v_order_id, v_order_number, v_order_total, 'authorized'::VARCHAR,
                           'reserved'::VARCHAR, TRUE, 'Order processed successfully'::TEXT;

    EXCEPTION
        -- Entering an exception handler automatically rolls back all work done
        -- since the start of this block (no explicit savepoint needed)
        WHEN serialization_failure THEN
            RETURN QUERY SELECT v_order_id, v_order_number, v_order_total, 'failed'::VARCHAR,
                               'serialization_error'::VARCHAR, FALSE, 'Transaction serialization failed'::TEXT;

        WHEN deadlock_detected THEN
            RETURN QUERY SELECT v_order_id, v_order_number, v_order_total, 'failed'::VARCHAR,
                               'deadlock_error'::VARCHAR, FALSE, 'Deadlock detected during processing'::TEXT;

        WHEN OTHERS THEN
            RETURN QUERY SELECT v_order_id, v_order_number, v_order_total, 'failed'::VARCHAR,
                               'system_error'::VARCHAR, FALSE, SQLERRM::TEXT;
    END;
END;
$$ LANGUAGE plpgsql;

-- Test the complex transaction workflow
SELECT * FROM process_customer_order(
    'customer-uuid-123'::UUID,
    '[
        {"product_id": "product-uuid-1", "quantity": 2, "unit_price": 29.99},
        {"product_id": "product-uuid-2", "quantity": 1, "unit_price": 149.99}
    ]'::JSONB,
    '{"method": "credit_card", "processor": "stripe", "test_mode": true}'::JSONB,
    '{"method": "standard", "cost": 9.99, "address": {"street": "123 Main St", "city": "Boston", "state": "MA"}}'::JSONB
);

-- Monitor transaction performance and conflicts
WITH transaction_analysis AS (
    SELECT 
        t.schemaname,
        t.relname as tablename,
        t.n_tup_ins as inserts,
        t.n_tup_upd as updates,
        t.n_tup_del as deletes,
        d.deadlocks as deadlock_count,  -- deadlocks are tracked per database in pg_stat_database

        -- Lock analysis
        CASE 
            WHEN d.deadlocks > 0 THEN 'deadlock_issues'
            WHEN t.n_tup_upd > t.n_tup_ins * 2 THEN 'high_update_contention'
            ELSE 'normal_operation'
        END as transaction_health,

        -- Performance indicators
        ROUND(
            (t.n_tup_upd + t.n_tup_del)::DECIMAL / NULLIF((t.n_tup_ins + t.n_tup_upd + t.n_tup_del), 0) * 100, 
            2
        ) as modification_ratio

    FROM pg_stat_user_tables t
    CROSS JOIN pg_stat_database d
    WHERE t.schemaname = 'public'
    AND t.relname IN ('orders', 'order_items', 'inventory', 'payments')
    AND d.datname = current_database()
),

lock_conflicts AS (
    SELECT 
        relation::regclass as table_name,
        mode as lock_mode,
        granted,
        COUNT(*) as lock_count
    FROM pg_locks 
    WHERE relation IS NOT NULL
    GROUP BY relation, mode, granted
)

SELECT 
    ta.tablename,
    ta.transaction_health,
    ta.modification_ratio || '%' as modification_percentage,
    ta.deadlock_count,

    -- Lock conflict analysis
    COALESCE(lc.lock_count, 0) as active_locks,
    COALESCE(lc.lock_mode, 'none') as primary_lock_mode,

    -- Transaction recommendations
    CASE 
        WHEN ta.deadlock_count > 5 THEN 'Redesign transaction order and locking strategy'
        WHEN ta.modification_ratio > 80 THEN 'Consider read replicas for query workload'
        WHEN ta.transaction_health = 'high_update_contention' THEN 'Optimize update batching and reduce lock duration'
        ELSE 'Transaction patterns within acceptable parameters'
    END as optimization_recommendation

FROM transaction_analysis ta
LEFT JOIN lock_conflicts lc ON ta.tablename = lc.table_name::text
ORDER BY ta.deadlock_count DESC, ta.modification_ratio DESC;

-- Problems with traditional transaction management:
-- 1. Complex multi-table coordination requiring careful lock management and deadlock prevention
-- 2. Limited scalability due to lock contention and serialization constraints
-- 3. Difficulty implementing distributed transactions across services and external systems
-- 4. Performance overhead from lock management and transaction coordination mechanisms
-- 5. Complex error handling for various transaction failure scenarios and rollback procedures
-- 6. Limited flexibility in transaction isolation levels affecting performance vs consistency
-- 7. Challenges with long-running transactions and their impact on system performance
-- 8. Complexity in implementing saga patterns for distributed transaction coordination
-- 9. Manual management of transaction boundaries and session coordination
-- 10. Difficulty in monitoring and optimizing transaction performance across complex workflows

MongoDB provides native ACID transactions with multi-document operations and distributed consistency:

// MongoDB Transactions - native ACID compliance with distributed consistency management
const { MongoClient, ObjectId } = require('mongodb');

// Advanced MongoDB Transaction Manager with ACID guarantees and distributed consistency
class MongoTransactionManager {
  constructor(config = {}) {
    this.config = {
      // Connection configuration
      uri: config.uri || 'mongodb://localhost:27017',
      database: config.database || 'ecommerce_platform',

      // Transaction configuration
      defaultTransactionOptions: {
        readConcern: { level: 'majority' },
        writeConcern: { w: 'majority', j: true },
        maxCommitTimeMS: 30000,
        maxTransactionLockRequestTimeoutMillis: 5000
      },

      // Session management
      sessionPoolSize: config.sessionPoolSize || 10,
      enableSessionPooling: config.enableSessionPooling !== false,

      // Retry and error handling
      maxRetryAttempts: config.maxRetryAttempts || 3,
      retryDelayMs: config.retryDelayMs || 1000,
      enableAutoRetry: config.enableAutoRetry !== false,

      // Monitoring and analytics
      enableTransactionMonitoring: config.enableTransactionMonitoring !== false,
      enablePerformanceTracking: config.enablePerformanceTracking !== false,

      // Advanced features
      enableDistributedTransactions: config.enableDistributedTransactions !== false,
      enableCausalConsistency: config.enableCausalConsistency !== false
    };

    this.client = null;
    this.database = null;
    this.sessionPool = [];
    this.transactionMetrics = {
      totalTransactions: 0,
      successfulTransactions: 0,
      failedTransactions: 0,
      retriedTransactions: 0,
      averageTransactionTime: 0,
      transactionTypes: new Map()
    };
  }

  async initialize() {
    console.log('Initializing MongoDB Transaction Manager with ACID guarantees...');

    try {
      // Connect with transaction-optimized settings
      this.client = new MongoClient(this.config.uri, {
        // Replica set configuration for transactions
        readPreference: 'primary',
        readConcern: { level: 'majority' },
        writeConcern: { w: 'majority', j: true },

        // Connection optimization for transactions
        maxPoolSize: 20,
        minPoolSize: 5,
        retryWrites: true,
        retryReads: true,

        // Session configuration
        maxIdleTimeMS: 60000,
        serverSelectionTimeoutMS: 30000,

        // Application identification
        appName: 'TransactionManager'
      });

      await this.client.connect();
      this.database = this.client.db(this.config.database);

      // Initialize session pool for transaction management
      await this.initializeSessionPool();

      // Setup transaction monitoring
      if (this.config.enableTransactionMonitoring) {
        await this.setupTransactionMonitoring();
      }

      console.log('MongoDB Transaction Manager initialized successfully');

      return this.database;

    } catch (error) {
      console.error('Failed to initialize transaction manager:', error);
      throw error;
    }
  }

  async initializeSessionPool() {
    console.log('Initializing session pool for transaction management...');

    for (let i = 0; i < this.config.sessionPoolSize; i++) {
      const session = this.client.startSession({
        causalConsistency: this.config.enableCausalConsistency,
        defaultTransactionOptions: this.config.defaultTransactionOptions
      });

      this.sessionPool.push({
        session,
        inUse: false,
        createdAt: new Date(),
        transactionCount: 0
      });
    }

    console.log(`Session pool initialized with ${this.sessionPool.length} sessions`);
  }

  async acquireSession() {
    // Find available session from pool
    let sessionWrapper = this.sessionPool.find(s => !s.inUse);

    if (!sessionWrapper) {
      // Create temporary session if pool exhausted
      console.warn('Session pool exhausted, creating temporary session');
      sessionWrapper = {
        session: this.client.startSession({
          causalConsistency: this.config.enableCausalConsistency,
          defaultTransactionOptions: this.config.defaultTransactionOptions
        }),
        inUse: true,
        createdAt: new Date(),
        transactionCount: 0,
        temporary: true
      };
    } else {
      sessionWrapper.inUse = true;
    }

    return sessionWrapper;
  }

  async releaseSession(sessionWrapper) {
    sessionWrapper.inUse = false;
    sessionWrapper.transactionCount++;

    // Clean up temporary sessions
    if (sessionWrapper.temporary) {
      await sessionWrapper.session.endSession();
    }
  }

  async executeTransaction(transactionFunction, options = {}) {
    console.log('Executing MongoDB transaction with ACID guarantees...');
    const startTime = Date.now();
    const transactionId = new ObjectId().toString();

    let sessionWrapper = null;
    let attempt = 0;
    const maxRetries = options.maxRetries || this.config.maxRetryAttempts;

    while (attempt < maxRetries) {
      try {
        // Acquire session for transaction
        sessionWrapper = await this.acquireSession();
        const session = sessionWrapper.session;

        // Configure transaction options
        const transactionOptions = {
          ...this.config.defaultTransactionOptions,
          ...options.transactionOptions
        };

        console.log(`Starting transaction ${transactionId} (attempt ${attempt + 1})`);

        // Start transaction with ACID properties
        session.startTransaction(transactionOptions);

        // Execute transaction function with session
        const result = await transactionFunction(session, this.database);

        // Commit transaction
        await session.commitTransaction();

        const executionTime = Date.now() - startTime;
        console.log(`Transaction ${transactionId} committed successfully in ${executionTime}ms`);

        // Update metrics
        await this.updateTransactionMetrics('success', executionTime, options.transactionType);

        return {
          transactionId,
          success: true,
          result,
          executionTime,
          attempt: attempt + 1
        };

      } catch (error) {
        console.error(`Transaction ${transactionId} failed on attempt ${attempt + 1}:`, error.message);

        if (sessionWrapper) {
          try {
            await sessionWrapper.session.abortTransaction();
          } catch (abortError) {
            console.error('Error aborting transaction:', abortError.message);
          }
        }

        // Check if error is retryable
        if (this.isRetryableError(error) && attempt < maxRetries - 1) {
          attempt++;
          console.log(`Retrying transaction ${transactionId} (attempt ${attempt + 1})`);

          // Wait with exponential backoff
          const delay = this.config.retryDelayMs * Math.pow(2, attempt - 1);
          await this.sleep(delay);

          continue;
        }

        // Transaction failed permanently
        const executionTime = Date.now() - startTime;
        await this.updateTransactionMetrics('failure', executionTime, options.transactionType);

        throw new Error(`Transaction ${transactionId} failed after ${attempt + 1} attempts: ${error.message}`);

      } finally {
        if (sessionWrapper) {
          await this.releaseSession(sessionWrapper);
        }
      }
    }
  }
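
  // Note: the manual start/commit/abort/retry loop above is written out for
  // clarity and metrics collection. The official Node.js driver also offers
  // session.withTransaction(callback, options), which handles commit, abort,
  // and retries of TransientTransactionError / UnknownTransactionCommitResult
  // internally and could replace much of executeTransaction() in simpler setups.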

  async processCustomerOrder(orderData) {
    console.log('Processing customer order with ACID transaction...');

    return await this.executeTransaction(async (session, db) => {
      const orderId = new ObjectId();
      const orderNumber = `ORD-${Date.now()}-${Math.floor(Math.random() * 1000)}`;
      const timestamp = new Date();

      // Collections for multi-document transaction
      const ordersCollection = db.collection('orders');
      const inventoryCollection = db.collection('inventory');
      const paymentsCollection = db.collection('payments');
      const customersCollection = db.collection('customers');

      // Step 1: Validate customer exists (with session for consistency)
      const customer = await customersCollection.findOne(
        { _id: new ObjectId(orderData.customerId) },
        { session }
      );

      if (!customer) {
        throw new Error('Customer not found');
      }

      // Step 2: Validate and reserve inventory atomically
      let totalAmount = 0;
      const inventoryUpdates = [];
      const orderItems = [];

      for (const item of orderData.items) {
        const productId = new ObjectId(item.productId);

        // Check inventory with document-level locking within transaction
        const inventory = await inventoryCollection.findOne(
          { productId: productId, locationId: 'main_warehouse' },
          { session }
        );

        if (!inventory) {
          throw new Error(`Inventory not found for product ${item.productId}`);
        }

        if (inventory.availableQuantity < item.quantity) {
          throw new Error(
            `Insufficient inventory for product ${item.productId}: ` +
            `requested ${item.quantity}, available ${inventory.availableQuantity}`
          );
        }

        // Prepare inventory update
        inventoryUpdates.push({
          updateOne: {
            filter: { 
              productId: productId, 
              locationId: 'main_warehouse',
              availableQuantity: { $gte: item.quantity } // Optimistic concurrency control
            },
            update: {
              $inc: {
                availableQuantity: -item.quantity,
                reservedQuantity: item.quantity
              },
              $set: { updatedAt: timestamp }
            }
          }
        });

        // Prepare order item
        const lineTotal = item.quantity * item.unitPrice;
        totalAmount += lineTotal;

        orderItems.push({
          _id: new ObjectId(),
          productId: productId,
          productSku: inventory.productSku,
          productName: inventory.productName,
          quantity: item.quantity,
          unitPrice: item.unitPrice,
          lineTotal: lineTotal,
          inventoryReserved: true,
          reservationTimestamp: timestamp
        });
      }

      // Step 3: Execute inventory updates atomically
      const inventoryResult = await inventoryCollection.bulkWrite(
        inventoryUpdates,
        { session, ordered: true }
      );

      if (inventoryResult.matchedCount !== orderData.items.length) {
        throw new Error('Inventory reservation failed due to concurrent updates');
      }

      // Step 4: Process payment authorization (atomic within transaction)
      const paymentId = new ObjectId();
      const payment = {
        _id: paymentId,
        orderId: orderId,
        paymentMethod: orderData.payment.method,
        paymentProcessor: orderData.payment.processor || 'stripe',
        amount: totalAmount,
        status: 'processing',
        authorizationAttempts: 0,
        createdAt: timestamp,

        // Payment details
        processorTransactionId: null,
        authorizationCode: null,
        processorResponse: null
      };

      // Simulate payment processing (in production would integrate with payment processor)
      if (orderData.payment.testMode) {
        payment.status = 'authorized';
        payment.authorizationCode = `TEST_AUTH_${Date.now()}`;
        payment.processorTransactionId = `test_txn_${paymentId}`;
        payment.authorizedAt = timestamp;
        payment.processorResponse = {
          status: 'approved',
          authCode: payment.authorizationCode,
          processorRef: payment.processorTransactionId,
          processedAt: timestamp
        };
      } else {
        // In production, would make external API call within transaction timeout
        payment.status = 'authorization_pending';
      }

      await paymentsCollection.insertOne(payment, { session });

      // Step 5: Create order document with all related data
      const order = {
        _id: orderId,
        orderNumber: orderNumber,
        customerId: new ObjectId(orderData.customerId),
        status: payment.status === 'authorized' ? 'confirmed' : 'payment_pending',

        // Order details
        items: orderItems,
        itemCount: orderItems.length,
        totalAmount: totalAmount,

        // Payment information
        paymentId: paymentId,
        paymentMethod: orderData.payment.method,
        paymentStatus: payment.status,

        // Shipping information
        shippingAddress: orderData.shipping.address,
        shippingMethod: orderData.shipping.method,
        shippingCost: orderData.shipping.cost || 0,

        // Timestamps and metadata
        createdAt: timestamp,
        updatedAt: timestamp,
        salesChannel: 'web',
        orderSource: 'transaction_api',

        // Transaction tracking
        transactionId: session.id ? session.id.toString() : null,
        inventoryReserved: true,
        inventoryReservationExpiry: new Date(timestamp.getTime() + 15 * 60 * 1000) // 15 minutes
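        // Assumes a separate background job (or TTL-driven cleanup) releases
        // reservations whose expiry has passed; that job is not implemented here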
      };

      await ordersCollection.insertOne(order, { session });

      // Step 6: Update customer order history (within same transaction)
      await customersCollection.updateOne(
        { _id: new ObjectId(orderData.customerId) },
        {
          $inc: { 
            totalOrders: 1,
            totalSpent: totalAmount
          },
          $push: {
            recentOrders: {
              $each: [{ orderId: orderId, orderNumber: orderNumber, amount: totalAmount, date: timestamp }],
              $slice: -10 // Keep only last 10 orders
            }
          },
          $set: { lastOrderDate: timestamp, updatedAt: timestamp }
        },
        { session }
      );

      console.log(`Order ${orderNumber} processed successfully with ${orderItems.length} items`);

      return {
        orderId: orderId,
        orderNumber: orderNumber,
        status: order.status,
        totalAmount: totalAmount,
        paymentStatus: payment.status,
        inventoryReserved: true,
        items: orderItems.length,
        processingTime: Date.now() - timestamp.getTime()
      };

    }, {
      transactionType: 'customer_order',
      maxRetries: 3,
      transactionOptions: {
        readConcern: { level: 'majority' },
        writeConcern: { w: 'majority', j: true }
      }
    });
  }

  async processInventoryTransfer(transferData) {
    console.log('Processing inventory transfer with ACID transaction...');

    return await this.executeTransaction(async (session, db) => {
      const transferId = new ObjectId();
      const timestamp = new Date();

      const inventoryCollection = db.collection('inventory');
      const transfersCollection = db.collection('inventory_transfers');

      // Validate source location inventory
      const sourceInventory = await inventoryCollection.findOne(
        { 
          productId: new ObjectId(transferData.productId),
          locationId: transferData.sourceLocation
        },
        { session }
      );

      if (!sourceInventory || sourceInventory.availableQuantity < transferData.quantity) {
        throw new Error(
          `Insufficient inventory at source location ${transferData.sourceLocation}: ` +
          `requested ${transferData.quantity}, available ${sourceInventory?.availableQuantity || 0}`
        );
      }

      // Validate destination location exists
      const destinationInventory = await inventoryCollection.findOne(
        { 
          productId: new ObjectId(transferData.productId),
          locationId: transferData.destinationLocation
        },
        { session }
      );

      if (!destinationInventory) {
        throw new Error(`Destination location ${transferData.destinationLocation} not found`);
      }

      // Execute atomic inventory updates
      const transferOperations = [
        {
          updateOne: {
            filter: {
              productId: new ObjectId(transferData.productId),
              locationId: transferData.sourceLocation,
              availableQuantity: { $gte: transferData.quantity }
            },
            update: {
              $inc: { availableQuantity: -transferData.quantity },
              $set: { updatedAt: timestamp }
            }
          }
        },
        {
          updateOne: {
            filter: {
              productId: new ObjectId(transferData.productId),
              locationId: transferData.destinationLocation
            },
            update: {
              $inc: { availableQuantity: transferData.quantity },
              $set: { updatedAt: timestamp }
            }
          }
        }
      ];

      const transferResult = await inventoryCollection.bulkWrite(
        transferOperations,
        { session, ordered: true }
      );

      if (transferResult.matchedCount !== 2) {
        throw new Error('Inventory transfer failed due to concurrent updates');
      }

      // Record transfer transaction
      const transfer = {
        _id: transferId,
        productId: new ObjectId(transferData.productId),
        sourceLocation: transferData.sourceLocation,
        destinationLocation: transferData.destinationLocation,
        quantity: transferData.quantity,
        transferType: transferData.transferType || 'manual',
        reason: transferData.reason || 'inventory_rebalancing',
        status: 'completed',

        // Audit trail
        requestedBy: transferData.requestedBy,
        approvedBy: transferData.approvedBy,
        createdAt: timestamp,
        completedAt: timestamp,

        // Transaction metadata
        transactionId: session.id ? session.id.toString() : null
      };

      await transfersCollection.insertOne(transfer, { session });

      console.log(`Inventory transfer completed: ${transferData.quantity} units from ${transferData.sourceLocation} to ${transferData.destinationLocation}`);

      return {
        transferId: transferId,
        productId: transferData.productId,
        quantity: transferData.quantity,
        sourceLocation: transferData.sourceLocation,
        destinationLocation: transferData.destinationLocation,
        status: 'completed',
        completedAt: timestamp
      };

    }, {
      transactionType: 'inventory_transfer',
      maxRetries: 3
    });
  }

  async processRefundTransaction(refundData) {
    console.log('Processing refund transaction with ACID guarantees...');

    return await this.executeTransaction(async (session, db) => {
      const refundId = new ObjectId();
      const timestamp = new Date();

      const ordersCollection = db.collection('orders');
      const paymentsCollection = db.collection('payments');
      const inventoryCollection = db.collection('inventory');
      const refundsCollection = db.collection('refunds');

      // Validate original order and payment
      const order = await ordersCollection.findOne(
        { _id: new ObjectId(refundData.orderId) },
        { session }
      );

      if (!order) {
        throw new Error('Order not found');
      }

      if (order.status === 'refunded') {
        throw new Error('Order already refunded');
      }

      const payment = await paymentsCollection.findOne(
        { orderId: new ObjectId(refundData.orderId) },
        { session }
      );

      if (!payment || payment.status !== 'authorized') {
        throw new Error('Payment not found or not in authorized state');
      }

      // Calculate refund amount
      const refundAmount = refundData.fullRefund ? order.totalAmount : refundData.amount;

      if (refundAmount > order.totalAmount) {
        throw new Error('Refund amount cannot exceed order total');
      }

      // Process inventory restoration if items are being refunded
      const inventoryUpdates = [];
      if (refundData.restoreInventory && refundData.itemsToRefund) {
        for (const refundItem of refundData.itemsToRefund) {
          const orderItem = order.items.find(item => 
            item.productId.toString() === refundItem.productId
          );

          if (!orderItem) {
            throw new Error(`Order item not found: ${refundItem.productId}`);
          }

          inventoryUpdates.push({
            updateOne: {
              filter: {
                productId: new ObjectId(refundItem.productId),
                locationId: 'main_warehouse'
              },
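              // Assumes the refunded units are still held as reserved stock (not yet
              // shipped); shipped goods would need a separate restock workflow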
              update: {
                $inc: {
                  availableQuantity: refundItem.quantity,
                  reservedQuantity: -refundItem.quantity
                },
                $set: { updatedAt: timestamp }
              }
            }
          });
        }

        // Execute inventory restoration
        if (inventoryUpdates.length > 0) {
          await inventoryCollection.bulkWrite(inventoryUpdates, { session });
        }
      }

      // Create refund record
      const refund = {
        _id: refundId,
        orderId: new ObjectId(refundData.orderId),
        orderNumber: order.orderNumber,
        originalAmount: order.totalAmount,
        refundAmount: refundAmount,
        refundType: refundData.fullRefund ? 'full' : 'partial',
        reason: refundData.reason,

        // Processing details
        status: 'processing',
        paymentMethod: payment.paymentMethod,
        processorTransactionId: null,

        // Items being refunded
        itemsRefunded: refundData.itemsToRefund || [],
        inventoryRestored: refundData.restoreInventory || false,

        // Audit trail
        requestedBy: refundData.requestedBy,
        processedBy: refundData.processedBy,
        createdAt: timestamp,

        // Transaction metadata
        transactionId: session.id ? session.id.toString() : null
      };

      // Simulate refund processing
      if (refundData.testMode) {
        refund.status = 'completed';
        refund.processorTransactionId = `refund_${refundId}`;
        refund.processedAt = timestamp;
        refund.processorResponse = {
          status: 'refunded',
          refundRef: refund.processorTransactionId,
          processedAt: timestamp
        };
      }

      await refundsCollection.insertOne(refund, { session });

      // Update order status
      const newOrderStatus = refundData.fullRefund ? 'refunded' : 'partially_refunded';
      await ordersCollection.updateOne(
        { _id: new ObjectId(refundData.orderId) },
        {
          $set: {
            status: newOrderStatus,
            refundStatus: refund.status,
            refundAmount: refundAmount,
            refundedAt: timestamp,
            updatedAt: timestamp
          }
        },
        { session }
      );

      // Update payment record
      await paymentsCollection.updateOne(
        { orderId: new ObjectId(refundData.orderId) },
        {
          $set: {
            refundStatus: refund.status,
            refundAmount: refundAmount,
            refundedAt: timestamp,
            updatedAt: timestamp
          }
        },
        { session }
      );

      console.log(`Refund processed: ${refundAmount} for order ${order.orderNumber}`);

      return {
        refundId: refundId,
        orderId: refundData.orderId,
        refundAmount: refundAmount,
        status: refund.status,
        inventoryRestored: refund.inventoryRestored,
        processedAt: timestamp
      };

    }, {
      transactionType: 'refund_processing',
      maxRetries: 2
    });
  }

  // Utility methods for transaction management

  isRetryableError(error) {
    // MongoDB transient transaction errors that can be retried
    const retryableErrorCodes = [
      112, // WriteConflict
      117, // ConflictingOperationInProgress  
      251, // NoSuchTransaction
      244, // TransactionCoordinatorSteppingDown
      246, // TransactionCoordinatorReachedAbortDecision
    ];

    const retryableErrorLabels = [
      'TransientTransactionError',
      'UnknownTransactionCommitResult'
    ];

    return retryableErrorCodes.includes(error.code) ||
           retryableErrorLabels.some(label => error.errorLabels?.includes(label)) ||
           error.message.includes('WriteConflict') ||
           error.message.includes('TransientTransactionError');
  }

  async updateTransactionMetrics(status, executionTime, transactionType) {
    this.transactionMetrics.totalTransactions++;

    if (status === 'success') {
      this.transactionMetrics.successfulTransactions++;
    } else {
      this.transactionMetrics.failedTransactions++;
    }

    // Update average execution time
    const totalTime = this.transactionMetrics.averageTransactionTime * 
                      (this.transactionMetrics.totalTransactions - 1);
    this.transactionMetrics.averageTransactionTime = 
      (totalTime + executionTime) / this.transactionMetrics.totalTransactions;

    // Track transaction types
    if (transactionType) {
      const typeStats = this.transactionMetrics.transactionTypes.get(transactionType) || {
        count: 0,
        successCount: 0,
        failureCount: 0,
        averageTime: 0
      };

      typeStats.count++;
      if (status === 'success') {
        typeStats.successCount++;
      } else {
        typeStats.failureCount++;
      }

      typeStats.averageTime = ((typeStats.averageTime * (typeStats.count - 1)) + executionTime) / typeStats.count;

      this.transactionMetrics.transactionTypes.set(transactionType, typeStats);
    }
  }

  sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
  }

  async getTransactionMetrics() {
    const successRate = this.transactionMetrics.totalTransactions > 0 ?
      (this.transactionMetrics.successfulTransactions / this.transactionMetrics.totalTransactions) * 100 : 0;

    return {
      totalTransactions: this.transactionMetrics.totalTransactions,
      successfulTransactions: this.transactionMetrics.successfulTransactions,
      failedTransactions: this.transactionMetrics.failedTransactions,
      successRate: Math.round(successRate * 100) / 100,
      averageTransactionTime: Math.round(this.transactionMetrics.averageTransactionTime),
      transactionTypes: Object.fromEntries(this.transactionMetrics.transactionTypes),
      sessionPoolSize: this.sessionPool.length,
      availableSessions: this.sessionPool.filter(s => !s.inUse).length
    };
  }

  async setupTransactionMonitoring() {
    console.log('Setting up transaction monitoring and analytics...');

    // Monitor transaction performance
    // Keep a handle so the interval can be cleared on shutdown
    this.monitoringInterval = setInterval(async () => {
      const metrics = await this.getTransactionMetrics();
      console.log('Transaction Metrics:', metrics);

      // Store metrics to database for analysis
      if (this.database) {
        await this.database.collection('transaction_metrics').insertOne({
          ...metrics,
          timestamp: new Date()
        });
      }
    }, 60000); // Every minute
  }

  async closeTransactionManager() {
    console.log('Closing MongoDB Transaction Manager...');

    // Stop the periodic metrics sampler so the process can exit cleanly
    if (this.monitoringInterval) {
      clearInterval(this.monitoringInterval);
    }

    // End all sessions in pool
    for (const sessionWrapper of this.sessionPool) {
      try {
        await sessionWrapper.session.endSession();
      } catch (error) {
        console.error('Error ending session:', error);
      }
    }

    // Close MongoDB connection
    if (this.client) {
      await this.client.close();
    }

    console.log('Transaction Manager closed successfully');
  }
}

// Example usage demonstrating ACID transactions
async function demonstrateMongoDBTransactions() {
  const transactionManager = new MongoTransactionManager({
    uri: 'mongodb://localhost:27017',
    database: 'ecommerce_transactions',
    enableTransactionMonitoring: true
  });

  try {
    await transactionManager.initialize();

    // Demonstrate customer order processing with ACID guarantees
    const orderResult = await transactionManager.processCustomerOrder({
      customerId: '507f1f77bcf86cd799439011',
      items: [
        { productId: '507f1f77bcf86cd799439012', quantity: 2, unitPrice: 29.99 },
        { productId: '507f1f77bcf86cd799439013', quantity: 1, unitPrice: 149.99 }
      ],
      payment: {
        method: 'credit_card',
        processor: 'stripe',
        testMode: true
      },
      shipping: {
        method: 'standard',
        cost: 9.99,
        address: {
          street: '123 Main St',
          city: 'Boston',
          state: 'MA',
          zipCode: '02101'
        }
      }
    });

    console.log('Order processing result:', orderResult);

    // Demonstrate inventory transfer with ACID consistency
    const transferResult = await transactionManager.processInventoryTransfer({
      productId: '507f1f77bcf86cd799439012',
      sourceLocation: 'warehouse_east',
      destinationLocation: 'warehouse_west',
      quantity: 50,
      transferType: 'rebalancing',
      reason: 'Regional demand adjustment',
      requestedBy: 'inventory_manager',
      approvedBy: 'operations_director'
    });

    console.log('Inventory transfer result:', transferResult);

    // Demonstrate refund processing with inventory restoration
    const refundResult = await transactionManager.processRefundTransaction({
      orderId: orderResult.result.orderId,
      fullRefund: false,
      amount: 59.98, // Refund for first item
      reason: 'Customer satisfaction',
      restoreInventory: true,
      itemsToRefund: [
        { productId: '507f1f77bcf86cd799439012', quantity: 2 }
      ],
      testMode: true,
      requestedBy: 'customer_service',
      processedBy: 'service_manager'
    });

    console.log('Refund processing result:', refundResult);

    // Get transaction performance metrics
    const metrics = await transactionManager.getTransactionMetrics();
    console.log('Transaction Performance Metrics:', metrics);

    return {
      orderResult,
      transferResult, 
      refundResult,
      metrics
    };

  } catch (error) {
    console.error('Transaction demonstration failed:', error);
    throw error;
  } finally {
    await transactionManager.closeTransactionManager();
  }
}

// Benefits of MongoDB ACID Transactions:
// - Native multi-document ACID compliance eliminates complex coordination logic
// - Distributed transaction support across replica sets and sharded clusters  
// - Automatic retry and recovery mechanisms for transient failures
// - Session-based transaction management with connection pooling optimization
// - Comprehensive transaction monitoring and performance analytics
// - Flexible transaction boundaries supporting complex business workflows
// - Integration with MongoDB's document model for rich transactional operations
// - Production-ready error handling with intelligent retry strategies

module.exports = {
  MongoTransactionManager,
  demonstrateMongoDBTransactions
};

SQL-Style Transaction Management with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB transactions and ACID operations:

-- QueryLeaf MongoDB transactions with SQL-familiar ACID syntax

-- Configure transaction settings
SET transaction_isolation_level = 'read_committed';
SET transaction_timeout = '30 seconds';
SET enable_auto_retry = true;
SET max_retry_attempts = 3;
SET transaction_read_concern = 'majority';
SET transaction_write_concern = 'majority';

-- Begin transaction with explicit ACID properties
BEGIN TRANSACTION
    READ CONCERN MAJORITY
    WRITE CONCERN MAJORITY
    TIMEOUT 30000
    MAX_RETRY_ATTEMPTS 3;

-- Customer order processing with multi-collection ACID transaction
WITH order_transaction_context AS (
    -- Transaction metadata and configuration
    SELECT 
        GENERATE_UUID() as transaction_id,
        CURRENT_TIMESTAMP as transaction_start_time,
        'customer_order_processing' as transaction_type,
        JSON_OBJECT(
            'isolation_level', 'read_committed',
            'consistency_level', 'strong',
            'durability', 'guaranteed',
            'atomicity', 'all_or_nothing'
        ) as acid_properties
),

-- Step 1: Validate customer and inventory availability
order_validation AS (
    SELECT 
        c.customer_id,
        c.customer_email,
        c.customer_status,

        -- Order items validation with inventory checks
        ARRAY_AGG(
            JSON_OBJECT(
                'product_id', p.product_id,
                'product_sku', p.sku,
                'product_name', p.name,
                'requested_quantity', oi.quantity,
                'unit_price', oi.unit_price,
                'available_inventory', i.available_quantity,
                'can_fulfill', CASE WHEN i.available_quantity >= oi.quantity THEN true ELSE false END,
                'line_total', oi.quantity * oi.unit_price
            )
        ) as order_items_validation,

        -- Aggregate order totals
        SUM(oi.quantity * oi.unit_price) as order_total,
        COUNT(*) as item_count,
        COUNT(*) FILTER (WHERE i.available_quantity >= oi.quantity) as fulfillable_items,

        -- Validation status
        CASE 
            WHEN c.customer_status != 'active' THEN 'customer_inactive'
            WHEN COUNT(*) != COUNT(*) FILTER (WHERE i.available_quantity >= oi.quantity) THEN 'insufficient_inventory'
            ELSE 'validated'
        END as validation_status

    FROM customers c
    CROSS JOIN (
        SELECT 
            'product_uuid_1' as product_id, 2 as quantity, 29.99 as unit_price
        UNION ALL
        SELECT 
            'product_uuid_2' as product_id, 1 as quantity, 149.99 as unit_price
    ) oi
    JOIN products p ON p.product_id = oi.product_id
    JOIN inventory i ON i.product_id = oi.product_id AND i.location_id = 'main_warehouse'
    WHERE c.customer_id = 'customer_uuid_123'
    GROUP BY c.customer_id, c.customer_email, c.customer_status
),

-- Step 2: Reserve inventory atomically (within transaction)  
inventory_reservations AS (
    UPDATE inventory 
    SET 
        available_quantity = available_quantity - reservation_info.quantity,
        reserved_quantity = reserved_quantity + reservation_info.quantity,
        updated_at = CURRENT_TIMESTAMP,
        last_reservation_id = otc.transaction_id
    FROM (
        SELECT 
            json_array_elements(ov.order_items_validation)->>'product_id' as product_id,
            (json_array_elements(ov.order_items_validation)->>'requested_quantity')::INTEGER as quantity
        FROM order_validation ov
        CROSS JOIN order_transaction_context otc
        WHERE ov.validation_status = 'validated'
    ) reservation_info,
    order_transaction_context otc
    WHERE inventory.product_id = reservation_info.product_id
    AND inventory.location_id = 'main_warehouse'
    AND inventory.available_quantity >= reservation_info.quantity
    RETURNING 
        product_id,
        available_quantity as new_available_quantity,
        reserved_quantity as new_reserved_quantity,
        'reserved' as reservation_status
),

-- Step 3: Process payment authorization (simulated within transaction)
payment_processing AS (
    INSERT INTO payments (
        payment_id,
        transaction_id,  
        order_amount,
        payment_method,
        payment_processor,
        payment_status,
        authorization_code,
        processed_at,

        -- ACID transaction metadata
        transaction_isolation_level,
        transaction_consistency_guarantee,
        created_within_transaction
    )
    SELECT 
        GENERATE_UUID() as payment_id,
        otc.transaction_id,
        ov.order_total,
        'credit_card' as payment_method,
        'stripe' as payment_processor,
        'authorized' as payment_status,
        'AUTH_' || EXTRACT(EPOCH FROM CURRENT_TIMESTAMP) as authorization_code,
        CURRENT_TIMESTAMP as processed_at,

        -- Transaction ACID properties
        'read_committed' as transaction_isolation_level,
        'strong_consistency' as transaction_consistency_guarantee,
        true as created_within_transaction

    FROM order_validation ov
    CROSS JOIN order_transaction_context otc
    WHERE ov.validation_status = 'validated'
    RETURNING payment_id, payment_status, authorization_code
),

-- Step 4: Create order with full ACID compliance
order_creation AS (
    INSERT INTO orders (
        order_id,
        transaction_id,
        customer_id,
        order_number,
        order_status,

        -- Order details
        items,
        item_count,
        total_amount,

        -- Payment information
        payment_id,
        payment_method,
        payment_status,

        -- Inventory status
        inventory_reserved,
        reservation_expiry,

        -- Transaction metadata  
        created_within_transaction,
        transaction_isolation_level,
        acid_compliance_verified,

        -- Timestamps
        created_at,
        updated_at
    )
    SELECT 
        GENERATE_UUID() as order_id,
        otc.transaction_id,
        ov.customer_id,
        'ORD-' || to_char(CURRENT_TIMESTAMP, 'YYYYMMDD') || '-' || 
            LPAD(EXTRACT(EPOCH FROM CURRENT_TIMESTAMP)::INTEGER % 10000, 4, '0') as order_number,
        'confirmed' as order_status,

        -- Order items with reservation confirmation
        JSON_AGG(
            JSON_OBJECT(
                'product_id', (json_array_elements(ov.order_items_validation)->>'product_id'),
                'quantity', (json_array_elements(ov.order_items_validation)->>'requested_quantity')::INTEGER,
                'unit_price', (json_array_elements(ov.order_items_validation)->>'unit_price')::DECIMAL,
                'line_total', (json_array_elements(ov.order_items_validation)->>'line_total')::DECIMAL,
                'inventory_reserved', true,
                'reservation_confirmed', EXISTS(
                    SELECT 1 FROM inventory_reservations ir 
                    WHERE ir.product_id = json_array_elements(ov.order_items_validation)->>'product_id'
                )
            )
        ) as items,
        ov.item_count,
        ov.order_total,

        pp.payment_id,
        'credit_card' as payment_method,
        pp.payment_status,

        true as inventory_reserved,
        CURRENT_TIMESTAMP + INTERVAL '15 minutes' as reservation_expiry,

        -- ACID transaction verification
        true as created_within_transaction,
        'read_committed' as transaction_isolation_level,
        true as acid_compliance_verified,

        CURRENT_TIMESTAMP as created_at,
        CURRENT_TIMESTAMP as updated_at

    FROM order_validation ov
    CROSS JOIN order_transaction_context otc
    CROSS JOIN payment_processing pp
    WHERE ov.validation_status = 'validated'
    GROUP BY otc.transaction_id, ov.customer_id, ov.item_count, ov.order_total, 
             pp.payment_id, pp.payment_status
    RETURNING order_id, order_number, order_status, total_amount
),

-- Step 5: Update customer statistics (within same transaction)
customer_statistics_update AS (
    UPDATE customers 
    SET 
        total_orders = total_orders + 1,
        total_spent = total_spent + oc.total_amount,
        last_order_date = CURRENT_TIMESTAMP,
        last_order_amount = oc.total_amount,
        updated_at = CURRENT_TIMESTAMP,

        -- Transaction audit trail
        last_transaction_id = otc.transaction_id,
        updated_within_transaction = true

    FROM order_creation oc
    CROSS JOIN order_transaction_context otc
    WHERE customers.customer_id = (
        SELECT customer_id FROM order_validation WHERE validation_status = 'validated'
    )
    RETURNING customer_id, total_orders, total_spent, last_order_date
),

-- Final transaction result compilation
transaction_result AS (
    SELECT 
        otc.transaction_id,
        otc.transaction_type,
        'committed' as transaction_status,

        -- Order details
        oc.order_id,
        oc.order_number,
        oc.order_status,
        oc.total_amount,

        -- Payment confirmation
        pp.payment_id,
        pp.payment_status,
        pp.authorization_code,

        -- Inventory confirmation
        ARRAY_AGG(
            JSON_OBJECT(
                'product_id', ir.product_id,
                'reservation_status', ir.reservation_status,
                'available_quantity', ir.new_available_quantity,
                'reserved_quantity', ir.new_reserved_quantity
            )
        ) as inventory_reservations,

        -- Customer update confirmation
        csu.total_orders as customer_total_orders,
        csu.total_spent as customer_total_spent,

        -- ACID compliance verification
        JSON_OBJECT(
            'atomicity', 'all_operations_committed',
            'consistency', 'business_rules_enforced', 
            'isolation', 'read_committed_maintained',
            'durability', 'changes_persisted'
        ) as acid_verification,

        -- Performance metrics
        EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - otc.transaction_start_time)) * 1000 as transaction_duration_ms,
        COUNT(DISTINCT ir.product_id) as items_reserved,

        -- Transaction metadata
        CURRENT_TIMESTAMP as transaction_committed_at,
        true as transaction_successful

    FROM order_transaction_context otc
    CROSS JOIN order_creation oc
    CROSS JOIN payment_processing pp
    LEFT JOIN inventory_reservations ir ON true
    LEFT JOIN customer_statistics_update csu ON true
    GROUP BY otc.transaction_id, otc.transaction_type, otc.transaction_start_time,
             oc.order_id, oc.order_number, oc.order_status, oc.total_amount,
             pp.payment_id, pp.payment_status, pp.authorization_code,
             csu.total_orders, csu.total_spent
)

-- Return comprehensive transaction result
SELECT 
    tr.transaction_id,
    tr.transaction_status,
    tr.order_id,
    tr.order_number,
    tr.total_amount,
    tr.payment_status,
    tr.inventory_reservations,
    tr.acid_verification,
    tr.transaction_duration_ms || 'ms' as execution_time,
    tr.transaction_successful,

    -- Success confirmation message
    CASE 
        WHEN tr.transaction_successful THEN 
            'Order ' || tr.order_number || ' processed successfully with ACID guarantees: ' ||
            tr.items_reserved || ' items reserved, payment ' || tr.payment_status ||
            ', customer statistics updated'
        ELSE 'Transaction failed - all changes rolled back'
    END as result_summary

FROM transaction_result tr;

-- Commit transaction with durability guarantee
COMMIT TRANSACTION 
    WITH DURABILITY_GUARANTEE = 'majority_acknowledged'
    AND CONSISTENCY_CHECK = 'business_rules_validated';

-- Transaction performance and ACID compliance monitoring
WITH transaction_performance_analysis AS (
    SELECT 
        transaction_type,
        DATE_TRUNC('hour', transaction_committed_at) as hour_bucket,

        -- Performance metrics
        COUNT(*) as transaction_count,
        AVG(transaction_duration_ms) as avg_duration_ms,
        MAX(transaction_duration_ms) as max_duration_ms,
        PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY transaction_duration_ms) as p95_duration_ms,

        -- Success and failure rates
        COUNT(*) FILTER (WHERE transaction_successful = true) as successful_transactions,
        COUNT(*) FILTER (WHERE transaction_successful = false) as failed_transactions,
        ROUND(
            COUNT(*) FILTER (WHERE transaction_successful = true)::DECIMAL / COUNT(*) * 100, 
            2
        ) as success_rate_percent,

        -- ACID compliance metrics
        COUNT(*) FILTER (WHERE acid_verification->>'atomicity' = 'all_operations_committed') as atomic_transactions,
        COUNT(*) FILTER (WHERE acid_verification->>'consistency' = 'business_rules_enforced') as consistent_transactions,
        COUNT(*) FILTER (WHERE acid_verification->>'isolation' = 'read_committed_maintained') as isolated_transactions,
        COUNT(*) FILTER (WHERE acid_verification->>'durability' = 'changes_persisted') as durable_transactions,

        -- Resource utilization analysis
        AVG(items_reserved) as avg_items_per_transaction,
        SUM(total_amount) as total_transaction_value

    FROM transaction_results_log
    WHERE transaction_committed_at >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
    GROUP BY transaction_type, DATE_TRUNC('hour', transaction_committed_at)
),

-- ACID compliance assessment
acid_compliance_assessment AS (
    SELECT 
        tpa.transaction_type,
        tpa.hour_bucket,
        tpa.transaction_count,

        -- Performance assessment
        CASE 
            WHEN tpa.avg_duration_ms < 100 THEN 'excellent'
            WHEN tpa.avg_duration_ms < 500 THEN 'good' 
            WHEN tpa.avg_duration_ms < 1000 THEN 'acceptable'
            ELSE 'needs_optimization'
        END as performance_rating,

        -- ACID compliance scoring
        ROUND(
            (tpa.atomic_transactions + tpa.consistent_transactions + 
             tpa.isolated_transactions + tpa.durable_transactions)::DECIMAL / 
            (tpa.transaction_count * 4) * 100, 
            2
        ) as acid_compliance_score,

        -- Reliability assessment
        CASE 
            WHEN tpa.success_rate_percent >= 99.9 THEN 'highly_reliable'
            WHEN tpa.success_rate_percent >= 99.0 THEN 'reliable'
            WHEN tpa.success_rate_percent >= 95.0 THEN 'acceptable'
            ELSE 'needs_improvement'
        END as reliability_rating,

        -- Throughput analysis
        ROUND(tpa.transaction_count / 3600.0, 2) as transactions_per_second,
        ROUND(tpa.total_transaction_value / tpa.transaction_count, 2) as avg_transaction_value,

        -- Optimization recommendations
        CASE 
            WHEN tpa.avg_duration_ms > 1000 THEN 'Optimize transaction logic and reduce operation count'
            WHEN tpa.success_rate_percent < 95 THEN 'Investigate failure patterns and improve error handling'
            WHEN tpa.p95_duration_ms > tpa.avg_duration_ms * 3 THEN 'Address performance outliers and resource contention'
            ELSE 'Transaction performance within acceptable parameters'
        END as optimization_recommendation

    FROM transaction_performance_analysis tpa
)

-- Comprehensive transaction monitoring dashboard
SELECT 
    aca.transaction_type,
    TO_CHAR(aca.hour_bucket, 'YYYY-MM-DD HH24:00') as analysis_period,
    aca.transaction_count,

    -- Performance metrics
    ROUND(tpa.avg_duration_ms, 2) || 'ms' as avg_execution_time,
    ROUND(tpa.p95_duration_ms, 2) || 'ms' as p95_execution_time,
    aca.performance_rating,
    aca.transactions_per_second || '/sec' as throughput,

    -- ACID compliance status
    aca.acid_compliance_score || '%' as acid_compliance,
    CASE 
        WHEN aca.acid_compliance_score >= 99.9 THEN 'Full ACID Compliance'
        WHEN aca.acid_compliance_score >= 99.0 THEN 'High ACID Compliance'
        WHEN aca.acid_compliance_score >= 95.0 THEN 'Acceptable ACID Compliance'
        ELSE 'ACID Compliance Issues Detected'
    END as compliance_status,

    -- Reliability metrics
    tpa.success_rate_percent || '%' as success_rate,
    aca.reliability_rating,
    tpa.failed_transactions as failure_count,

    -- Business impact
    '$' || ROUND(aca.avg_transaction_value, 2) as avg_transaction_value,
    '$' || ROUND(tpa.total_transaction_value, 2) as total_value_processed,

    -- Operational guidance
    aca.optimization_recommendation,

    -- System health indicators
    CASE 
        WHEN aca.performance_rating = 'excellent' AND aca.reliability_rating = 'highly_reliable' THEN 'optimal'
        WHEN aca.performance_rating IN ('excellent', 'good') AND aca.reliability_rating IN ('highly_reliable', 'reliable') THEN 'healthy'
        WHEN aca.performance_rating = 'acceptable' OR aca.reliability_rating = 'acceptable' THEN 'monitoring_required'
        ELSE 'attention_required'
    END as system_health,

    -- Next steps
    CASE 
        WHEN aca.performance_rating = 'needs_optimization' THEN 'Immediate performance tuning required'
        WHEN aca.reliability_rating = 'needs_improvement' THEN 'Investigate and resolve reliability issues'
        WHEN aca.acid_compliance_score < 99 THEN 'Review ACID compliance implementation'
        ELSE 'Continue monitoring and maintain current configuration'
    END as recommended_actions

FROM acid_compliance_assessment aca
JOIN transaction_performance_analysis tpa ON 
    aca.transaction_type = tpa.transaction_type AND 
    aca.hour_bucket = tpa.hour_bucket
ORDER BY aca.hour_bucket DESC, aca.transaction_count DESC;

-- QueryLeaf provides comprehensive MongoDB transaction capabilities:
-- 1. SQL-familiar ACID transaction syntax with explicit isolation levels and consistency guarantees
-- 2. Multi-document operations with atomic commit/rollback across collections
-- 3. Automatic retry mechanisms with configurable backoff strategies for transient failures
-- 4. Comprehensive transaction monitoring with performance and compliance analytics
-- 5. Session management and connection pooling optimization for transaction performance
-- 6. Distributed transaction coordination across replica sets and sharded clusters
-- 7. Business logic integration with transaction boundaries and error handling
-- 8. SQL-style transaction control statements (BEGIN, COMMIT, ROLLBACK) for familiar workflow
-- 9. Advanced analytics for transaction performance tuning and ACID compliance verification
-- 10. Enterprise-grade transaction management with monitoring and operational insights

Best Practices for MongoDB Transaction Implementation

ACID Compliance and Performance Optimization

Essential practices for implementing MongoDB transactions effectively (a short driver-level sketch follows the list):

  1. Transaction Boundaries: Design clear transaction boundaries that encompass related operations while minimizing transaction duration
  2. Error Handling Strategy: Implement comprehensive retry logic for transient failures and proper rollback procedures for business logic errors
  3. Performance Considerations: Optimize transactions for minimal lock contention and efficient resource utilization
  4. Session Management: Use connection pooling and session management to optimize transaction performance across concurrent operations
  5. Monitoring and Analytics: Establish comprehensive monitoring for transaction success rates, performance, and ACID compliance verification
  6. Testing Strategies: Implement thorough testing of transaction boundaries, failure scenarios, and recovery procedures
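
A minimal Node.js driver sketch of the first two practices above (clear boundaries plus retry of transient failures) using session.withTransaction; the 'shop', 'orders', and 'inventory' names are illustrative assumptions rather than collections from the examples above:

// Transaction boundary sketch: only the related writes run inside the transaction,
// and withTransaction() retries on TransientTransactionError / UnknownTransactionCommitResult.
const { MongoClient } = require('mongodb');

async function placeOrderAtomically(uri, order) {
  const client = new MongoClient(uri);
  await client.connect();
  const session = client.startSession();
  try {
    await session.withTransaction(async () => {
      const db = client.db('shop');
      // Reserve stock and record the order atomically; keep the transaction short.
      await db.collection('inventory').updateOne(
        { productId: order.productId, available: { $gte: order.quantity } },
        { $inc: { available: -order.quantity, reserved: order.quantity } },
        { session }
      );
      await db.collection('orders').insertOne(
        { ...order, status: 'confirmed', createdAt: new Date() },
        { session }
      );
    }, {
      readConcern: { level: 'snapshot' },
      writeConcern: { w: 'majority' }
    });
  } finally {
    await session.endSession();
    await client.close();
  }
}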

Production Deployment and Scalability

Key considerations for enterprise MongoDB transaction deployments (see the session-options sketch after this list):

  1. Replica Set Configuration: Ensure proper replica set deployment with sufficient nodes for transaction availability and performance
  2. Distributed Transactions: Design transaction patterns that work efficiently across sharded MongoDB clusters
  3. Resource Planning: Plan for transaction resource requirements including memory, CPU, and network overhead
  4. Backup and Recovery: Implement backup strategies that account for transaction consistency and point-in-time recovery
  5. Security Implementation: Secure transaction operations with proper authentication, authorization, and audit logging
  6. Operational Procedures: Create standardized procedures for transaction monitoring, troubleshooting, and performance tuning
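
As referenced above, a brief sketch of session-level transaction defaults for a replica set deployment; the specific values (majority write concern, primary read preference, a 10-second commit bound) are illustrative assumptions, not settings taken from the examples in this article:

const { MongoClient } = require('mongodb');

async function startTransactionalSession(uri) {
  const client = new MongoClient(uri);
  await client.connect();
  // Defaults apply to every transaction started on this session.
  const session = client.startSession({
    defaultTransactionOptions: {
      readPreference: 'primary',           // transactions read from the primary
      readConcern: { level: 'majority' },
      writeConcern: { w: 'majority' },     // commits survive a primary failover
      maxCommitTimeMS: 10000               // bound commit latency under contention
    }
  });
  return { client, session };
}

Majority write concern trades some commit latency for durability across replica set elections, which is usually the right default for order-processing style transactions.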

Conclusion

MongoDB transactions provide comprehensive ACID properties with multi-document operations, distributed consistency guarantees, and intelligent session management designed for modern applications requiring strong consistency across complex business operations. The native transaction support eliminates the complexity of manual coordination while providing enterprise-grade reliability and performance for distributed systems.

Key MongoDB transaction benefits include:

  • Complete ACID Compliance: Full atomicity, consistency, isolation, and durability guarantees across multi-document operations
  • Distributed Consistency: Native support for transactions across replica sets and sharded clusters with automatic coordination
  • Intelligent Retry Logic: Built-in retry mechanisms for transient failures with configurable backoff strategies
  • Session Management: Optimized session pooling and connection management for transaction performance
  • Comprehensive Monitoring: Real-time transaction performance analytics and ACID compliance verification
  • SQL Compatibility: Familiar transaction management patterns accessible through SQL-style operations

Whether you're building financial applications, e-commerce platforms, inventory management systems, or any application requiring strong consistency guarantees, MongoDB transactions with QueryLeaf's SQL-familiar interface provide the foundation for reliable, scalable, and maintainable transactional operations.

QueryLeaf Integration: QueryLeaf automatically optimizes MongoDB transactions while providing SQL-familiar syntax for transaction management and monitoring. Advanced ACID patterns, error handling strategies, and performance optimization techniques are seamlessly accessible through familiar SQL constructs, making sophisticated transaction management both powerful and approachable for SQL-oriented development teams.

The combination of MongoDB's robust ACID transaction capabilities with familiar SQL-style transaction management makes it an ideal platform for applications that require both strong consistency guarantees and familiar development patterns, ensuring your transactional operations maintain data integrity while scaling efficiently across distributed environments.

MongoDB TTL Collections and Automatic Data Lifecycle Management: Enterprise-Grade Data Expiration and Storage Optimization

Modern applications generate massive amounts of time-sensitive data that requires intelligent lifecycle management to prevent storage bloat, maintain performance, and satisfy compliance requirements. Traditional relational databases provide limited automatic data expiration capabilities, often requiring complex batch jobs, manual cleanup procedures, or external scheduling systems that add operational overhead and complexity to data management workflows.

MongoDB TTL (Time To Live) collections provide native automatic data expiration capabilities with precise control over data retention policies, storage optimization, and compliance-driven data lifecycle management. Unlike traditional databases that require manual cleanup procedures and complex scheduling, MongoDB's TTL functionality automatically removes expired documents based on date field values, ensuring optimal storage utilization while maintaining query performance and operational simplicity.
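
Before the larger lifecycle framework below, the core mechanism can be shown in a few lines with the Node.js driver; the 'app' database, 'login_events' collection, and one-hour window here are illustrative assumptions:

const { MongoClient } = require('mongodb');

async function enableLoginEventTTL(uri) {
  const client = new MongoClient(uri);
  await client.connect();
  const events = client.db('app').collection('login_events');

  // Documents expire roughly 3600 seconds after the date stored in createdAt;
  // the server's background TTL monitor (it runs about every 60 seconds) removes them.
  await events.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 });

  // Any document inserted with a BSON date in createdAt is now lifecycle-managed.
  await events.insertOne({ userId: 'u1', createdAt: new Date() });

  await client.close();
}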

The Traditional Data Expiration Challenge

Conventional relational database data lifecycle management faces significant operational limitations:

-- Traditional PostgreSQL data expiration - manual cleanup with complex maintenance overhead

-- Session data management with manual expiration logic
CREATE TABLE user_sessions (
    session_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id INTEGER NOT NULL,
    session_token VARCHAR(256) UNIQUE NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP + INTERVAL '24 hours',
    last_activity TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,

    -- Session metadata
    user_agent TEXT,
    ip_address INET,
    login_method VARCHAR(50),
    session_data JSONB,

    -- Security tracking
    is_active BOOLEAN DEFAULT TRUE,
    invalid_attempts INTEGER DEFAULT 0,
    security_flags TEXT[],

    -- Cleanup tracking
    cleanup_eligible BOOLEAN DEFAULT FALSE,
    cleanup_scheduled TIMESTAMP,

    -- Foreign key constraints
    FOREIGN KEY (user_id) REFERENCES users(user_id) ON DELETE CASCADE
);

-- Audit log table requiring manual retention management
CREATE TABLE audit_logs (
    log_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    event_timestamp TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    event_type VARCHAR(100) NOT NULL,
    user_id INTEGER,

    -- Event details
    resource_type VARCHAR(100),
    resource_id VARCHAR(255),
    action_performed VARCHAR(100),
    event_data JSONB,

    -- Request context
    ip_address INET,
    user_agent TEXT,
    request_id VARCHAR(100),
    session_id VARCHAR(100),

    -- Compliance and retention
    retention_category VARCHAR(50) NOT NULL DEFAULT 'standard',
    retention_expiry TIMESTAMP,
    compliance_flags TEXT[],

    -- Cleanup metadata
    marked_for_deletion BOOLEAN DEFAULT FALSE,
    deletion_scheduled TIMESTAMP,
    deletion_reason TEXT
);

-- Performance indexes (PostgreSQL requires separate CREATE INDEX statements)
CREATE INDEX idx_audit_event_timestamp ON audit_logs (event_timestamp);
CREATE INDEX idx_audit_user_id_timestamp ON audit_logs (user_id, event_timestamp);
CREATE INDEX idx_audit_retention_expiry ON audit_logs (retention_expiry);
CREATE INDEX idx_audit_cleanup_eligible ON audit_logs (marked_for_deletion, deletion_scheduled);

-- Complex manual cleanup procedure with performance impact
CREATE OR REPLACE FUNCTION cleanup_expired_sessions()
RETURNS INTEGER AS $$
DECLARE
    cleanup_batch_size INTEGER := 10000;
    total_deleted INTEGER := 0;
    batch_deleted INTEGER;
    cleanup_start TIMESTAMP := CURRENT_TIMESTAMP;
    session_cursor CURSOR FOR 
        SELECT session_id, user_id, expires_at, last_activity
        FROM user_sessions
        WHERE (expires_at < CURRENT_TIMESTAMP OR 
               last_activity < CURRENT_TIMESTAMP - INTERVAL '7 days')
        AND cleanup_eligible = FALSE
        ORDER BY expires_at ASC
        LIMIT cleanup_batch_size;

    session_record RECORD;

BEGIN
    RAISE NOTICE 'Starting session cleanup process at %', cleanup_start;

    -- Mark sessions eligible for cleanup
    UPDATE user_sessions 
    SET cleanup_eligible = TRUE,
        cleanup_scheduled = CURRENT_TIMESTAMP
    WHERE (expires_at < CURRENT_TIMESTAMP OR 
           last_activity < CURRENT_TIMESTAMP - INTERVAL '7 days')
    AND cleanup_eligible = FALSE;

    GET DIAGNOSTICS batch_deleted = ROW_COUNT;
    RAISE NOTICE 'Marked % sessions for cleanup', batch_deleted;

    -- Process cleanup in batches to avoid long locks
    FOR session_record IN session_cursor LOOP
        BEGIN
            -- Log session termination for audit
            INSERT INTO audit_logs (
                event_type, user_id, resource_type, resource_id,
                action_performed, event_data, retention_category
            ) VALUES (
                'session_expired', session_record.user_id, 'session', 
                session_record.session_id::text, 'automatic_cleanup',
                jsonb_build_object(
                    'expired_at', session_record.expires_at,
                    'last_activity', session_record.last_activity,
                    'cleanup_reason', 'ttl_expiration',
                    'cleanup_timestamp', CURRENT_TIMESTAMP
                ),
                'session_management'
            );

            -- Remove expired session
            DELETE FROM user_sessions 
            WHERE session_id = session_record.session_id;

            total_deleted := total_deleted + 1;

            -- Report progress periodically; COMMIT is not allowed inside a plpgsql
            -- function, so batching into separate transactions would require a
            -- procedure (PostgreSQL 11+) or a driver-side loop.
            IF total_deleted % 1000 = 0 THEN
                RAISE NOTICE 'Progress: % sessions cleaned up', total_deleted;
            END IF;

        EXCEPTION
            WHEN foreign_key_violation THEN
                RAISE WARNING 'Foreign key constraint prevents deletion of session %', 
                    session_record.session_id;
            WHEN OTHERS THEN
                RAISE WARNING 'Error cleaning up session %: %', 
                    session_record.session_id, SQLERRM;
        END;
    END LOOP;

    -- Update cleanup statistics
    INSERT INTO cleanup_statistics (
        cleanup_type, cleanup_timestamp, records_processed,
        processing_duration, success_count, error_count
    ) VALUES (
        'session_cleanup', cleanup_start, total_deleted,
        CURRENT_TIMESTAMP - cleanup_start, total_deleted, 0
    );

    RAISE NOTICE 'Session cleanup completed: % sessions removed in %',
        total_deleted, CURRENT_TIMESTAMP - cleanup_start;

    RETURN total_deleted;
END;
$$ LANGUAGE plpgsql;

-- Audit log retention with complex policy management
CREATE OR REPLACE FUNCTION manage_audit_log_retention()
RETURNS INTEGER AS $$
DECLARE
    retention_policies RECORD;
    policy_cursor CURSOR FOR
        SELECT retention_category, retention_days, compliance_required
        FROM retention_policy_config
        WHERE active = TRUE;

    total_processed INTEGER := 0;
    category_processed INTEGER;
    retention_threshold TIMESTAMP;

BEGIN
    RAISE NOTICE 'Starting audit log retention management...';

    -- Process each retention policy
    FOR retention_policies IN policy_cursor LOOP
        retention_threshold := CURRENT_TIMESTAMP - (retention_policies.retention_days || ' days')::INTERVAL;

        -- Mark logs for deletion based on retention policy
        UPDATE audit_logs 
        SET marked_for_deletion = TRUE,
            deletion_scheduled = CURRENT_TIMESTAMP + INTERVAL '24 hours',
            deletion_reason = 'retention_policy_' || retention_policies.retention_category
        WHERE retention_category = retention_policies.retention_category
        AND event_timestamp < retention_threshold
        AND marked_for_deletion = FALSE
        AND (compliance_flags IS NULL OR NOT compliance_flags && ARRAY['litigation_hold', 'investigation_hold']);

        GET DIAGNOSTICS category_processed = ROW_COUNT;
        total_processed := total_processed + category_processed;

        RAISE NOTICE 'Retention policy %: marked % logs for deletion (threshold: %)',
            retention_policies.retention_category, category_processed, retention_threshold;
    END LOOP;

    -- Execute delayed deletion for logs past grace period
    DELETE FROM audit_logs 
    WHERE marked_for_deletion = TRUE 
    AND deletion_scheduled < CURRENT_TIMESTAMP
    AND (compliance_flags IS NULL OR NOT compliance_flags && ARRAY['litigation_hold']);

    GET DIAGNOSTICS category_processed = ROW_COUNT;
    RAISE NOTICE 'Deleted % audit logs past grace period', category_processed;

    RETURN total_processed;
END;
$$ LANGUAGE plpgsql;

-- Complex cache data management with manual expiration
CREATE TABLE application_cache (
    cache_key VARCHAR(500) PRIMARY KEY,
    cache_namespace VARCHAR(100) NOT NULL DEFAULT 'default',
    cache_value JSONB NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP NOT NULL,
    last_accessed TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,

    -- Cache metadata
    cache_size_bytes INTEGER,
    access_count INTEGER DEFAULT 1,
    cache_tags TEXT[],
    cache_priority INTEGER DEFAULT 5, -- 1 highest, 10 lowest

    -- Cleanup tracking
    cleanup_candidate BOOLEAN DEFAULT FALSE
);

-- Performance optimization indexes (created separately in PostgreSQL)
CREATE INDEX idx_cache_expires_at ON application_cache (expires_at);
CREATE INDEX idx_cache_namespace_expires ON application_cache (cache_namespace, expires_at);
CREATE INDEX idx_cache_cleanup_candidate ON application_cache (cleanup_candidate, expires_at);

-- Cache cleanup with performance considerations
CREATE OR REPLACE FUNCTION cleanup_expired_cache()
RETURNS INTEGER AS $$
DECLARE
    cleanup_batch_size INTEGER := 5000;
    total_cleaned INTEGER := 0;
    batch_count INTEGER;
    cleanup_rounds INTEGER := 0;
    max_cleanup_rounds INTEGER := 20;

BEGIN
    RAISE NOTICE 'Starting cache cleanup process...';

    WHILE cleanup_rounds < max_cleanup_rounds LOOP
        -- Delete expired cache entries in batches
        DELETE FROM application_cache 
        WHERE cache_key IN (
            SELECT cache_key 
            FROM application_cache
            WHERE expires_at < CURRENT_TIMESTAMP
            ORDER BY expires_at ASC
            LIMIT cleanup_batch_size
        );

        GET DIAGNOSTICS batch_count = ROW_COUNT;

        IF batch_count = 0 THEN
            EXIT; -- No more expired entries
        END IF;

        total_cleaned := total_cleaned + batch_count;
        cleanup_rounds := cleanup_rounds + 1;

        RAISE NOTICE 'Cleanup round %: removed % expired cache entries', 
            cleanup_rounds, batch_count;

        -- Brief pause to avoid overwhelming the system
        PERFORM pg_sleep(0.1);
    END LOOP;

    -- Additional cleanup for low-priority unused cache
    DELETE FROM application_cache 
    WHERE last_accessed < CURRENT_TIMESTAMP - INTERVAL '7 days'
    AND cache_priority >= 8
    AND access_count <= 5;

    GET DIAGNOSTICS batch_count = ROW_COUNT;
    total_cleaned := total_cleaned + batch_count;

    RAISE NOTICE 'Cache cleanup completed: % total entries removed', total_cleaned;

    RETURN total_cleaned;
END;
$$ LANGUAGE plpgsql;

-- Scheduled cleanup job management (requires external cron)
CREATE TABLE cleanup_job_schedule (
    job_name VARCHAR(100) PRIMARY KEY,
    job_function VARCHAR(200) NOT NULL,
    schedule_expression VARCHAR(100) NOT NULL, -- Cron expression
    last_execution TIMESTAMP,
    next_execution TIMESTAMP,
    execution_count INTEGER DEFAULT 0,

    -- Job configuration
    enabled BOOLEAN DEFAULT TRUE,
    max_execution_time INTERVAL DEFAULT '2 hours',
    cleanup_batch_size INTEGER DEFAULT 10000,

    -- Performance tracking
    average_execution_time INTERVAL,
    total_records_processed BIGINT DEFAULT 0,
    last_records_processed INTEGER,

    -- Error handling
    last_error_message TEXT,
    consecutive_failures INTEGER DEFAULT 0,
    max_failures_allowed INTEGER DEFAULT 3
);

-- Insert cleanup job configurations
INSERT INTO cleanup_job_schedule (job_name, job_function, schedule_expression) VALUES
('session_cleanup', 'cleanup_expired_sessions()', '0 */6 * * *'), -- Every 6 hours
('audit_retention', 'manage_audit_log_retention()', '0 2 * * 0'),  -- Weekly at 2 AM
('cache_cleanup', 'cleanup_expired_cache()', '*/30 * * * *'),      -- Every 30 minutes
('temp_file_cleanup', 'cleanup_temporary_files()', '0 1 * * *');   -- Daily at 1 AM

-- Monitor cleanup job performance
WITH cleanup_performance AS (
    SELECT 
        job_name,
        last_execution,
        next_execution,
        execution_count,
        average_execution_time,
        total_records_processed,
        last_records_processed,
        consecutive_failures,

        -- Performance calculations
        CASE 
            WHEN execution_count > 0 AND total_records_processed > 0 THEN
                ROUND(total_records_processed::DECIMAL / execution_count::DECIMAL, 0)
            ELSE 0
        END as avg_records_per_execution,

        -- Health status
        CASE 
            WHEN consecutive_failures >= max_failures_allowed THEN 'failed'
            WHEN consecutive_failures > 0 THEN 'degraded'
            WHEN last_execution < CURRENT_TIMESTAMP - INTERVAL '24 hours' THEN 'overdue'
            ELSE 'healthy'
        END as job_health

    FROM cleanup_job_schedule
    WHERE enabled = TRUE
),

cleanup_recommendations AS (
    SELECT 
        cp.job_name,
        cp.job_health,
        cp.avg_records_per_execution,
        cp.average_execution_time,

        -- Optimization recommendations
        CASE 
            WHEN cp.job_health = 'failed' THEN 'Immediate attention: job failing consistently'
            WHEN cp.average_execution_time > INTERVAL '1 hour' THEN 'Performance issue: execution time too long'
            WHEN cp.avg_records_per_execution > 50000 THEN 'Consider smaller batch sizes to reduce lock contention'
            WHEN cp.consecutive_failures > 0 THEN 'Monitor job execution and error logs'
            ELSE 'Job performing within expected parameters'
        END as recommendation,

        -- Resource impact assessment
        CASE 
            WHEN cp.average_execution_time > INTERVAL '30 minutes' THEN 'high'
            WHEN cp.average_execution_time > INTERVAL '10 minutes' THEN 'medium'
            ELSE 'low'
        END as resource_impact

    FROM cleanup_performance cp
)

-- Generate cleanup management dashboard
SELECT 
    cr.job_name,
    cr.job_health,
    cr.avg_records_per_execution,
    cr.average_execution_time,
    cr.resource_impact,
    cr.recommendation,

    -- Next steps
    CASE cr.job_health
        WHEN 'failed' THEN 'Review error logs and fix underlying issues'
        WHEN 'degraded' THEN 'Monitor next execution and investigate intermittent failures'
        WHEN 'overdue' THEN 'Check job scheduler and execute manually if needed'
        ELSE 'Continue monitoring performance trends'
    END as next_actions,

    -- Operational guidance
    CASE 
        WHEN cr.resource_impact = 'high' THEN 'Schedule during low-traffic periods'
        WHEN cr.avg_records_per_execution > 100000 THEN 'Consider parallel processing'
        ELSE 'Current execution strategy is appropriate'
    END as operational_guidance

FROM cleanup_recommendations cr
ORDER BY 
    CASE cr.job_health
        WHEN 'failed' THEN 1
        WHEN 'degraded' THEN 2
        WHEN 'overdue' THEN 3
        ELSE 4
    END,
    CASE cr.resource_impact
        WHEN 'high' THEN 1
        WHEN 'medium' THEN 2
        ELSE 3
    END;

-- Problems with traditional data expiration management:
-- 1. Complex manual cleanup procedures requiring extensive procedural code and maintenance
-- 2. Performance impact from batch deletion operations affecting application responsiveness
-- 3. Resource-intensive cleanup jobs requiring careful scheduling and monitoring  
-- 4. Risk of data inconsistency during cleanup operations due to foreign key constraints
-- 5. Limited scalability for high-volume data expiration scenarios
-- 6. Manual configuration and maintenance of retention policies across different data types
-- 7. Complex error handling and recovery procedures for failed cleanup operations
-- 8. Difficulty coordinating cleanup across multiple tables with interdependencies
-- 9. Operational overhead of monitoring and maintaining cleanup job performance
-- 10. Risk of storage bloat if cleanup jobs fail or are disabled

MongoDB provides native TTL functionality with automatic data expiration and lifecycle management:

// MongoDB TTL Collections - Native automatic data lifecycle management and expiration
const { MongoClient, ObjectId } = require('mongodb');

// Advanced MongoDB TTL Collection Manager with Enterprise Data Lifecycle Management
class MongoDBTTLManager {
  constructor(client, config = {}) {
    this.client = client;
    this.db = client.db(config.database || 'enterprise_data');

    this.config = {
      // TTL Configuration
      defaultTTLSeconds: config.defaultTTLSeconds || 86400, // 24 hours
      enableTTLMonitoring: config.enableTTLMonitoring !== false,
      enableExpirationAlerts: config.enableExpirationAlerts !== false,

      // Data lifecycle policies
      retentionPolicies: config.retentionPolicies || {},
      complianceMode: config.complianceMode || false,
      enableDataArchiving: config.enableDataArchiving || false,

      // Performance optimization
      enableBackgroundExpiration: config.enableBackgroundExpiration !== false,
      expirationBatchSize: config.expirationBatchSize || 1000,
      enableExpirationMetrics: config.enableExpirationMetrics !== false
    };

    // TTL collection management
    this.ttlCollections = new Map();
    this.retentionPolicies = new Map();
    this.expirationMetrics = new Map();

    this.initializeTTLManager();
  }

  async initializeTTLManager() {
    console.log('Initializing MongoDB TTL Collection Manager...');

    try {
      // Setup TTL collections for different data types
      await this.setupSessionTTLCollection();
      await this.setupAuditLogTTLCollection();
      await this.setupCacheTTLCollection();
      await this.setupTemporaryDataTTLCollection();
      await this.setupEventTTLCollection();

      // Initialize monitoring and metrics
      if (this.config.enableTTLMonitoring) {
        await this.initializeTTLMonitoring();
      }

      // Setup data lifecycle policies
      await this.configureDataLifecyclePolicies();

      console.log('MongoDB TTL Collection Manager initialized successfully');

    } catch (error) {
      console.error('Error initializing TTL manager:', error);
      throw error;
    }
  }

  async setupSessionTTLCollection() {
    console.log('Setting up session TTL collection...');

    try {
      const sessionCollection = this.db.collection('user_sessions');

      // Create TTL index on the expiresAt field (expires at each document's own expiresAt time; sessions below set this to 24 hours)
      await sessionCollection.createIndex(
        { expiresAt: 1 }, 
        { 
          expireAfterSeconds: 0,  // Expire based on document date field value
          background: true,
          name: 'session_ttl_index'
        }
      );

      // Additional indexes for performance
      await sessionCollection.createIndexes([
        { key: { userId: 1, expiresAt: 1 }, background: true },
        { key: { sessionToken: 1 }, unique: true, background: true },
        { key: { lastActivity: -1 }, background: true },
        { key: { ipAddress: 1, createdAt: -1 }, background: true }
      ]);

      // Store TTL configuration
      this.ttlCollections.set('user_sessions', {
        collection: sessionCollection,
        ttlField: 'expiresAt',
        ttlSeconds: 0, // Document-controlled expiration
        retentionPolicy: 'session_management',
        complianceLevel: 'standard'
      });

      console.log('Session TTL collection configured with automatic expiration');

    } catch (error) {
      console.error('Error setting up session TTL collection:', error);
      throw error;
    }
  }

  async setupAuditLogTTLCollection() {
    console.log('Setting up audit log TTL collection with compliance requirements...');

    try {
      const auditCollection = this.db.collection('audit_logs');

      // Create TTL index for audit logs; expiry time is controlled per document.
      // Note: partialFilterExpression does not support $ne/$nin, so documents under
      // compliance hold or special retention categories are protected by simply not
      // setting a retentionExpiry date (missing or non-date values are ignored by
      // the TTL monitor).
      await auditCollection.createIndex(
        { retentionExpiry: 1 },
        {
          expireAfterSeconds: 0, // Document-controlled expiration
          background: true,
          name: 'audit_retention_ttl_index'
        }
      );

      // Performance indexes for audit queries
      await auditCollection.createIndexes([
        { key: { eventTimestamp: -1 }, background: true },
        { key: { userId: 1, eventTimestamp: -1 }, background: true },
        { key: { eventType: 1, eventTimestamp: -1 }, background: true },
        { key: { retentionCategory: 1, retentionExpiry: 1 }, background: true },
        { key: { complianceHold: 1 }, sparse: true, background: true }
      ]);

      this.ttlCollections.set('audit_logs', {
        collection: auditCollection,
        ttlField: 'retentionExpiry',
        ttlSeconds: 0,
        retentionPolicy: 'audit_compliance',
        complianceLevel: 'high',
        specialHandling: ['critical', 'legal_hold']
      });

      console.log('Audit log TTL collection configured with compliance controls');

    } catch (error) {
      console.error('Error setting up audit log TTL collection:', error);
      throw error;
    }
  }

  async setupCacheTTLCollection() {
    console.log('Setting up cache TTL collection for automatic cleanup...');

    try {
      const cacheCollection = this.db.collection('application_cache');

      // Create TTL index for cache expiration (immediate expiration when expired)
      await cacheCollection.createIndex(
        { expiresAt: 1 },
        {
          expireAfterSeconds: 60, // 1 minute grace period for cache cleanup
          background: true,
          name: 'cache_ttl_index'
        }
      );

      // Performance indexes for cache operations
      await cacheCollection.createIndexes([
        { key: { cacheKey: 1 }, unique: true, background: true },
        { key: { cacheNamespace: 1, cacheKey: 1 }, background: true },
        { key: { lastAccessed: -1 }, background: true },
        { key: { cachePriority: 1, expiresAt: 1 }, background: true }
      ]);

      this.ttlCollections.set('application_cache', {
        collection: cacheCollection,
        ttlField: 'expiresAt',
        ttlSeconds: 60, // Short grace period
        retentionPolicy: 'cache_management',
        complianceLevel: 'low'
      });

      console.log('Cache TTL collection configured for optimal performance');

    } catch (error) {
      console.error('Error setting up cache TTL collection:', error);
      throw error;
    }
  }

  async setupTemporaryDataTTLCollection() {
    console.log('Setting up temporary data TTL collection...');

    try {
      const tempCollection = this.db.collection('temporary_data');

      // Create TTL index for temporary data (1 hour default)
      await tempCollection.createIndex(
        { createdAt: 1 },
        {
          expireAfterSeconds: 3600, // 1 hour
          background: true,
          name: 'temp_data_ttl_index'
        }
      );

      // Additional indexes for temporary data queries
      await tempCollection.createIndexes([
        { key: { dataType: 1, createdAt: -1 }, background: true },
        { key: { userId: 1, dataType: 1 }, background: true },
        { key: { sessionId: 1 }, background: true, sparse: true }
      ]);

      this.ttlCollections.set('temporary_data', {
        collection: tempCollection,
        ttlField: 'createdAt',
        ttlSeconds: 3600,
        retentionPolicy: 'temporary_storage',
        complianceLevel: 'low'
      });

      console.log('Temporary data TTL collection configured');

    } catch (error) {
      console.error('Error setting up temporary data TTL collection:', error);
      throw error;
    }
  }

  async setupEventTTLCollection() {
    console.log('Setting up event TTL collection with tiered retention...');

    try {
      const eventCollection = this.db.collection('application_events');

      // TTL indexes must be single-field; compound indexes ignore expireAfterSeconds.
      // Tiering is handled by writing a tier-specific expiresAt value on each document.
      await eventCollection.createIndex(
        { expiresAt: 1 },
        {
          expireAfterSeconds: 0, // Document-controlled
          background: true,
          name: 'event_ttl_index'
        }
      );

      // Performance indexes for event queries
      await eventCollection.createIndexes([
        { key: { eventTimestamp: -1 }, background: true },
        { key: { eventType: 1, eventTimestamp: -1 }, background: true },
        { key: { userId: 1, eventTimestamp: -1 }, background: true },
        { key: { retentionTier: 1, eventTimestamp: -1 }, background: true }
      ]);

      this.ttlCollections.set('application_events', {
        collection: eventCollection,
        ttlField: 'expiresAt',
        ttlSeconds: 0,
        retentionPolicy: 'tiered_retention',
        complianceLevel: 'medium',
        tiers: {
          'hot': 86400 * 7,    // 7 days
          'warm': 86400 * 30,  // 30 days  
          'cold': 86400 * 90   // 90 days
        }
      });

      console.log('Event TTL collection configured with tiered retention');

    } catch (error) {
      console.error('Error setting up event TTL collection:', error);
      throw error;
    }
  }
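
  // The two methods below are called from initializeTTLManager() but were not
  // shown above; these are minimal assumed placeholders so the class is
  // self-contained. Real monitoring and policy logic is application-specific.

  async initializeTTLMonitoring() {
    // Placeholder: a production implementation would periodically sample
    // getTTLStatus() / getExpirationMetrics() and raise expiration alerts.
    console.log(`TTL monitoring enabled for ${this.ttlCollections.size} collections`);
  }

  async configureDataLifecyclePolicies() {
    // Register retention policies supplied via config for later lookup.
    for (const [name, policy] of Object.entries(this.config.retentionPolicies)) {
      this.retentionPolicies.set(name, policy);
    }
  }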

  async createSessionWithTTL(sessionData) {
    console.log('Creating user session with automatic TTL expiration...');

    try {
      const sessionCollection = this.db.collection('user_sessions');
      const expirationTime = new Date(Date.now() + (24 * 60 * 60 * 1000)); // 24 hours

      const session = {
        _id: new ObjectId(),
        sessionToken: sessionData.sessionToken,
        userId: sessionData.userId,
        createdAt: new Date(),
        expiresAt: expirationTime, // TTL expiration field
        lastActivity: new Date(),

        // Session metadata
        userAgent: sessionData.userAgent,
        ipAddress: sessionData.ipAddress,
        loginMethod: sessionData.loginMethod || 'password',
        sessionData: sessionData.additionalData || {},

        // Security tracking
        isActive: true,
        invalidAttempts: 0,
        securityFlags: [],

        // TTL metadata
        ttlManaged: true,
        retentionPolicy: 'session_management'
      };

      const result = await sessionCollection.insertOne(session);

      // Update session metrics
      await this.updateTTLMetrics('user_sessions', 'created', session);

      console.log(`Session created with TTL expiration: ${result.insertedId}`);

      return {
        sessionId: result.insertedId,
        expiresAt: expirationTime,
        ttlEnabled: true
      };

    } catch (error) {
      console.error('Error creating session with TTL:', error);
      throw error;
    }
  }

  async createAuditLogWithRetention(auditData) {
    console.log('Creating audit log with compliance-driven retention...');

    try {
      const auditCollection = this.db.collection('audit_logs');

      // Calculate retention expiry based on data classification.
      // Documents under compliance hold (or with a 0-day / 'permanent' retention
      // period) get no expiry date, so the TTL monitor never removes them.
      const retentionDays = this.calculateRetentionPeriod(auditData.retentionCategory);
      const retentionExpiry = (!auditData.complianceHold && retentionDays > 0)
        ? new Date(Date.now() + (retentionDays * 24 * 60 * 60 * 1000))
        : null;

      const auditLog = {
        _id: new ObjectId(),
        eventTimestamp: new Date(),
        eventType: auditData.eventType,
        userId: auditData.userId,

        // Event details
        resourceType: auditData.resourceType,
        resourceId: auditData.resourceId,
        actionPerformed: auditData.action,
        eventData: auditData.eventData || {},

        // Request context
        ipAddress: auditData.ipAddress,
        userAgent: auditData.userAgent,
        requestId: auditData.requestId,
        sessionId: auditData.sessionId,

        // Compliance and retention
        retentionCategory: auditData.retentionCategory || 'standard',
        retentionExpiry: retentionExpiry, // TTL expiration field
        complianceFlags: auditData.complianceFlags || [],
        complianceHold: auditData.complianceHold || false,

        // TTL metadata
        ttlManaged: !auditData.complianceHold,
        retentionDays: retentionDays,
        dataClassification: auditData.dataClassification || 'internal'
      };

      const result = await auditCollection.insertOne(auditLog);

      // Update audit metrics
      await this.updateTTLMetrics('audit_logs', 'created', auditLog);

      console.log(`Audit log created with ${retentionDays}-day retention: ${result.insertedId}`);

      return {
        auditId: result.insertedId,
        retentionExpiry: retentionExpiry,
        retentionDays: retentionDays,
        ttlEnabled: !auditData.complianceHold
      };

    } catch (error) {
      console.error('Error creating audit log with retention:', error);
      throw error;
    }
  }

  async createCacheEntryWithTTL(cacheData) {
    console.log('Creating cache entry with automatic expiration...');

    try {
      const cacheCollection = this.db.collection('application_cache');

      // Calculate cache expiration based on cache type and priority
      const ttlSeconds = this.calculateCacheTTL(cacheData.cacheType, cacheData.priority);
      const expirationTime = new Date(Date.now() + (ttlSeconds * 1000));

      // No explicit _id: replaceOne with upsert must not attempt to change the
      // immutable _id of an existing cache entry with the same cacheKey.
      const cacheEntry = {
        cacheKey: cacheData.key,
        cacheNamespace: cacheData.namespace || 'default',
        cacheValue: cacheData.value,
        createdAt: new Date(),
        expiresAt: expirationTime, // TTL expiration field
        lastAccessed: new Date(),

        // Cache metadata
        cacheType: cacheData.cacheType || 'general',
        cacheSizeBytes: JSON.stringify(cacheData.value).length,
        accessCount: 1,
        cacheTags: cacheData.tags || [],
        cachePriority: cacheData.priority || 5,

        // TTL configuration
        ttlSeconds: ttlSeconds,
        ttlManaged: true
      };

      // Use upsert to handle cache key uniqueness
      const result = await cacheCollection.replaceOne(
        { cacheKey: cacheData.key },
        cacheEntry,
        { upsert: true }
      );

      // Update cache metrics
      await this.updateTTLMetrics('application_cache', 'created', cacheEntry);

      console.log(`Cache entry created with ${ttlSeconds}s TTL: ${cacheData.key}`);

      return {
        cacheKey: cacheData.key,
        expiresAt: expirationTime,
        ttlSeconds: ttlSeconds,
        upserted: result.upsertedCount > 0
      };

    } catch (error) {
      console.error('Error creating cache entry with TTL:', error);
      throw error;
    }
  }

  async createEventWithTieredRetention(eventData) {
    console.log('Creating event with tiered retention policy...');

    try {
      const eventCollection = this.db.collection('application_events');

      // Determine retention tier based on event importance
      const retentionTier = this.determineEventRetentionTier(eventData);
      const ttlConfig = this.ttlCollections.get('application_events').tiers;
      const ttlSeconds = ttlConfig[retentionTier];
      const expirationTime = new Date(Date.now() + (ttlSeconds * 1000));

      const event = {
        _id: new ObjectId(),
        eventTimestamp: new Date(),
        eventType: eventData.type,
        userId: eventData.userId,

        // Event payload
        eventData: eventData.data || {},
        eventSource: eventData.source || 'application',
        eventSeverity: eventData.severity || 'info',

        // Context information
        sessionId: eventData.sessionId,
        requestId: eventData.requestId,
        correlationId: eventData.correlationId,

        // Tiered retention
        retentionTier: retentionTier,
        expiresAt: expirationTime, // TTL expiration field
        retentionDays: Math.floor(ttlSeconds / 86400),

        // Event metadata
        eventVersion: eventData.version || '1.0',
        processingRequirements: eventData.processing || [],

        // TTL management
        ttlManaged: true,
        ttlTier: retentionTier
      };

      const result = await eventCollection.insertOne(event);

      // Update event metrics
      await this.updateTTLMetrics('application_events', 'created', event);

      console.log(`Event created with ${retentionTier} tier retention: ${result.insertedId}`);

      return {
        eventId: result.insertedId,
        retentionTier: retentionTier,
        expiresAt: expirationTime,
        retentionDays: Math.floor(ttlSeconds / 86400)
      };

    } catch (error) {
      console.error('Error creating event with tiered retention:', error);
      throw error;
    }
  }

  async updateTTLConfiguration(collectionName, newTTLSeconds) {
    console.log(`Updating TTL configuration for collection: ${collectionName}`);

    try {
      const collection = this.db.collection(collectionName);
      const ttlConfig = this.ttlCollections.get(collectionName);

      if (!ttlConfig) {
        throw new Error(`TTL configuration not found for collection: ${collectionName}`);
      }

      // Modify expireAfterSeconds on the existing TTL index in place with collMod.
      // (The TTL indexes above were created with custom names, so dropping
      // '<field>_1' would fail, and a drop/recreate forces a full index rebuild.)
      await this.db.command({
        collMod: collectionName,
        index: {
          keyPattern: { [ttlConfig.ttlField]: 1 },
          expireAfterSeconds: newTTLSeconds
        }
      });

      // Update configuration
      ttlConfig.ttlSeconds = newTTLSeconds;
      this.ttlCollections.set(collectionName, ttlConfig);

      console.log(`TTL configuration updated: ${collectionName} now expires after ${newTTLSeconds} seconds`);

      return {
        collection: collectionName,
        ttlSeconds: newTTLSeconds,
        updated: true
      };

    } catch (error) {
      console.error(`Error updating TTL configuration for ${collectionName}:`, error);
      throw error;
    }
  }

  // Utility methods for TTL management

  calculateRetentionPeriod(retentionCategory) {
    const retentionPolicies = {
      'session_management': 1,      // 1 day
      'standard': 90,               // 90 days
      'security': 365,              // 1 year
      'financial': 2555,            // 7 years
      'legal': 3650,                // 10 years
      'critical': 7300,             // 20 years
      'permanent': 0                // No expiration
    };

    return retentionPolicies[retentionCategory] || 90;
  }

  calculateCacheTTL(cacheType, priority) {
    const baseTTL = {
      'session': 1800,         // 30 minutes
      'user_data': 3600,       // 1 hour  
      'api_response': 300,     // 5 minutes
      'computed': 7200,        // 2 hours
      'static': 86400          // 24 hours
    };

    const base = baseTTL[cacheType] || 3600;

    // Adjust TTL based on priority (1 = highest, 10 = lowest)
    const priorityMultiplier = Math.max(0.5, Math.min(2.0, (11 - priority) / 5));

    return Math.floor(base * priorityMultiplier);
  }

  determineEventRetentionTier(eventData) {
    const eventType = eventData.type;
    const severity = eventData.severity || 'info';
    const importance = eventData.importance || 'standard';

    // Critical events get longest retention
    if (severity === 'critical' || importance === 'high') {
      return 'cold'; // 90 days
    }

    // Security and audit events get medium retention  
    if (eventType.includes('security') || eventType.includes('audit')) {
      return 'warm'; // 30 days
    }

    // Regular application events get short retention
    return 'hot'; // 7 days
  }

  async updateTTLMetrics(collectionName, operation, document) {
    if (!this.config.enableExpirationMetrics) return;

    const metrics = this.expirationMetrics.get(collectionName) || {
      created: 0,
      expired: 0,
      totalSize: 0,
      lastUpdated: new Date()
    };

    if (operation === 'created') {
      metrics.created++;
      metrics.totalSize += JSON.stringify(document).length;
    } else if (operation === 'expired') {
      metrics.expired++;
    }

    metrics.lastUpdated = new Date();
    this.expirationMetrics.set(collectionName, metrics);
  }

  async getTTLStatus() {
    console.log('Retrieving TTL status for all managed collections...');

    const status = {
      collections: {},
      summary: {
        totalCollections: this.ttlCollections.size,
        totalDocuments: 0,
        upcomingExpirations: 0,
        storageOptimization: 0
      }
    };

    for (const [collectionName, config] of this.ttlCollections) {
      try {
        const collection = config.collection;
        // collStats via a database command (collection.stats() is deprecated/removed in newer Node drivers)
        const stats = await this.db.command({ collStats: collectionName });

        // Count documents expiring soon (next 24 hours)
        const upcoming = await collection.countDocuments({
          [config.ttlField]: {
            $lte: new Date(Date.now() + 86400000) // 24 hours
          }
        });

        status.collections[collectionName] = {
          ttlField: config.ttlField,
          ttlSeconds: config.ttlSeconds,
          retentionPolicy: config.retentionPolicy,
          documentCount: stats.count,
          storageSize: stats.storageSize,
          upcomingExpirations: upcoming,
          lastChecked: new Date()
        };

        status.summary.totalDocuments += stats.count;
        status.summary.upcomingExpirations += upcoming;
        status.summary.storageOptimization += stats.storageSize;

      } catch (error) {
        console.error(`Error getting TTL status for ${collectionName}:`, error);
        status.collections[collectionName] = {
          error: error.message,
          lastChecked: new Date()
        };
      }
    }

    return status;
  }

  async getExpirationMetrics() {
    console.log('Retrieving comprehensive expiration metrics...');

    const metrics = {
      timestamp: new Date(),
      collections: {},
      summary: {
        totalCreated: 0,
        totalExpired: 0,
        storageReclaimed: 0,
        expirationEfficiency: 0
      }
    };

    for (const [collectionName, collectionMetrics] of this.expirationMetrics) {
      metrics.collections[collectionName] = {
        ...collectionMetrics,
        expirationRate: collectionMetrics.expired / Math.max(collectionMetrics.created, 1)
      };

      metrics.summary.totalCreated += collectionMetrics.created;
      metrics.summary.totalExpired += collectionMetrics.expired;
    }

    metrics.summary.expirationEfficiency = 
      metrics.summary.totalExpired / Math.max(metrics.summary.totalCreated, 1);

    return metrics;
  }

  async cleanup() {
    console.log('Cleaning up TTL Manager resources...');

    // Clear monitoring intervals and cleanup resources
    this.ttlCollections.clear();
    this.retentionPolicies.clear();
    this.expirationMetrics.clear();

    console.log('TTL Manager cleanup completed');
  }
}

// Example usage for enterprise data lifecycle management
async function demonstrateEnterpriseDataLifecycle() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();

  const ttlManager = new MongoDBTTLManager(client, {
    database: 'enterprise_lifecycle',
    enableTTLMonitoring: true,
    enableExpirationMetrics: true,
    complianceMode: true
  });

  try {
    // Create session with automatic 24-hour expiration
    const session = await ttlManager.createSessionWithTTL({
      sessionToken: 'session_' + Date.now(),
      userId: 'user_12345',
      userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
      ipAddress: '192.168.1.100',
      loginMethod: 'password'
    });

    // Create audit log with compliance-driven retention
    const auditLog = await ttlManager.createAuditLogWithRetention({
      eventType: 'user_login',
      userId: 'user_12345',
      resourceType: 'authentication',
      action: 'login_success',
      retentionCategory: 'security', // 365 days retention
      ipAddress: '192.168.1.100',
      eventData: {
        loginMethod: 'password',
        mfaUsed: true,
        riskScore: 'low'
      }
    });

    // Create cache entry with priority-based TTL
    const cacheEntry = await ttlManager.createCacheEntryWithTTL({
      key: 'user_preferences_12345',
      namespace: 'user_data',
      value: {
        theme: 'dark',
        language: 'en',
        timezone: 'UTC',
        notifications: true
      },
      cacheType: 'user_data',
      priority: 3, // High priority = longer TTL
      tags: ['preferences', 'user_settings']
    });

    // Create event with tiered retention
    const event = await ttlManager.createEventWithTieredRetention({
      type: 'page_view',
      userId: 'user_12345',
      severity: 'info',
      data: {
        page: '/dashboard',
        duration: 1500,
        interactions: 5
      },
      source: 'web_app',
      sessionId: session.sessionId.toString()
    });

    // Get TTL status and metrics
    const ttlStatus = await ttlManager.getTTLStatus();
    const expirationMetrics = await ttlManager.getExpirationMetrics();

    console.log('Enterprise Data Lifecycle Management Results:');
    console.log('Session:', session);
    console.log('Audit Log:', auditLog);
    console.log('Cache Entry:', cacheEntry);
    console.log('Event:', event);
    console.log('TTL Status:', JSON.stringify(ttlStatus, null, 2));
    console.log('Expiration Metrics:', JSON.stringify(expirationMetrics, null, 2));

    return {
      session,
      auditLog,
      cacheEntry,
      event,
      ttlStatus,
      expirationMetrics
    };

  } catch (error) {
    console.error('Error demonstrating enterprise data lifecycle:', error);
    throw error;
  } finally {
    await ttlManager.cleanup();
    await client.close();
  }
}

// Benefits of MongoDB TTL Collections:
// - Native automatic data expiration eliminates complex manual cleanup procedures
// - Document-level TTL control with flexible expiration policies based on business requirements
// - Minimal performance impact on application operations thanks to background expiration processing
// - Compliance-friendly retention management with audit trails and legal hold capabilities  
// - Intelligent storage optimization with automatic document removal and space reclamation
// - Scalable data lifecycle management that handles high-volume data expiration scenarios
// - Enterprise-grade monitoring and metrics for data retention and compliance reporting
// - Seamless integration with MongoDB's document model and indexing capabilities

module.exports = {
  MongoDBTTLManager,
  demonstrateEnterpriseDataLifecycle
};

SQL-Style TTL Management with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB TTL collections and data lifecycle management:

-- QueryLeaf TTL collections with SQL-familiar data lifecycle management syntax

-- Configure TTL collections and expiration policies
SET ttl_monitoring_enabled = true;
SET ttl_expiration_alerts = true;
SET default_ttl_seconds = 86400; -- 24 hours
SET enable_compliance_mode = true;
SET enable_data_archiving = true;

-- Create TTL-managed collections with expiration policies
WITH ttl_collection_configuration AS (
  SELECT 
    -- Collection TTL configurations
    'user_sessions' as collection_name,
    'expiresAt' as ttl_field,
    0 as ttl_seconds, -- Document-controlled expiration
    'session_management' as retention_policy,
    24 * 3600 as default_session_ttl_seconds,

    -- Index configuration
    JSON_BUILD_OBJECT(
      'ttl_index', JSON_BUILD_OBJECT(
        'field', 'expiresAt',
        'expireAfterSeconds', 0,
        'background', true
      ),
      'performance_indexes', ARRAY[
        JSON_BUILD_OBJECT('fields', JSON_BUILD_OBJECT('userId', 1, 'expiresAt', 1)),
        JSON_BUILD_OBJECT('fields', JSON_BUILD_OBJECT('sessionToken', 1), 'unique', true),
        JSON_BUILD_OBJECT('fields', JSON_BUILD_OBJECT('lastActivity', -1))
      ]
    ) as index_configuration

  UNION ALL

  SELECT 
    'audit_logs' as collection_name,
    'retentionExpiry' as ttl_field,
    0 as ttl_seconds, -- Document-controlled with compliance
    'audit_compliance' as retention_policy,
    90 * 24 * 3600 as default_audit_ttl_seconds,

    JSON_BUILD_OBJECT(
      'ttl_index', JSON_BUILD_OBJECT(
        'field', 'retentionExpiry',
        'expireAfterSeconds', 0,
        'background', true,
        'partial_filter', JSON_BUILD_OBJECT(
          'complianceHold', JSON_BUILD_OBJECT('$ne', true),
          'retentionCategory', JSON_BUILD_OBJECT('$nin', ARRAY['critical', 'legal_hold'])
        )
      ),
      'performance_indexes', ARRAY[
        JSON_BUILD_OBJECT('fields', JSON_BUILD_OBJECT('eventTimestamp', -1)),
        JSON_BUILD_OBJECT('fields', JSON_BUILD_OBJECT('userId', 1, 'eventTimestamp', -1)),
        JSON_BUILD_OBJECT('fields', JSON_BUILD_OBJECT('retentionCategory', 1, 'retentionExpiry', 1))
      ]
    ) as index_configuration

  UNION ALL

  SELECT 
    'application_cache' as collection_name,
    'expiresAt' as ttl_field,
    60 as ttl_seconds, -- 1 minute grace period
    'cache_management' as retention_policy,
    3600 as default_cache_ttl_seconds, -- 1 hour

    JSON_BUILD_OBJECT(
      'ttl_index', JSON_BUILD_OBJECT(
        'field', 'expiresAt',
        'expireAfterSeconds', 60,
        'background', true
      ),
      'performance_indexes', ARRAY[
        JSON_BUILD_OBJECT('fields', JSON_BUILD_OBJECT('cacheKey', 1), 'unique', true),
        JSON_BUILD_OBJECT('fields', JSON_BUILD_OBJECT('cacheNamespace', 1, 'cacheKey', 1)),
        JSON_BUILD_OBJECT('fields', JSON_BUILD_OBJECT('cachePriority', 1, 'expiresAt', 1))
      ]
    ) as index_configuration
),

-- Data retention policy definitions
retention_policy_definitions AS (
  SELECT 
    policy_name,
    retention_days,
    compliance_level,
    auto_expiration,
    archive_before_expiration,
    legal_hold_exempt,

    -- TTL calculation
    retention_days * 24 * 3600 as retention_seconds,

    -- Policy rules
    CASE policy_name
      WHEN 'session_management' THEN 'Expire user sessions after inactivity period'
      WHEN 'audit_compliance' THEN 'Retain audit logs per compliance requirements'
      WHEN 'cache_management' THEN 'Optimize cache storage with automatic cleanup'
      WHEN 'temporary_storage' THEN 'Remove temporary data after processing'
      WHEN 'event_analytics' THEN 'Tiered retention for application events'
    END as policy_description,

    -- Compliance requirements
    CASE compliance_level
      WHEN 'high' THEN ARRAY['audit_trail', 'legal_hold_support', 'data_classification']
      WHEN 'medium' THEN ARRAY['audit_trail', 'data_classification'] 
      ELSE ARRAY['basic_logging']
    END as compliance_requirements

  FROM (VALUES
    ('session_management', 1, 'medium', true, false, true),
    ('audit_compliance', 90, 'high', true, true, false),
    ('security_logs', 365, 'high', true, true, false),
    ('cache_management', 0, 'low', true, false, true), -- Immediate expiration
    ('temporary_storage', 1, 'low', true, false, true),
    ('event_analytics', 30, 'medium', true, false, true),
    ('financial_records', 2555, 'critical', false, true, false), -- 7 years
    ('legal_documents', 3650, 'critical', false, true, false)    -- 10 years
  ) AS policies(policy_name, retention_days, compliance_level, auto_expiration, archive_before_expiration, legal_hold_exempt)
),

-- Create session data with automatic TTL expiration
session_ttl_operations AS (
  INSERT INTO user_sessions_ttl
  SELECT 
    GENERATE_UUID() as session_id,
    'user_' || generate_series(1, 1000) as user_id,
    'session_token_' || EXTRACT(EPOCH FROM CURRENT_TIMESTAMP) || '_' || generate_series(1, 1000) as session_token,
    CURRENT_TIMESTAMP as created_at,
    CURRENT_TIMESTAMP + INTERVAL '24 hours' as expires_at, -- TTL expiration field
    CURRENT_TIMESTAMP as last_activity,

    -- Session metadata
    'Mozilla/5.0 (compatible; Enterprise App)' as user_agent,
    ('192.168.1.' || (1 + random() * 254)::int)::inet as ip_address,
    'password' as login_method,
    JSON_BUILD_OBJECT(
      'preferences', JSON_BUILD_OBJECT('theme', 'dark', 'language', 'en'),
      'permissions', ARRAY['read', 'write'],
      'mfa_verified', true
    ) as session_data,

    -- Security and TTL metadata
    true as is_active,
    0 as invalid_attempts,
    ARRAY[]::text[] as security_flags,
    true as ttl_managed,
    'session_management' as retention_policy
  RETURNING session_id, expires_at
),

-- Create audit logs with compliance-driven TTL
audit_log_ttl_operations AS (
  INSERT INTO audit_logs_ttl
  SELECT 
    GENERATE_UUID() as log_id,
    CURRENT_TIMESTAMP - (random() * INTERVAL '30 days') as event_timestamp,

    -- Event details
    (ARRAY['user_login', 'data_access', 'permission_change', 'security_event', 'system_action'])
      [1 + floor(random() * 5)] as event_type,
    'user_' || (1 + floor(random() * 100)) as user_id,
    'resource_' || (1 + floor(random() * 500)) as resource_id,
    (ARRAY['create', 'read', 'update', 'delete', 'execute'])
      [1 + floor(random() * 5)] as action_performed,

    -- Compliance and retention
    (ARRAY['standard', 'security', 'financial', 'legal'])
      [1 + floor(random() * 4)] as retention_category,

    -- Calculate retention expiry based on category
    CASE retention_category
      WHEN 'standard' THEN CURRENT_TIMESTAMP + INTERVAL '90 days'
      WHEN 'security' THEN CURRENT_TIMESTAMP + INTERVAL '365 days'  
      WHEN 'financial' THEN CURRENT_TIMESTAMP + INTERVAL '2555 days' -- 7 years
      WHEN 'legal' THEN CURRENT_TIMESTAMP + INTERVAL '3650 days'     -- 10 years
    END as retention_expiry, -- TTL expiration field

    -- Compliance flags and controls
    CASE WHEN random() < 0.1 THEN ARRAY['sensitive_data'] ELSE ARRAY[]::text[] END as compliance_flags,
    CASE WHEN random() < 0.05 THEN true ELSE false END as compliance_hold, -- Prevents TTL expiration

    -- Event data and context
    JSON_BUILD_OBJECT(
      'user_agent', 'Mozilla/5.0 (Enterprise Browser)',
      'ip_address', '192.168.' || (1 + floor(random() * 254)) || '.' || (1 + floor(random() * 254)),
      'request_id', 'req_' || EXTRACT(EPOCH FROM CURRENT_TIMESTAMP),
      'session_duration', floor(random() * 3600),
      'data_size', floor(random() * 10000)
    ) as event_data,

    -- TTL management metadata
    CASE WHEN compliance_hold THEN false ELSE true END as ttl_managed,
    'audit_compliance' as retention_policy_applied
  RETURNING log_id, retention_category, retention_expiry, compliance_hold
),

-- Create cache entries with priority-based TTL
cache_ttl_operations AS (
  INSERT INTO application_cache_ttl
  SELECT 
    'cache_key_' || generate_series(1, 5000) as cache_key,
    (ARRAY['user_data', 'api_responses', 'computed_results', 'session_data', 'static_content'])
      [1 + floor(random() * 5)] as cache_namespace,

    -- Cache value and metadata
    JSON_BUILD_OBJECT(
      'data', 'cached_data_' || generate_series(1, 5000),
      'computed_at', CURRENT_TIMESTAMP,
      'version', '1.0'
    ) as cache_value,

    CURRENT_TIMESTAMP as created_at,

    -- Priority-based TTL calculation
    CASE cache_namespace
      WHEN 'user_data' THEN CURRENT_TIMESTAMP + INTERVAL '1 hour'
      WHEN 'api_responses' THEN CURRENT_TIMESTAMP + INTERVAL '5 minutes'
      WHEN 'computed_results' THEN CURRENT_TIMESTAMP + INTERVAL '2 hours'
      WHEN 'session_data' THEN CURRENT_TIMESTAMP + INTERVAL '30 minutes'
      WHEN 'static_content' THEN CURRENT_TIMESTAMP + INTERVAL '24 hours'
    END as expires_at, -- TTL expiration field

    CURRENT_TIMESTAMP as last_accessed,

    -- Cache optimization metadata
    (1 + floor(random() * 10)) as cache_priority, -- 1 = highest, 10 = lowest
    JSON_LENGTH(cache_value::text) as cache_size_bytes,
    1 as access_count,
    ARRAY['generated', 'optimized'] as cache_tags,
    true as ttl_managed
  RETURNING cache_key, cache_namespace, expires_at
),

-- Monitor TTL operations and expiration patterns
ttl_monitoring_metrics AS (
  SELECT 
    collection_name,
    retention_policy,

    -- Document lifecycle metrics
    COUNT(*) as total_documents,
    COUNT(*) FILTER (WHERE ttl_managed = true) as ttl_managed_documents,
    COUNT(*) FILTER (WHERE expires_at <= CURRENT_TIMESTAMP + INTERVAL '1 hour') as expiring_soon,
    COUNT(*) FILTER (WHERE expires_at <= CURRENT_TIMESTAMP + INTERVAL '24 hours') as expiring_today,

    -- TTL efficiency analysis
    AVG(EXTRACT(EPOCH FROM (expires_at - created_at))) as avg_ttl_duration_seconds,
    MIN(expires_at) as next_expiration,
    MAX(expires_at) as latest_expiration,

    -- Storage optimization metrics
    SUM(COALESCE(JSON_LENGTH(data_field::text), 0)) as total_storage_bytes,
    AVG(COALESCE(JSON_LENGTH(data_field::text), 0)) as avg_document_size_bytes,

    -- Retention policy distribution
    MODE() WITHIN GROUP (ORDER BY retention_policy) as primary_retention_policy,

    -- Compliance tracking
    COUNT(*) FILTER (WHERE compliance_hold = true) as compliance_hold_count,
    COUNT(*) FILTER (WHERE compliance_flags IS NOT NULL AND array_length(compliance_flags, 1) > 0) as compliance_flagged

  FROM (
    -- Union all TTL-managed collections
    SELECT 'user_sessions' as collection_name, retention_policy, ttl_managed, 
           created_at, expires_at, session_data as data_field, 
           NULL::text[] as compliance_flags, false as compliance_hold
    FROM session_ttl_operations

    UNION ALL

    SELECT 'audit_logs' as collection_name, retention_policy_applied as retention_policy, ttl_managed,
           event_timestamp as created_at, retention_expiry as expires_at, event_data as data_field,
           compliance_flags, compliance_hold
    FROM audit_log_ttl_operations

    UNION ALL

    SELECT 'application_cache' as collection_name, 'cache_management' as retention_policy, ttl_managed,
           created_at, expires_at, cache_value as data_field,
           NULL::text[] as compliance_flags, false as compliance_hold
    FROM cache_ttl_operations
  ) combined_ttl_data
  GROUP BY collection_name, retention_policy
),

-- TTL performance and optimization analysis
ttl_optimization_analysis AS (
  SELECT 
    tmm.collection_name,
    tmm.retention_policy,
    tmm.total_documents,
    tmm.ttl_managed_documents,

    -- Expiration timeline
    tmm.expiring_soon,
    tmm.expiring_today,
    tmm.next_expiration,
    tmm.latest_expiration,

    -- Storage and performance metrics
    ROUND(tmm.total_storage_bytes / (1024 * 1024)::decimal, 2) as total_storage_mb,
    ROUND(tmm.avg_document_size_bytes / 1024::decimal, 2) as avg_document_size_kb,
    ROUND(tmm.avg_ttl_duration_seconds / 3600::decimal, 2) as avg_ttl_duration_hours,

    -- TTL efficiency assessment
    CASE 
      WHEN tmm.ttl_managed_documents::decimal / tmm.total_documents > 0.9 THEN 'highly_optimized'
      WHEN tmm.ttl_managed_documents::decimal / tmm.total_documents > 0.7 THEN 'well_optimized'
      WHEN tmm.ttl_managed_documents::decimal / tmm.total_documents > 0.5 THEN 'moderately_optimized'
      ELSE 'needs_optimization'
    END as ttl_optimization_level,

    -- Storage optimization potential
    CASE 
      WHEN tmm.expiring_today > tmm.total_documents * 0.1 THEN 'significant_storage_reclaim_expected'
      WHEN tmm.expiring_today > tmm.total_documents * 0.05 THEN 'moderate_storage_reclaim_expected'  
      WHEN tmm.expiring_today > 0 THEN 'minimal_storage_reclaim_expected'
      ELSE 'no_immediate_storage_reclaim'
    END as storage_optimization_forecast,

    -- Compliance assessment
    CASE 
      WHEN tmm.compliance_hold_count > 0 THEN 'compliance_holds_active'
      WHEN tmm.compliance_flagged > tmm.total_documents * 0.1 THEN 'high_compliance_requirements'
      WHEN tmm.compliance_flagged > 0 THEN 'moderate_compliance_requirements'
      ELSE 'standard_compliance_requirements'
    END as compliance_status,

    -- Operational recommendations
    CASE 
      WHEN tmm.avg_ttl_duration_seconds < 3600 THEN 'Consider longer TTL for performance'
      WHEN tmm.avg_ttl_duration_seconds > 86400 * 30 THEN 'Review long retention periods'
      WHEN tmm.expiring_soon > 1000 THEN 'High expiration volume - monitor performance'
      ELSE 'TTL configuration appropriate'
    END as operational_recommendation

  FROM ttl_monitoring_metrics tmm
),

-- Generate comprehensive TTL management dashboard
ttl_dashboard_comprehensive AS (
  SELECT 
    toa.collection_name,
    toa.retention_policy,

    -- Current status
    toa.total_documents,
    toa.ttl_managed_documents,
    ROUND((toa.ttl_managed_documents::decimal / toa.total_documents::decimal) * 100, 1) as ttl_coverage_percent,

    -- Expiration schedule
    toa.expiring_soon,
    toa.expiring_today,
    TO_CHAR(toa.next_expiration, 'YYYY-MM-DD HH24:MI:SS') as next_expiration_time,

    -- Storage metrics
    toa.total_storage_mb,
    toa.avg_document_size_kb,
    toa.avg_ttl_duration_hours,

    -- Optimization status
    toa.ttl_optimization_level,
    toa.storage_optimization_forecast,
    toa.compliance_status,
    toa.operational_recommendation,

    -- Retention policy details
    rpd.retention_days,
    rpd.compliance_level,
    rpd.auto_expiration,
    rpd.legal_hold_exempt,

    -- Performance projections
    CASE 
      WHEN toa.expiring_today > 0 THEN 
        ROUND((toa.expiring_today * toa.avg_document_size_kb) / 1024, 2)
      ELSE 0
    END as projected_storage_reclaim_mb,

    -- Action priorities
    CASE 
      WHEN toa.ttl_optimization_level = 'needs_optimization' THEN 'high'
      WHEN toa.compliance_status = 'compliance_holds_active' THEN 'high'
      WHEN toa.expiring_soon > 1000 THEN 'medium'
      WHEN toa.storage_optimization_forecast LIKE '%significant%' THEN 'medium'
      ELSE 'low'
    END as action_priority,

    -- Specific action items
    ARRAY[
      CASE WHEN toa.ttl_optimization_level = 'needs_optimization' 
           THEN 'Implement TTL for remaining ' || (toa.total_documents - toa.ttl_managed_documents) || ' documents' END,
      CASE WHEN toa.compliance_status = 'compliance_holds_active'
           THEN 'Review active compliance holds and update retention policies' END,
      CASE WHEN toa.expiring_soon > 1000
           THEN 'Monitor system performance during high-volume expiration period' END,
      CASE WHEN toa.operational_recommendation != 'TTL configuration appropriate'
           THEN toa.operational_recommendation END
    ] as action_items

  FROM ttl_optimization_analysis toa
  LEFT JOIN retention_policy_definitions rpd ON toa.retention_policy = rpd.policy_name
)

-- Final comprehensive TTL management report
SELECT 
  tdc.collection_name,
  tdc.retention_policy,
  tdc.compliance_level,

  -- Current state
  tdc.total_documents,
  tdc.ttl_coverage_percent || '%' as ttl_coverage,
  tdc.total_storage_mb || ' MB' as current_storage,

  -- Expiration schedule
  tdc.expiring_soon as expiring_next_hour,
  tdc.expiring_today as expiring_next_24h,
  tdc.next_expiration_time,

  -- Optimization assessment  
  tdc.ttl_optimization_level,
  tdc.storage_optimization_forecast,
  tdc.projected_storage_reclaim_mb || ' MB' as storage_reclaim_potential,

  -- Operational guidance
  tdc.action_priority,
  tdc.operational_recommendation,
  array_to_string(
    array_remove(tdc.action_items, NULL), 
    '; '
  ) as specific_action_items,

  -- Configuration recommendations
  CASE 
    WHEN tdc.ttl_coverage_percent < 70 THEN 
      'Enable TTL for ' || (100 - tdc.ttl_coverage_percent) || '% of documents to improve storage efficiency'
    WHEN tdc.avg_ttl_duration_hours > 720 THEN  -- 30 days
      'Review extended retention periods for compliance requirements'
    WHEN tdc.projected_storage_reclaim_mb > 100 THEN
      'Significant storage optimization opportunity available'
    ELSE 'TTL configuration optimized for current workload'
  END as configuration_guidance,

  -- Compliance and governance
  tdc.compliance_status,
  CASE 
    WHEN tdc.legal_hold_exempt = false THEN 'Legal hold procedures apply'
    WHEN tdc.auto_expiration = false THEN 'Manual expiration required'
    ELSE 'Automatic expiration enabled'
  END as governance_status,

  -- Performance impact assessment
  CASE 
    WHEN tdc.expiring_soon > 5000 THEN 'Monitor database performance during expiration'
    WHEN tdc.expiring_today > 10000 THEN 'Schedule expiration during low-traffic periods'
    WHEN tdc.total_storage_mb > 1000 THEN 'Storage optimization will improve query performance'
    ELSE 'Minimal performance impact expected'
  END as performance_impact_assessment,

  -- Success metrics
  JSON_BUILD_OBJECT(
    'storage_efficiency', ROUND(tdc.projected_storage_reclaim_mb / NULLIF(tdc.total_storage_mb, 0) * 100, 1),
    'automation_coverage', tdc.ttl_coverage_percent,
    'compliance_alignment', CASE WHEN tdc.compliance_status LIKE '%high%' THEN 'high' ELSE 'standard' END,
    'operational_maturity', tdc.ttl_optimization_level
  ) as success_metrics

FROM ttl_dashboard_comprehensive tdc
ORDER BY 
  CASE tdc.action_priority
    WHEN 'high' THEN 1
    WHEN 'medium' THEN 2
    ELSE 3
  END,
  tdc.total_storage_mb DESC;

-- QueryLeaf provides comprehensive MongoDB TTL capabilities:
-- 1. Native automatic data expiration with SQL-familiar TTL configuration syntax
-- 2. Compliance-driven retention policies with legal hold and audit trail support
-- 3. Intelligent TTL optimization based on data classification and access patterns  
-- 4. Performance monitoring with storage optimization and expiration forecasting
-- 5. Enterprise governance with retention policy management and compliance reporting
-- 6. Scalable data lifecycle management that handles high-volume expiration scenarios
-- 7. Integration with MongoDB's background TTL processing and index optimization
-- 8. SQL-style TTL operations for familiar data lifecycle management workflows
-- 9. Advanced analytics for TTL performance, storage optimization, and compliance tracking
-- 10. Automated recommendations for TTL configuration and data retention optimization

Best Practices for MongoDB TTL Implementation

Enterprise Data Lifecycle Management

Essential practices for implementing TTL collections effectively:

  1. TTL Strategy Design: Plan TTL configurations based on data classification, compliance requirements, and business value (see the index sketch after this list)
  2. Performance Considerations: Monitor TTL processing impact and optimize index configurations for efficient expiration
  3. Compliance Integration: Implement legal hold capabilities and audit trails for regulated data retention
  4. Storage Optimization: Use TTL to maintain optimal storage utilization while preserving query performance
  5. Monitoring and Alerting: Establish comprehensive monitoring for TTL operations and expiration patterns
  6. Backup Coordination: Ensure backup strategies account for TTL expiration and data lifecycle requirements
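As a minimal sketch of the index side of points 1 and 3 above (the collection names application_cache, user_sessions, and audit_logs, and the field names createdAt, expiresAt, retentionExpiry, and complianceHold are assumptions carried over from the earlier examples), the following Node.js snippet creates the three common TTL index patterns: a fixed collection-level lifetime, document-controlled expiration, and a partial TTL index that keeps held documents out of the expiration path.

const { MongoClient } = require('mongodb');

async function configureTtlIndexes(uri = 'mongodb://localhost:27017') {
  const client = new MongoClient(uri);
  await client.connect();

  try {
    const db = client.db('enterprise_lifecycle');

    // Collection-level policy: every cache entry expires one hour after its
    // createdAt timestamp (TTL indexes must be single-field, on a date field)
    await db.collection('application_cache').createIndex(
      { createdAt: 1 },
      { expireAfterSeconds: 3600 }
    );

    // Document-level policy: expireAfterSeconds of 0 means each document is
    // removed once the clock passes the date stored in its expiresAt field
    await db.collection('user_sessions').createIndex(
      { expiresAt: 1 },
      { expireAfterSeconds: 0 }
    );

    // Compliance-aware policy: a partial TTL index restricts expiration to
    // documents that explicitly set complianceHold: false (partial filter
    // expressions support only a limited operator set, so $ne is not usable here)
    await db.collection('audit_logs').createIndex(
      { retentionExpiry: 1 },
      {
        expireAfterSeconds: 0,
        partialFilterExpression: { complianceHold: false }
      }
    );
  } finally {
    await client.close();
  }
}

Because the partial filter relies on an equality match, documents that omit complianceHold are never eligible for expiration, so the application should set the flag explicitly on every write.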

Production Deployment and Scalability

Optimize TTL collections for enterprise-scale requirements:

  1. Index Strategy: Pair single-field TTL indexes (TTL expiration only works on single-field indexes) with compound indexes that cover your query patterns
  2. Capacity Planning: Plan for TTL processing overhead and storage optimization benefits in capacity models
  3. High Availability: Implement TTL collections across replica sets with consistent expiration behavior
  4. Operational Excellence: Create standardized procedures for TTL configuration, monitoring, and compliance
  5. Integration Patterns: Design application integration patterns that leverage TTL for optimal data lifecycle management
  6. Performance Baselines: Establish performance baselines for TTL operations and storage optimization metrics (see the monitoring sketch after this list)
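For the monitoring and baseline items above, a simple starting point is to sample the TTL counters MongoDB exposes through serverStatus (metrics.ttl.passes and metrics.ttl.deletedDocuments). The connection URI and 60-second sampling window in this sketch are assumptions.

const { MongoClient } = require('mongodb');

// Samples the server-wide TTL counters twice and reports the delta, which is a
// simple way to baseline how many documents the TTL monitor removes per interval
async function sampleTtlActivity(uri = 'mongodb://localhost:27017', intervalMs = 60000) {
  const client = new MongoClient(uri);
  await client.connect();

  try {
    const admin = client.db('admin');

    const readTtlCounters = async () => {
      const status = await admin.command({ serverStatus: 1 });
      return {
        passes: status.metrics.ttl.passes,
        deletedDocuments: status.metrics.ttl.deletedDocuments
      };
    };

    const before = await readTtlCounters();
    await new Promise(resolve => setTimeout(resolve, intervalMs));
    const after = await readTtlCounters();

    return {
      ttlPasses: after.passes - before.passes,
      documentsExpired: after.deletedDocuments - before.deletedDocuments,
      windowMs: intervalMs
    };
  } finally {
    await client.close();
  }
}

Tracking these deltas over time gives a baseline for expected expiration volume, making it easier to spot both stalled TTL processing and unusually heavy deletion passes.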

Conclusion

MongoDB TTL collections provide comprehensive automatic data lifecycle management that eliminates manual cleanup procedures, ensures compliance-driven retention, and maintains optimal storage utilization through intelligent document expiration. The native TTL functionality integrates seamlessly with MongoDB's document model and indexing capabilities to deliver enterprise-grade data lifecycle management.

Key MongoDB TTL Collection benefits include:

  • Automatic Expiration: Native document expiration eliminates manual cleanup procedures and operational overhead
  • Flexible Policies: Document-level and collection-level TTL control with compliance-driven retention management (illustrated in the sketch below)
  • Low Performance Overhead: Expiration runs as a background process, keeping the impact on application operations minimal
  • Storage Optimization: Automatic storage reclamation and space optimization through intelligent document removal
  • Enterprise Compliance: Legal hold capabilities and audit trails for regulated data retention requirements
  • SQL Accessibility: Familiar TTL management operations through SQL-style syntax and configuration
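To make the flexible-policy point concrete, the sketch below (given a connected db handle; field names and retention periods are illustrative assumptions consistent with the earlier examples) shows how the application side differs between the two styles: a collection-level TTL index derives the lifetime from createdAt, while a document-level policy stores an explicit expiration date on each record.

async function writeWithTtlPolicies(db) {
  // Collection-level policy: the TTL index on createdAt determines the
  // lifetime, so the application only records the creation time
  await db.collection('application_cache').insertOne({
    cacheKey: 'user_preferences_12345',
    value: { theme: 'dark', language: 'en' },
    createdAt: new Date()
  });

  // Document-level policy: with expireAfterSeconds: 0 on retentionExpiry,
  // each document carries its own expiration date, so retention varies per record
  const retentionDays = { standard: 90, security: 365 };
  const category = 'security';
  await db.collection('audit_logs').insertOne({
    eventType: 'user_login',
    retentionCategory: category,
    complianceHold: false, // set explicitly when a partial TTL index filters on it
    retentionExpiry: new Date(Date.now() + retentionDays[category] * 24 * 3600 * 1000)
  });
}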

Whether you're managing session data, audit logs, cache entries, or any time-sensitive information, MongoDB TTL collections with QueryLeaf's familiar SQL interface provide the foundation for scalable, compliant, and efficient data lifecycle management.

QueryLeaf Integration: QueryLeaf automatically optimizes MongoDB TTL collections while providing SQL-familiar syntax for data lifecycle management, retention policy configuration, and expiration monitoring. Advanced TTL patterns, compliance controls, and storage optimization strategies are seamlessly accessible through familiar SQL constructs, making sophisticated data lifecycle management both powerful and approachable for SQL-oriented teams.

The combination of MongoDB's intelligent TTL capabilities with SQL-style lifecycle management makes it an ideal platform for applications requiring both automated data expiration and familiar operational patterns, ensuring your data lifecycle strategies scale efficiently while maintaining compliance and operational excellence.

MongoDB Connection Pooling and Concurrency Management: High-Performance Database Scaling and Enterprise Connection Optimization

Modern applications demand efficient database connection management to handle varying workloads, concurrent users, and peak traffic scenarios while maintaining optimal performance and resource utilization. Traditional database connection approaches often struggle with connection overhead, resource exhaustion, and poor scalability under high concurrency, leading to application bottlenecks, timeout errors, and degraded user experience.

MongoDB's connection pooling provides sophisticated connection management capabilities with intelligent pooling, automatic connection lifecycle management, and advanced concurrency control designed specifically for high-performance applications. Unlike traditional connection management that requires manual configuration and monitoring, MongoDB's connection pooling automatically optimizes connection usage while providing comprehensive monitoring and tuning capabilities for enterprise-scale deployments.

The Traditional Connection Management Challenge

Conventional database connection management faces significant scalability limitations:

-- Traditional PostgreSQL connection management - manual connection handling with poor scalability

-- Basic connection configuration (limited flexibility)
CREATE DATABASE production_app;

-- Connection pool configuration in application.properties (static configuration)
-- spring.datasource.url=jdbc:postgresql://localhost:5432/production_app
-- spring.datasource.username=app_user
-- spring.datasource.password=secure_password
-- spring.datasource.driver-class-name=org.postgresql.Driver

-- HikariCP connection pool settings (manual tuning required)
-- spring.datasource.hikari.maximum-pool-size=20
-- spring.datasource.hikari.minimum-idle=5
-- spring.datasource.hikari.connection-timeout=30000
-- spring.datasource.hikari.idle-timeout=600000
-- spring.datasource.hikari.max-lifetime=1800000
-- spring.datasource.hikari.leak-detection-threshold=60000

-- Application layer connection management with manual pooling
CREATE TABLE connection_metrics (
    metric_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    metric_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    -- Connection pool metrics
    pool_name VARCHAR(100),
    active_connections INTEGER,
    idle_connections INTEGER,
    total_connections INTEGER,
    max_pool_size INTEGER,

    -- Performance metrics
    connection_acquisition_time_ms INTEGER,
    connection_usage_time_ms INTEGER,
    query_execution_count INTEGER,
    failed_connection_attempts INTEGER,

    -- Resource utilization
    memory_usage_bytes BIGINT,
    cpu_usage_percent DECIMAL(5,2),
    connection_wait_count INTEGER,
    connection_timeout_count INTEGER,

    -- Error tracking
    connection_leak_count INTEGER,
    pool_exhaustion_count INTEGER,
    database_errors INTEGER
);

-- Manual connection monitoring with limited visibility
CREATE OR REPLACE FUNCTION monitor_connection_pool()
RETURNS TABLE(
    pool_status VARCHAR,
    active_count INTEGER,
    idle_count INTEGER,
    wait_count INTEGER,
    usage_percent DECIMAL
) AS $$
BEGIN
    -- Basic connection pool monitoring (limited capabilities)
    RETURN QUERY
    WITH pool_stats AS (
        SELECT 
            'main_pool' as pool_name,
            -- Simulated pool metrics (not real-time)
            15 as current_active,
            5 as current_idle,
            20 as pool_max_size,
            2 as current_waiting
    )
    SELECT 
        'operational'::VARCHAR as pool_status,
        ps.current_active,
        ps.current_idle,
        ps.current_waiting,
        ROUND((ps.current_active::DECIMAL / ps.pool_max_size::DECIMAL) * 100, 2) as usage_percent
    FROM pool_stats ps;
END;
$$ LANGUAGE plpgsql;

-- Inadequate connection handling in stored procedures
-- Procedure (PostgreSQL 11+) so that COMMIT inside the loop is allowed
CREATE OR REPLACE PROCEDURE process_high_volume_transactions()
AS $$
DECLARE
    batch_size INTEGER := 1000;
    processed_count INTEGER := 0;
    error_count INTEGER := 0;
    connection_failures INTEGER := 0;
    start_time TIMESTAMP := CURRENT_TIMESTAMP;

    -- Limited connection context
    transaction_cursor CURSOR FOR 
        SELECT transaction_id, amount, user_id, transaction_type
        FROM pending_transactions
        WHERE status = 'pending'
        ORDER BY created_at
        LIMIT 10000;

    transaction_record RECORD;

BEGIN
    RAISE NOTICE 'Starting high-volume transaction processing...';

    -- Manual transaction processing with connection overhead
    FOR transaction_record IN transaction_cursor LOOP
        BEGIN
            -- Each operation creates connection overhead and latency
            INSERT INTO processed_transactions (
                original_transaction_id, 
                amount, 
                user_id, 
                transaction_type,
                processed_at,
                processing_batch
            ) VALUES (
                transaction_record.transaction_id,
                transaction_record.amount,
                transaction_record.user_id,
                transaction_record.transaction_type,
                CURRENT_TIMESTAMP,
                'batch_' || EXTRACT(EPOCH FROM start_time)
            );

            -- Update original transaction status
            UPDATE pending_transactions 
            SET status = 'processed',
                processed_at = CURRENT_TIMESTAMP,
                processed_by = CURRENT_USER
            WHERE transaction_id = transaction_record.transaction_id;

            processed_count := processed_count + 1;

            -- Frequent commits create connection pressure
            IF processed_count % batch_size = 0 THEN
                COMMIT;
                RAISE NOTICE 'Processed % transactions', processed_count;

                -- Manual connection health check (limited effectiveness)
                PERFORM pg_stat_get_activity(NULL);
            END IF;

        EXCEPTION
            WHEN connection_exception THEN
                connection_failures := connection_failures + 1;
                RAISE WARNING 'Connection failure for transaction %: %', 
                    transaction_record.transaction_id, SQLERRM;

            WHEN OTHERS THEN
                error_count := error_count + 1;
                RAISE WARNING 'Processing error for transaction %: %', 
                    transaction_record.transaction_id, SQLERRM;
        END;
    END LOOP;

    RAISE NOTICE 'Transaction processing completed: % processed, % errors, % connection failures in %',
        processed_count, error_count, connection_failures, 
        CURRENT_TIMESTAMP - start_time;

    -- Limited connection pool reporting
    INSERT INTO connection_metrics (
        pool_name, active_connections, total_connections,
        query_execution_count, failed_connection_attempts,
        connection_timeout_count
    ) VALUES (
        'manual_pool', 
        -- Estimated metrics (not accurate)
        GREATEST(processed_count / 100, 1),
        20,
        processed_count,
        connection_failures,
        connection_failures
    );
END;
$$ LANGUAGE plpgsql;

-- Complex manual connection management for concurrent operations
CREATE OR REPLACE FUNCTION concurrent_data_processing()
RETURNS TABLE(
    worker_id INTEGER,
    records_processed INTEGER,
    processing_time INTERVAL,
    connection_efficiency DECIMAL
) AS $$
DECLARE
    worker_count INTEGER := 5;
    records_per_worker INTEGER := 2000;
    worker_index INTEGER;
    processing_start TIMESTAMP;
    processing_end TIMESTAMP;

BEGIN
    processing_start := CURRENT_TIMESTAMP;

    -- Simulate concurrent workers (limited parallelization in PostgreSQL)
    FOR worker_index IN 1..worker_count LOOP
        BEGIN
            -- Each worker creates separate connection overhead
            PERFORM process_worker_batch(worker_index, records_per_worker);

            processing_end := CURRENT_TIMESTAMP;

            RETURN QUERY 
            SELECT 
                worker_index,
                records_per_worker,
                processing_end - processing_start,
                ROUND(
                    records_per_worker::DECIMAL / 
                    EXTRACT(EPOCH FROM processing_end - processing_start)::DECIMAL, 
                    2
                ) as efficiency;

        EXCEPTION
            WHEN connection_exception THEN
                RAISE WARNING 'Worker % failed due to connection issues', worker_index;

                RETURN QUERY 
                SELECT worker_index, 0, INTERVAL '0', 0.0::DECIMAL;

            WHEN OTHERS THEN
                RAISE WARNING 'Worker % failed: %', worker_index, SQLERRM;

                RETURN QUERY 
                SELECT worker_index, 0, INTERVAL '0', 0.0::DECIMAL;
        END;
    END LOOP;

    RETURN;
END;
$$ LANGUAGE plpgsql;

-- Helper function for worker batch processing
CREATE OR REPLACE FUNCTION process_worker_batch(
    p_worker_id INTEGER,
    p_batch_size INTEGER
) RETURNS VOID AS $$
DECLARE
    processed INTEGER := 0;
    batch_start TIMESTAMP := CURRENT_TIMESTAMP;
BEGIN
    -- Simulated batch processing with connection overhead
    WHILE processed < p_batch_size LOOP
        -- Each operation has connection acquisition overhead
        INSERT INTO worker_results (
            worker_id,
            batch_item,
            processed_at,
            processing_order
        ) VALUES (
            p_worker_id,
            processed + 1,
            CURRENT_TIMESTAMP,
            processed
        );

        processed := processed + 1;

        -- Frequent connection status checks
        IF processed % 100 = 0 THEN
            PERFORM pg_stat_get_activity(NULL);
        END IF;
    END LOOP;

    RAISE NOTICE 'Worker % completed % records in %',
        p_worker_id, processed, CURRENT_TIMESTAMP - batch_start;
END;
$$ LANGUAGE plpgsql;

-- Limited connection pool analysis and optimization
WITH connection_analysis AS (
    SELECT 
        pool_name,
        AVG(active_connections) as avg_active,
        MAX(active_connections) as peak_active,
        AVG(connection_acquisition_time_ms) as avg_acquisition_time,
        COUNT(*) FILTER (WHERE connection_timeout_count > 0) as timeout_incidents,
        COUNT(*) FILTER (WHERE pool_exhaustion_count > 0) as exhaustion_incidents,

        -- Basic utilization calculation
        AVG(active_connections::DECIMAL / total_connections::DECIMAL) as avg_utilization,

        -- Simple performance metrics
        AVG(query_execution_count) as avg_query_throughput,
        SUM(failed_connection_attempts) as total_failures

    FROM connection_metrics
    WHERE metric_timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
    GROUP BY pool_name
),

pool_health_assessment AS (
    SELECT 
        ca.*,

        -- Basic health scoring (limited insight)
        CASE 
            WHEN ca.avg_utilization > 0.9 THEN 'overloaded'
            WHEN ca.avg_utilization > 0.7 THEN 'high_usage'
            WHEN ca.avg_utilization > 0.5 THEN 'normal'
            ELSE 'underutilized'
        END as pool_health,

        -- Simple recommendations
        CASE 
            WHEN ca.timeout_incidents > 5 THEN 'increase_pool_size'
            WHEN ca.avg_acquisition_time > 5000 THEN 'optimize_connection_creation'
            WHEN ca.exhaustion_incidents > 0 THEN 'review_connection_limits'
            ELSE 'monitor_trends'
        END as recommendation,

        -- Limited optimization suggestions
        CASE 
            WHEN ca.avg_utilization < 0.3 THEN 'reduce_pool_size_for_efficiency'
            WHEN ca.total_failures > 100 THEN 'investigate_connection_failures'
            ELSE 'maintain_current_configuration'
        END as optimization_advice

    FROM connection_analysis ca
)

SELECT 
    pha.pool_name,
    pha.avg_active,
    pha.peak_active,
    ROUND(pha.avg_utilization * 100, 1) as utilization_percent,
    pha.avg_acquisition_time || 'ms' as avg_connection_time,
    pha.pool_health,
    pha.recommendation,
    pha.optimization_advice,

    -- Basic performance assessment
    CASE 
        WHEN pha.avg_query_throughput > 1000 THEN 'high_performance'
        WHEN pha.avg_query_throughput > 500 THEN 'moderate_performance'
        ELSE 'low_performance'
    END as performance_assessment

FROM pool_health_assessment pha
ORDER BY pha.avg_utilization DESC;

-- Problems with traditional connection management:
-- 1. Manual configuration and tuning required for different workloads
-- 2. Limited visibility into connection usage patterns and performance
-- 3. Poor handling of connection spikes and variable load scenarios
-- 4. Rigid pooling strategies that don't adapt to application patterns
-- 5. Complex error handling for connection failures and timeouts
-- 6. Inefficient resource utilization with static pool configurations
-- 7. Difficult monitoring and debugging of connection-related issues
-- 8. Poor integration with modern microservices and cloud-native architectures
-- 9. Limited scalability with concurrent operations and high-throughput scenarios
-- 10. Complex optimization requiring deep database and application expertise

MongoDB provides comprehensive connection pooling with intelligent management and optimization:

// MongoDB Advanced Connection Pooling - enterprise-grade connection management and optimization
const { MongoClient, MongoServerError, MongoNetworkError } = require('mongodb');
const { EventEmitter } = require('events');

// Advanced MongoDB connection pool manager with intelligent optimization
class AdvancedConnectionPoolManager extends EventEmitter {
  constructor(config = {}) {
    super();

    this.config = {
      // Connection configuration
      uri: config.uri || 'mongodb://localhost:27017',
      database: config.database || 'production_app',

      // Connection pool configuration
      minPoolSize: config.minPoolSize || 5,
      maxPoolSize: config.maxPoolSize || 100,
      maxIdleTimeMS: config.maxIdleTimeMS || 30000,
      waitQueueTimeoutMS: config.waitQueueTimeoutMS || 5000,

      // Advanced pooling features
      enableConnectionPooling: config.enableConnectionPooling !== false,
      enableReadPreference: config.enableReadPreference !== false,
      enableWriteConcern: config.enableWriteConcern !== false,

      // Performance optimization
      maxConnecting: config.maxConnecting || 2,
      heartbeatFrequencyMS: config.heartbeatFrequencyMS || 10000,
      serverSelectionTimeoutMS: config.serverSelectionTimeoutMS || 30000,
      socketTimeoutMS: config.socketTimeoutMS || 0,

      // Connection management
      retryWrites: config.retryWrites !== false,
      retryReads: config.retryReads !== false,
      compressors: config.compressors || ['snappy', 'zlib'],

      // Monitoring and analytics
      enableConnectionPoolMonitoring: config.enableConnectionPoolMonitoring !== false,
      enablePerformanceAnalytics: config.enablePerformanceAnalytics !== false,
      enableAdaptivePooling: config.enableAdaptivePooling !== false,

      // Application-specific optimization
      applicationName: config.applicationName || 'enterprise-mongodb-app',
      loadBalanced: config.loadBalanced || false,
      directConnection: config.directConnection || false
    };

    // Connection pool state
    this.connectionState = {
      isInitialized: false,
      client: null,
      database: null,
      connectionStats: {
        totalConnections: 0,
        activeConnections: 0,
        availableConnections: 0,
        connectionRequests: 0,
        failedConnections: 0,
        pooledConnections: 0
      }
    };

    // Performance monitoring
    this.performanceMetrics = {
      connectionAcquisitionTimes: [],
      operationLatencies: [],
      throughputMeasurements: [],
      errorRates: [],
      resourceUtilization: []
    };

    // Connection pool event handlers
    this.poolEventHandlers = new Map();

    // Pending checkout start times used to measure connection acquisition latency
    this.pendingCheckoutStartTimes = [];

    // Adaptive pooling algorithm
    this.adaptivePooling = {
      enabled: this.config.enableAdaptivePooling,
      learningPeriodMS: 300000, // 5 minutes
      adjustmentThreshold: 0.15,
      lastAdjustment: Date.now(),
      performanceBaseline: null
    };

    this.initializeConnectionPool();
  }

  async initializeConnectionPool() {
    console.log('Initializing advanced MongoDB connection pool...');

    try {
      // Create MongoDB client with optimized connection pool settings
      this.connectionState.client = new MongoClient(this.config.uri, {
        // Connection pool configuration
        minPoolSize: this.config.minPoolSize,
        maxPoolSize: this.config.maxPoolSize,
        maxIdleTimeMS: this.config.maxIdleTimeMS,
        waitQueueTimeoutMS: this.config.waitQueueTimeoutMS,
        maxConnecting: this.config.maxConnecting,

        // Server selection and timeouts
        serverSelectionTimeoutMS: this.config.serverSelectionTimeoutMS,
        heartbeatFrequencyMS: this.config.heartbeatFrequencyMS,
        socketTimeoutMS: this.config.socketTimeoutMS,
        connectTimeoutMS: 10000,

        // Connection optimization
        retryWrites: this.config.retryWrites,
        retryReads: this.config.retryReads,
        compressors: this.config.compressors,

        // Application configuration
        appName: this.config.applicationName,
        loadBalanced: this.config.loadBalanced,
        directConnection: this.config.directConnection,

        // Read and write preferences
        readPreference: 'secondaryPreferred',
        writeConcern: { w: 'majority', j: true },
        readConcern: { level: 'majority' },

        // Monitoring configuration
        monitorCommands: this.config.enableConnectionPoolMonitoring,
        loggerLevel: 'info'
      });

      // Setup connection pool event monitoring
      if (this.config.enableConnectionPoolMonitoring) {
        this.setupConnectionPoolMonitoring();
      }

      // Connect to MongoDB
      await this.connectionState.client.connect();
      this.connectionState.database = this.connectionState.client.db(this.config.database);
      this.connectionState.isInitialized = true;

      // Initialize performance monitoring
      if (this.config.enablePerformanceAnalytics) {
        await this.initializePerformanceMonitoring();
      }

      // Setup adaptive pooling if enabled
      if (this.config.enableAdaptivePooling) {
        this.setupAdaptivePooling();
      }

      console.log('MongoDB connection pool initialized successfully', {
        database: this.config.database,
        minPoolSize: this.config.minPoolSize,
        maxPoolSize: this.config.maxPoolSize,
        adaptivePooling: this.config.enableAdaptivePooling
      });

      this.emit('connectionPoolReady', this.getConnectionStats());

      return this.connectionState.database;

    } catch (error) {
      console.error('Failed to initialize connection pool:', error);
      this.emit('connectionPoolError', error);
      throw error;
    }
  }

  setupConnectionPoolMonitoring() {
    console.log('Setting up comprehensive connection pool monitoring...');

    // Connection pool opened
    this.connectionState.client.on('connectionPoolCreated', (event) => {
      console.log(`Connection pool created: ${event.address}`, {
        maxPoolSize: event.options?.maxPoolSize,
        minPoolSize: event.options?.minPoolSize
      });

      this.emit('poolCreated', event);
    });

    // Connection created
    this.connectionState.client.on('connectionCreated', (event) => {
      this.connectionState.connectionStats.totalConnections++;
      this.connectionState.connectionStats.availableConnections++;

      console.log(`Connection created: ${event.connectionId}`, {
        totalConnections: this.connectionState.connectionStats.totalConnections
      });

      this.emit('connectionCreated', event);
    });

    // Connection ready
    this.connectionState.client.on('connectionReady', (event) => {
      console.log(`Connection ready: ${event.connectionId}`);
      this.emit('connectionReady', event);
    });

    // Connection checkout started (begin timing acquisition)
    this.connectionState.client.on('connectionCheckOutStarted', () => {
      this.pendingCheckoutStartTimes.push(Date.now());
    });

    // Connection checked out
    this.connectionState.client.on('connectionCheckedOut', (event) => {
      this.connectionState.connectionStats.activeConnections++;
      this.connectionState.connectionStats.availableConnections--;

      // Measure acquisition latency from the matching checkout request
      const checkoutStartTime = this.pendingCheckoutStartTimes.shift();
      if (checkoutStartTime) {
        this.recordConnectionAcquisitionTime(checkoutStartTime);
      }

      this.emit('connectionCheckedOut', event);
    });

    // Connection checked in
    this.connectionState.client.on('connectionCheckedIn', (event) => {
      this.connectionState.connectionStats.activeConnections--;
      this.connectionState.connectionStats.availableConnections++;

      this.emit('connectionCheckedIn', event);
    });

    // Connection pool closed
    this.connectionState.client.on('connectionPoolClosed', (event) => {
      console.log(`Connection pool closed: ${event.address}`);
      this.emit('connectionPoolClosed', event);
    });

    // Connection check out failed
    this.connectionState.client.on('connectionCheckOutFailed', (event) => {
      this.connectionState.connectionStats.failedConnections++;

      console.warn(`Connection checkout failed: ${event.reason}`, {
        failedConnections: this.connectionState.connectionStats.failedConnections
      });

      this.emit('connectionCheckoutFailed', event);

      // Trigger adaptive pooling adjustment if enabled
      if (this.config.enableAdaptivePooling) {
        this.evaluatePoolingAdjustment('checkout_failure');
      }
    });

    // Connection closed
    this.connectionState.client.on('connectionClosed', (event) => {
      this.connectionState.connectionStats.totalConnections--;

      console.log(`Connection closed: ${event.connectionId}`, {
        reason: event.reason,
        totalConnections: this.connectionState.connectionStats.totalConnections
      });

      this.emit('connectionClosed', event);
    });
  }

  async executeWithPoolManagement(operation, options = {}) {
    console.log('Executing operation with advanced pool management...');
    const startTime = Date.now();

    try {
      if (!this.connectionState.isInitialized) {
        throw new Error('Connection pool not initialized');
      }

      // Record connection request
      this.connectionState.connectionStats.connectionRequests++;

      // Check pool health before operation
      const poolHealth = await this.assessPoolHealth();
      if (poolHealth.status === 'critical') {
        console.warn('Pool in critical state, applying emergency measures...');
        await this.applyEmergencyPoolMeasures(poolHealth);
      }

      // Execute operation with connection management
      const result = await this.executeOperationWithRetry(operation, options);

      // Record successful operation
      const executionTime = Date.now() - startTime;
      this.recordOperationLatency(executionTime);

      // Update performance metrics
      if (this.config.enablePerformanceAnalytics) {
        this.updatePerformanceMetrics(executionTime, 'success');
      }

      return result;

    } catch (error) {
      const executionTime = Date.now() - startTime;

      console.error('Operation failed with connection pool:', error.message);

      // Record failed operation
      this.recordOperationLatency(executionTime, 'error');

      // Handle connection-specific errors
      if (this.isConnectionError(error)) {
        await this.handleConnectionError(error, options);
      }

      // Update error metrics
      if (this.config.enablePerformanceAnalytics) {
        this.updatePerformanceMetrics(executionTime, 'error');
      }

      throw error;
    }
  }

  async executeOperationWithRetry(operation, options) {
    const maxRetries = options.maxRetries || 3;
    const retryDelayMs = options.retryDelayMs || 1000;
    let lastError = null;

    for (let attempt = 1; attempt <= maxRetries; attempt++) {
      try {
        // Execute the operation
        const result = await operation(this.connectionState.database);

        if (attempt > 1) {
          console.log(`Operation succeeded on retry attempt ${attempt}`);
        }

        return result;

      } catch (error) {
        lastError = error;

        // Check if error is retryable
        if (!this.isRetryableError(error) || attempt === maxRetries) {
          throw error;
        }

        console.warn(`Operation failed (attempt ${attempt}/${maxRetries}): ${error.message}`);

        // Wait before retry with exponential backoff
        const delay = retryDelayMs * Math.pow(2, attempt - 1);
        await this.sleep(delay);
      }
    }

    throw lastError;
  }

  async performBulkOperationsWithPoolOptimization(collectionName, operations, options = {}) {
    console.log(`Executing bulk operations with pool optimization: ${operations.length} operations...`);
    const startTime = Date.now();

    try {
      // Optimize pool for bulk operations
      await this.optimizePoolForBulkOperations(operations.length);

      const collection = this.connectionState.database.collection(collectionName);
      const batchSize = options.batchSize || 1000;
      const results = {
        totalOperations: operations.length,
        successfulOperations: 0,
        failedOperations: 0,
        batches: [],
        totalTime: 0,
        averageLatency: 0
      };

      // Process operations in optimized batches
      const batches = this.createOptimizedBatches(operations, batchSize);

      for (let batchIndex = 0; batchIndex < batches.length; batchIndex++) {
        const batch = batches[batchIndex];
        const batchStartTime = Date.now();

        try {
          const batchResult = await this.executeWithPoolManagement(async (db) => {
            return await collection.bulkWrite(batch, {
              ordered: options.ordered !== false,
              writeConcern: { w: 'majority', j: true }
            });
          });

          const batchTime = Date.now() - batchStartTime;
          results.successfulOperations += batchResult.insertedCount + batchResult.modifiedCount + batchResult.upsertedCount + batchResult.deletedCount;
          results.batches.push({
            batchIndex,
            batchSize: batch.length,
            executionTime: batchTime,
            insertedCount: batchResult.insertedCount,
            modifiedCount: batchResult.modifiedCount,
            deletedCount: batchResult.deletedCount
          });

          console.log(`Batch ${batchIndex + 1}/${batches.length} completed: ${batch.length} operations in ${batchTime}ms`);

        } catch (batchError) {
          console.error(`Batch ${batchIndex + 1} failed:`, batchError.message);
          results.failedOperations += batch.length;

          if (!options.continueOnError) {
            throw batchError;
          }
        }
      }

      // Calculate final statistics
      results.totalTime = Date.now() - startTime;
      results.averageLatency = results.totalTime / results.batches.length;

      console.log(`Bulk operations completed: ${results.successfulOperations}/${results.totalOperations} successful in ${results.totalTime}ms`);

      return results;

    } catch (error) {
      console.error('Bulk operations failed:', error);
      throw error;
    }
  }

  async handleConcurrentOperations(concurrentTasks, options = {}) {
    console.log(`Managing ${concurrentTasks.length} concurrent operations with pool optimization...`);
    const startTime = Date.now();

    try {
      // Optimize pool for concurrent operations
      await this.optimizePoolForConcurrency(concurrentTasks.length);

      const maxConcurrency = options.maxConcurrency || Math.min(concurrentTasks.length, Math.floor(this.config.maxPoolSize * 0.8));
      const results = [];
      const errors = [];

      // Execute tasks with controlled concurrency
      const taskPromises = [];
      const semaphore = { count: maxConcurrency };

      for (let i = 0; i < concurrentTasks.length; i++) {
        const task = concurrentTasks[i];
        const taskPromise = this.executeConcurrentTask(task, i, semaphore, options);
        taskPromises.push(taskPromise);
      }

      // Wait for all tasks to complete
      const taskResults = await Promise.allSettled(taskPromises);

      // Process results
      taskResults.forEach((result, index) => {
        if (result.status === 'fulfilled') {
          results.push({
            taskIndex: index,
            result: result.value,
            success: true
          });
        } else {
          errors.push({
            taskIndex: index,
            error: result.reason.message,
            success: false
          });
        }
      });

      const totalTime = Date.now() - startTime;

      console.log(`Concurrent operations completed: ${results.length} successful, ${errors.length} failed in ${totalTime}ms`);

      return {
        totalTasks: concurrentTasks.length,
        successfulTasks: results.length,
        failedTasks: errors.length,
        totalTime,
        results,
        errors,
        averageConcurrency: maxConcurrency
      };

    } catch (error) {
      console.error('Concurrent operations management failed:', error);
      throw error;
    }
  }

  async executeConcurrentTask(task, taskIndex, semaphore, options) {
    // Wait for semaphore (connection availability)
    await this.acquireSemaphore(semaphore);

    try {
      const taskStartTime = Date.now();

      const result = await this.executeWithPoolManagement(async (db) => {
        return await task(db, taskIndex);
      }, options);

      const taskTime = Date.now() - taskStartTime;

      return {
        taskIndex,
        executionTime: taskTime,
        result
      };

    } finally {
      this.releaseSemaphore(semaphore);
    }
  }

  async optimizePoolForBulkOperations(operationCount) {
    console.log(`Optimizing connection pool for ${operationCount} bulk operations...`);

    // Calculate optimal pool size for bulk operations
    const estimatedConnections = Math.min(
      Math.ceil(operationCount / 1000) + 2, // Base estimate plus buffer
      this.config.maxPoolSize
    );

    // Temporarily adjust pool if needed
    if (estimatedConnections > this.config.minPoolSize) {
      console.log(`Temporarily increasing pool size to ${estimatedConnections} for bulk operations`);
      // Note: In production, this would adjust pool configuration dynamically
    }
  }

  async optimizePoolForConcurrency(concurrentTaskCount) {
    console.log(`Optimizing connection pool for ${concurrentTaskCount} concurrent operations...`);

    // Ensure sufficient connections for concurrency
    const requiredConnections = Math.min(concurrentTaskCount + 2, this.config.maxPoolSize);

    if (requiredConnections > this.connectionState.connectionStats.totalConnections) {
      console.log(`Pool optimization: ensuring ${requiredConnections} connections are available`);
      // Note: MongoDB driver automatically manages this, but we can provide hints
    }
  }

  async assessPoolHealth() {
    const stats = this.getConnectionStats();
    const utilizationRatio = stats.activeConnections / this.config.maxPoolSize;
    const failureRate = stats.failedConnections / Math.max(stats.connectionRequests, 1);

    let status = 'healthy';
    const issues = [];

    if (utilizationRatio > 0.9) {
      status = 'critical';
      issues.push('high_utilization');
    } else if (utilizationRatio > 0.7) {
      status = 'warning';
      issues.push('moderate_utilization');
    }

    if (failureRate > 0.1) {
      status = status === 'healthy' ? 'warning' : 'critical';
      issues.push('high_failure_rate');
    }

    if (stats.availableConnections === 0) {
      status = 'critical';
      issues.push('no_available_connections');
    }

    return {
      status,
      utilizationRatio,
      failureRate,
      issues,
      recommendations: this.generateHealthRecommendations(issues)
    };
  }

  generateHealthRecommendations(issues) {
    const recommendations = [];

    if (issues.includes('high_utilization')) {
      recommendations.push('Consider increasing maxPoolSize');
    }

    if (issues.includes('high_failure_rate')) {
      recommendations.push('Check network connectivity and server health');
    }

    if (issues.includes('no_available_connections')) {
      recommendations.push('Investigate connection leaks and optimize operation duration');
    }

    return recommendations;
  }

  async applyEmergencyPoolMeasures(poolHealth) {
    console.log('Applying emergency pool measures:', poolHealth.issues);

    if (poolHealth.issues.includes('no_available_connections')) {
      console.log('Force closing idle connections to recover pool capacity...');
      // In production, this would implement connection cleanup
    }

    if (poolHealth.issues.includes('high_failure_rate')) {
      console.log('Implementing circuit breaker for connection failures...');
      // In production, this would implement circuit breaker pattern
    }
  }

  setupAdaptivePooling() {
    console.log('Setting up adaptive connection pooling algorithm...');

    // Keep the interval handle so closeConnectionPool() can stop the adaptive loop,
    // and catch rejections so a failed evaluation cannot crash the process.
    this.adaptivePoolingInterval = setInterval(() => {
      this.evaluateAndAdjustPool().catch(error => {
        console.error('Adaptive pool evaluation failed:', error.message);
      });
    }, this.adaptivePooling.learningPeriodMS);
  }

  async evaluateAndAdjustPool() {
    if (!this.adaptivePooling.enabled) return;

    console.log('Evaluating pool performance for adaptive adjustment...');

    const currentMetrics = this.calculatePerformanceMetrics();

    if (this.adaptivePooling.performanceBaseline === null) {
      this.adaptivePooling.performanceBaseline = currentMetrics;
      return;
    }

    const performanceChange = this.comparePerformanceMetrics(
      currentMetrics,
      this.adaptivePooling.performanceBaseline
    );

    if (Math.abs(performanceChange) > this.adaptivePooling.adjustmentThreshold) {
      await this.adjustPoolConfiguration(performanceChange, currentMetrics);
      this.adaptivePooling.performanceBaseline = currentMetrics;
    }
  }

  async adjustPoolConfiguration(performanceChange, metrics) {
    console.log(`Adaptive pooling: adjusting configuration based on ${performanceChange > 0 ? 'improved' : 'degraded'} performance`);

    if (performanceChange < -this.adaptivePooling.adjustmentThreshold) {
      // Performance degraded, try to optimize
      if (metrics.utilizationRatio > 0.8) {
        console.log('Increasing pool size due to high utilization');
        // In production, would adjust pool size
      }
    } else if (performanceChange > this.adaptivePooling.adjustmentThreshold) {
      // Performance improved, maintain or optimize further
      console.log('Performance improved, maintaining current pool configuration');
    }
  }

  // Utility methods for connection pool management

  recordConnectionAcquisitionTime(checkoutTime) {
    const acquisitionTime = Date.now() - checkoutTime;
    this.performanceMetrics.connectionAcquisitionTimes.push(acquisitionTime);

    // Keep only recent measurements
    if (this.performanceMetrics.connectionAcquisitionTimes.length > 1000) {
      this.performanceMetrics.connectionAcquisitionTimes = 
        this.performanceMetrics.connectionAcquisitionTimes.slice(-500);
    }
  }

  recordOperationLatency(latency, status = 'success') {
    this.performanceMetrics.operationLatencies.push({
      latency,
      status,
      timestamp: Date.now()
    });

    // Keep only recent measurements
    if (this.performanceMetrics.operationLatencies.length > 1000) {
      this.performanceMetrics.operationLatencies = 
        this.performanceMetrics.operationLatencies.slice(-500);
    }
  }

  isConnectionError(error) {
    const message = error?.message || '';
    return error instanceof MongoNetworkError || 
           error instanceof MongoServerError ||
           message.includes('connection') ||
           message.includes('timeout');
  }

  isRetryableError(error) {
    if (error instanceof MongoNetworkError) return true;
    if (error.code === 11000) return false; // Duplicate key errors are never retryable
    if ((error?.message || '').includes('timeout')) return true;
    return false;
  }

  async handleConnectionError(error, options) {
    console.warn('Handling connection error:', error.message);

    if (error instanceof MongoNetworkError) {
      console.log('Network error detected, checking pool health...');
      const poolHealth = await this.assessPoolHealth();
      if (poolHealth.status === 'critical') {
        await this.applyEmergencyPoolMeasures(poolHealth);
      }
    }
  }

  createOptimizedBatches(operations, batchSize) {
    const batches = [];
    for (let i = 0; i < operations.length; i += batchSize) {
      batches.push(operations.slice(i, i + batchSize));
    }
    return batches;
  }

  async acquireSemaphore(semaphore) {
    // Simple polling semaphore: re-check every 10ms until a slot frees up.
    // Adequate for modest concurrency; a promise-based variant that wakes
    // waiters directly is sketched after this listing.
    while (semaphore.count <= 0) {
      await this.sleep(10);
    }
    semaphore.count--;
  }

  releaseSemaphore(semaphore) {
    semaphore.count++;
  }

  sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
  }

  getConnectionStats() {
    return {
      ...this.connectionState.connectionStats,
      poolSize: this.config.maxPoolSize,
      utilizationRatio: this.connectionState.connectionStats.activeConnections / this.config.maxPoolSize,
      timestamp: new Date()
    };
  }

  calculatePerformanceMetrics() {
    const recent = this.performanceMetrics.operationLatencies.slice(-100);
    const avgLatency = recent.reduce((sum, op) => sum + op.latency, 0) / recent.length || 0;
    const successRate = recent.filter(op => op.status === 'success').length / recent.length || 0;
    const utilizationRatio = this.connectionState.connectionStats.activeConnections / this.config.maxPoolSize;

    return {
      avgLatency,
      successRate,
      utilizationRatio,
      throughput: recent.length / 5 // Rough ops/sec estimate, assuming the recent window spans ~5 seconds
    };
  }

  comparePerformanceMetrics(current, baseline) {
    // Guard against division by zero when the baseline has no recorded activity
    const latencyChange = baseline.avgLatency > 0
      ? (baseline.avgLatency - current.avgLatency) / baseline.avgLatency
      : 0;
    const successRateChange = current.successRate - baseline.successRate;
    const throughputChange = baseline.throughput > 0
      ? (current.throughput - baseline.throughput) / baseline.throughput
      : 0;

    // Weighted performance score: positive values indicate overall improvement
    return (latencyChange * 0.4) + (successRateChange * 0.3) + (throughputChange * 0.3);
  }

  async getDetailedPoolAnalytics() {
    const stats = this.getConnectionStats();
    const metrics = this.calculatePerformanceMetrics();
    const poolHealth = await this.assessPoolHealth();

    return {
      connectionStats: stats,
      performanceMetrics: metrics,
      poolHealth: poolHealth,
      configuration: {
        minPoolSize: this.config.minPoolSize,
        maxPoolSize: this.config.maxPoolSize,
        maxIdleTimeMS: this.config.maxIdleTimeMS,
        adaptivePoolingEnabled: this.config.enableAdaptivePooling
      },
      recommendations: poolHealth.recommendations
    };
  }

  async closeConnectionPool() {
    console.log('Closing MongoDB connection pool...');

    // Stop the adaptive pooling timer so it cannot keep the process alive
    if (this.adaptivePoolingInterval) {
      clearInterval(this.adaptivePoolingInterval);
      this.adaptivePoolingInterval = null;
    }

    if (this.connectionState.client) {
      await this.connectionState.client.close();
      this.connectionState.isInitialized = false;
      console.log('Connection pool closed successfully');
    }
  }
}

// Example usage for enterprise-scale applications
async function demonstrateAdvancedConnectionPooling() {
  const poolManager = new AdvancedConnectionPoolManager({
    uri: 'mongodb://localhost:27017',
    database: 'production_analytics',
    minPoolSize: 10,
    maxPoolSize: 50,
    enableAdaptivePooling: true,
    enablePerformanceAnalytics: true,
    applicationName: 'enterprise-data-processor'
  });

  try {
    // Wait for pool initialization
    await poolManager.initializeConnectionPool();

    // Demonstrate bulk operations with pool optimization
    const bulkOperations = Array.from({ length: 5000 }, (_, index) => ({
      insertOne: {
        document: {
          userId: `user_${index}`,
          eventType: 'page_view',
          timestamp: new Date(),
          sessionId: `session_${Math.floor(index / 100)}`,
          data: {
            page: `/page_${index % 50}`,
            duration: Math.floor(Math.random() * 300),
            source: 'web'
          }
        }
      }
    }));

    console.log('Executing bulk operations with pool optimization...');
    const bulkResults = await poolManager.performBulkOperationsWithPoolOptimization(
      'user_events',
      bulkOperations,
      {
        batchSize: 1000,
        continueOnError: true
      }
    );

    // Demonstrate concurrent operations
    const concurrentTasks = Array.from({ length: 20 }, (_, index) => 
      async (db, taskIndex) => {
        const collection = db.collection('analytics_data');

        // Simulate complex aggregation
        const result = await collection.aggregate([
          { $match: { userId: { $regex: `user_${taskIndex}` } } },
          { $group: {
            _id: '$eventType',
            count: { $sum: 1 },
            avgDuration: { $avg: '$data.duration' }
          }},
          { $sort: { count: -1 } }
        ]).toArray();

        return { taskIndex, resultCount: result.length };
      }
    );

    console.log('Executing concurrent operations with pool management...');
    const concurrentResults = await poolManager.handleConcurrentOperations(concurrentTasks, {
      maxConcurrency: 15
    });

    // Get detailed analytics
    const poolAnalytics = await poolManager.getDetailedPoolAnalytics();
    console.log('Connection Pool Analytics:', JSON.stringify(poolAnalytics, null, 2));

    return {
      bulkResults,
      concurrentResults,
      poolAnalytics
    };

  } catch (error) {
    console.error('Advanced connection pooling demonstration failed:', error);
    throw error;
  } finally {
    await poolManager.closeConnectionPool();
  }
}

// Benefits of MongoDB Advanced Connection Pooling:
// - Intelligent connection lifecycle management with automatic optimization and resource control
// - Comprehensive monitoring with real-time pool health assessment and performance analytics
// - Adaptive pooling algorithms that adjust to application patterns and workload changes
// - Advanced error handling with retry mechanisms and circuit breaker patterns
// - Support for concurrent operations with intelligent connection allocation and management
// - Production-ready scalability with distributed connection management and optimization
// - Comprehensive analytics and monitoring for operational insight and troubleshooting
// - Seamless integration with MongoDB's native connection pooling and cluster management

module.exports = {
  AdvancedConnectionPoolManager,
  demonstrateAdvancedConnectionPooling
};
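
As an optional refinement that is not part of the class above, the polling loop in acquireSemaphore() can be replaced with a promise-based semaphore that hands a freed slot directly to the next waiter instead of re-checking a counter every 10ms. The AsyncSemaphore name below is illustrative, a minimal sketch rather than a drop-in replacement:

// Minimal promise-based semaphore sketch (hypothetical AsyncSemaphore helper)
class AsyncSemaphore {
  constructor(maxConcurrency) {
    this.available = maxConcurrency; // free slots
    this.waiters = [];               // queued resolve callbacks
  }

  async acquire() {
    if (this.available > 0) {
      this.available--;
      return;
    }
    // No slot free: park this caller until release() wakes it up
    await new Promise(resolve => this.waiters.push(resolve));
  }

  release() {
    const next = this.waiters.shift();
    if (next) {
      next();            // hand the slot directly to the next waiter
    } else {
      this.available++;  // nobody waiting, return the slot to the pool
    }
  }
}

// Hypothetical usage inside executeConcurrentTask:
//   await semaphore.acquire();
//   try { /* run the task */ } finally { semaphore.release(); }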

Understanding MongoDB Connection Pooling Architecture

Enterprise-Scale Connection Management and Optimization

Implement sophisticated connection pooling strategies for production applications:

// Production-ready connection pooling with advanced features and enterprise optimization
class ProductionConnectionPoolPlatform extends AdvancedConnectionPoolManager {
  constructor(productionConfig) {
    super(productionConfig);

    this.productionConfig = {
      ...productionConfig,
      distributedPooling: true,
      realtimeMonitoring: true,
      advancedLoadBalancing: true,
      enterpriseFailover: true,
      automaticRecovery: true,
      performanceOptimization: true
    };

    // These setup helpers, like the deploy*Strategy methods called further down,
    // are deployment-specific hooks that are not shown in this excerpt.
    this.setupProductionFeatures();
    this.initializeDistributedPooling();
    this.setupEnterpriseMonitoring();
  }

  async implementDistributedConnectionPooling() {
    console.log('Setting up distributed connection pooling architecture...');

    const distributedStrategy = {
      // Multi-region pooling
      regionAwareness: {
        enabled: true,
        primaryRegion: 'us-east-1',
        secondaryRegions: ['us-west-2', 'eu-west-1'],
        crossRegionFailover: true
      },

      // Load balancing
      loadBalancing: {
        algorithm: 'weighted_round_robin',
        healthChecking: true,
        automaticFailover: true,
        loadFactors: {
          latency: 0.4,
          throughput: 0.3,
          availability: 0.3
        }
      },

      // Connection optimization
      optimization: {
        connectionAffinity: true,
        adaptiveBatchSizing: true,
        intelligentRouting: true,
        resourceOptimization: true
      }
    };

    return await this.deployDistributedStrategy(distributedStrategy);
  }

  async implementEnterpriseFailover() {
    console.log('Implementing enterprise-grade failover mechanisms...');

    const failoverStrategy = {
      // Automatic failover
      automaticFailover: {
        enabled: true,
        healthCheckInterval: 5000,
        failoverThreshold: 3,
        recoveryTimeout: 30000
      },

      // Connection recovery
      connectionRecovery: {
        automaticRecovery: true,
        retryBackoffStrategy: 'exponential',
        maxRecoveryAttempts: 5,
        recoveryDelay: 1000
      },

      // High availability
      highAvailability: {
        redundantConnections: true,
        crossDatacenterFailover: true,
        zeroDowntimeRecovery: true,
        dataConsistencyGuarantees: true
      }
    };

    return await this.deployFailoverStrategy(failoverStrategy);
  }

  async implementPerformanceOptimization() {
    console.log('Implementing advanced performance optimization...');

    const optimizationStrategy = {
      // Connection optimization
      connectionOptimization: {
        warmupConnections: true,
        connectionPreloading: true,
        intelligentCaching: true,
        resourcePooling: true
      },

      // Query optimization
      queryOptimization: {
        queryPlanCaching: true,
        connectionAffinity: true,
        batchOptimization: true,
        pipelineOptimization: true
      },

      // Resource management
      resourceManagement: {
        memoryOptimization: true,
        cpuUtilizationOptimization: true,
        networkOptimization: true,
        diskIOOptimization: true
      }
    };

    return await this.deployOptimizationStrategy(optimizationStrategy);
  }
}
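
The strategy objects above are descriptive rather than executable; in practice, many of these settings map onto standard options of the official MongoDB Node.js driver. A minimal sketch, assuming a three-member replica set (hostnames, credentials, and the replica set name are placeholders):

const { MongoClient } = require('mongodb');

// Illustrative client configuration covering the failover- and performance-related
// settings referenced by the strategies above.
const client = new MongoClient(
  'mongodb://db1.example.net:27017,db2.example.net:27017,db3.example.net:27017',
  {
    replicaSet: 'rs0',                 // enables automatic primary failover
    minPoolSize: 10,
    maxPoolSize: 100,
    maxIdleTimeMS: 30000,
    waitQueueTimeoutMS: 5000,
    retryWrites: true,                 // retry transient write failures
    retryReads: true,
    readPreference: 'secondaryPreferred',
    w: 'majority',                     // durable writes across the replica set
    compressors: ['snappy', 'zlib'],   // snappy needs the optional 'snappy' package
    serverSelectionTimeoutMS: 30000,
    heartbeatFrequencyMS: 10000,
    appName: 'enterprise-data-processor'
  }
);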

SQL-Style Connection Management with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB connection pooling and management:

-- QueryLeaf connection pooling with SQL-familiar configuration syntax

-- Configure connection pool settings
SET connection_pool_min_size = 10;
SET connection_pool_max_size = 100;
SET connection_pool_max_idle_time = '30 seconds';
SET connection_pool_wait_timeout = '5 seconds';
SET enable_adaptive_pooling = true;
SET enable_connection_monitoring = true;

-- Advanced connection pool configuration
WITH connection_pool_configuration AS (
  SELECT 
    -- Pool sizing configuration
    10 as min_pool_size,
    100 as max_pool_size,
    30000 as max_idle_time_ms,
    5000 as wait_queue_timeout_ms,
    2 as max_connecting,

    -- Performance optimization
    true as enable_compression,
    ARRAY['snappy', 'zlib'] as compression_algorithms,
    true as retry_writes,
    true as retry_reads,

    -- Application configuration
    'enterprise-analytics-app' as application_name,
    false as load_balanced,
    false as direct_connection,

    -- Monitoring and analytics
    true as enable_monitoring,
    true as enable_performance_analytics,
    true as enable_adaptive_pooling,
    true as enable_health_checking,

    -- Timeout and retry configuration
    30000 as server_selection_timeout_ms,
    10000 as heartbeat_frequency_ms,
    0 as socket_timeout_ms,
    10000 as connect_timeout_ms,

    -- Read and write preferences
    'secondaryPreferred' as read_preference,
    JSON_OBJECT('w', 'majority', 'j', true) as write_concern,
    JSON_OBJECT('level', 'majority') as read_concern
),

-- Monitor connection pool performance and utilization
connection_pool_metrics AS (
  SELECT 
    pool_name,
    measurement_timestamp,

    -- Connection statistics
    total_connections,
    active_connections,
    available_connections,
    pooled_connections,
    connection_requests,
    failed_connections,

    -- Performance metrics
    avg_connection_acquisition_time_ms,
    max_connection_acquisition_time_ms,
    avg_operation_latency_ms,
    operations_per_second,

    -- Pool utilization analysis
    ROUND((active_connections::DECIMAL / NULLIF(total_connections::DECIMAL, 0)) * 100, 2) as utilization_percent,
    ROUND((failed_connections::DECIMAL / NULLIF(connection_requests::DECIMAL, 0)) * 100, 2) as failure_rate_percent,

    -- Connection lifecycle metrics
    connections_created_per_minute,
    connections_closed_per_minute,
    connection_timeouts,

    -- Resource utilization
    memory_usage_mb,
    cpu_usage_percent,
    network_bytes_per_second,

    -- Health indicators (recomputed from the base columns because the
    -- utilization_percent / failure_rate_percent aliases defined above
    -- cannot be referenced within the same SELECT)
    CASE 
      WHEN (active_connections::DECIMAL / NULLIF(total_connections::DECIMAL, 0)) > 0.90 THEN 'critical'
      WHEN (active_connections::DECIMAL / NULLIF(total_connections::DECIMAL, 0)) > 0.70 THEN 'warning'
      WHEN (active_connections::DECIMAL / NULLIF(total_connections::DECIMAL, 0)) > 0.50 THEN 'normal'
      ELSE 'low'
    END as utilization_status,

    CASE 
      WHEN (failed_connections::DECIMAL / NULLIF(connection_requests::DECIMAL, 0)) > 0.10 THEN 'critical'
      WHEN (failed_connections::DECIMAL / NULLIF(connection_requests::DECIMAL, 0)) > 0.05 THEN 'warning'
      WHEN (failed_connections::DECIMAL / NULLIF(connection_requests::DECIMAL, 0)) > 0.01 THEN 'moderate'
      ELSE 'healthy'
    END as connection_health_status

  FROM connection_pool_monitoring_data
  WHERE measurement_timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
),

-- Analyze connection pool performance trends
performance_trend_analysis AS (
  SELECT 
    pool_name,
    DATE_TRUNC('minute', measurement_timestamp) as time_bucket,

    -- Aggregated performance metrics
    AVG(utilization_percent) as avg_utilization,
    MAX(utilization_percent) as peak_utilization,
    AVG(avg_connection_acquisition_time_ms) as avg_acquisition_time,
    MAX(max_connection_acquisition_time_ms) as peak_acquisition_time,
    AVG(operations_per_second) as avg_throughput,

    -- Error and timeout analysis
    SUM(failed_connections) as total_failures,
    SUM(connection_timeouts) as total_timeouts,
    AVG(failure_rate_percent) as avg_failure_rate,

    -- Resource consumption trends
    AVG(memory_usage_mb) as avg_memory_usage,
    AVG(cpu_usage_percent) as avg_cpu_usage,
    AVG(network_bytes_per_second) as avg_network_usage,

    -- Performance scoring
    CASE 
      WHEN AVG(avg_operation_latency_ms) < 10 AND AVG(failure_rate_percent) < 1 THEN 100
      WHEN AVG(avg_operation_latency_ms) < 50 AND AVG(failure_rate_percent) < 5 THEN 80
      WHEN AVG(avg_operation_latency_ms) < 100 AND AVG(failure_rate_percent) < 10 THEN 60
      ELSE 40
    END as performance_score,

    -- Trend calculations
    LAG(AVG(operations_per_second)) OVER (
      PARTITION BY pool_name ORDER BY DATE_TRUNC('minute', measurement_timestamp)
    ) as prev_throughput,
    LAG(AVG(avg_connection_acquisition_time_ms)) OVER (
      PARTITION BY pool_name ORDER BY DATE_TRUNC('minute', measurement_timestamp)
    ) as prev_acquisition_time

  FROM connection_pool_metrics
  GROUP BY pool_name, DATE_TRUNC('minute', measurement_timestamp)
),

-- Connection pool optimization recommendations
pool_optimization_analysis AS (
  SELECT 
    pta.pool_name,
    pta.time_bucket,
    pta.avg_utilization,
    pta.avg_acquisition_time,
    pta.avg_throughput,
    pta.performance_score,

    -- Performance trend analysis
    CASE 
      WHEN pta.avg_throughput > pta.prev_throughput THEN 'improving'
      WHEN pta.avg_throughput < pta.prev_throughput THEN 'degrading'
      ELSE 'stable'
    END as throughput_trend,

    CASE 
      WHEN pta.avg_acquisition_time < pta.prev_acquisition_time THEN 'improving'
      WHEN pta.avg_acquisition_time > pta.prev_acquisition_time THEN 'degrading'
      ELSE 'stable'
    END as latency_trend,

    -- Pool sizing recommendations
    CASE 
      WHEN pta.avg_utilization > 90 THEN 'increase_pool_size'
      WHEN pta.avg_utilization > 80 AND pta.avg_acquisition_time > 100 THEN 'increase_pool_size'
      WHEN pta.avg_utilization < 30 AND pta.performance_score > 80 THEN 'decrease_pool_size'
      WHEN pta.avg_acquisition_time > 200 THEN 'optimize_connection_creation'
      ELSE 'maintain_current_size'
    END as sizing_recommendation,

    -- Configuration optimization suggestions
    CASE 
      WHEN pta.total_failures > 10 THEN 'increase_retry_attempts'
      WHEN pta.total_timeouts > 5 THEN 'increase_timeout_values'
      WHEN pta.avg_failure_rate > 5 THEN 'investigate_connection_issues'
      WHEN pta.performance_score < 60 THEN 'comprehensive_optimization_needed'
      ELSE 'configuration_optimal'
    END as configuration_recommendation,

    -- Resource optimization suggestions
    CASE 
      WHEN pta.avg_memory_usage > 1000 THEN 'optimize_memory_usage'
      WHEN pta.avg_cpu_usage > 80 THEN 'optimize_cpu_utilization'
      WHEN pta.avg_network_usage > 100000000 THEN 'optimize_network_efficiency'
      ELSE 'resource_usage_optimal'
    END as resource_optimization,

    -- Priority scoring for optimization actions
    CASE 
      WHEN pta.avg_utilization > 95 OR pta.avg_failure_rate > 15 THEN 'critical'
      WHEN pta.avg_utilization > 85 OR pta.avg_failure_rate > 10 THEN 'high'
      WHEN pta.avg_utilization > 75 OR pta.avg_acquisition_time > 150 THEN 'medium'
      ELSE 'low'
    END as optimization_priority

  FROM performance_trend_analysis pta
),

-- Adaptive pooling recommendations based on workload patterns
adaptive_pooling_recommendations AS (
  SELECT 
    poa.pool_name,

    -- Current state assessment
    AVG(poa.avg_utilization) as current_avg_utilization,
    MAX(poa.avg_utilization) as current_peak_utilization,
    AVG(poa.avg_throughput) as current_avg_throughput,
    AVG(poa.performance_score) as current_performance_score,

    -- Optimization priority distribution
    COUNT(*) FILTER (WHERE poa.optimization_priority = 'critical') as critical_periods,
    COUNT(*) FILTER (WHERE poa.optimization_priority = 'high') as high_priority_periods,
    COUNT(*) FILTER (WHERE poa.optimization_priority = 'medium') as medium_priority_periods,

    -- Recommendation consensus
    MODE() WITHIN GROUP (ORDER BY poa.sizing_recommendation) as recommended_sizing_action,
    MODE() WITHIN GROUP (ORDER BY poa.configuration_recommendation) as recommended_config_action,
    MODE() WITHIN GROUP (ORDER BY poa.resource_optimization) as recommended_resource_action,

    -- Adaptive pooling configuration
    CASE 
      WHEN AVG(poa.avg_utilization) > 80 AND AVG(poa.performance_score) < 70 THEN
        JSON_OBJECT(
          'min_pool_size', GREATEST(cpc.min_pool_size + 5, 15),
          'max_pool_size', GREATEST(cpc.max_pool_size + 10, 50),
          'adjustment_reason', 'high_utilization_poor_performance'
        )
      WHEN AVG(poa.avg_utilization) < 40 AND AVG(poa.performance_score) > 85 THEN
        JSON_OBJECT(
          'min_pool_size', GREATEST(cpc.min_pool_size - 2, 5),
          'max_pool_size', cpc.max_pool_size,
          'adjustment_reason', 'low_utilization_good_performance'
        )
      WHEN COUNT(*) FILTER (WHERE poa.throughput_trend = 'degrading') > 0.5 * COUNT(*) THEN
        JSON_OBJECT(
          'min_pool_size', cpc.min_pool_size + 3,
          'max_pool_size', cpc.max_pool_size + 15,
          'adjustment_reason', 'throughput_degradation'
        )
      ELSE
        JSON_OBJECT(
          'min_pool_size', cpc.min_pool_size,
          'max_pool_size', cpc.max_pool_size,
          'adjustment_reason', 'optimal_configuration'
        )
    END as adaptive_pool_config,

    -- Performance impact estimation
    CASE 
      WHEN COUNT(*) FILTER (WHERE poa.optimization_priority IN ('critical', 'high')) > COUNT(*) * 0.3 THEN
        'significant_improvement_expected'
      WHEN COUNT(*) FILTER (WHERE poa.optimization_priority = 'medium') > COUNT(*) * 0.5 THEN
        'moderate_improvement_expected'
      ELSE 'minimal_improvement_expected'
    END as expected_impact

  FROM pool_optimization_analysis poa
  CROSS JOIN connection_pool_configuration cpc
  GROUP BY poa.pool_name, cpc.min_pool_size, cpc.max_pool_size
)

-- Comprehensive connection pool management dashboard
SELECT 
  apr.pool_name,

  -- Current performance status
  ROUND(apr.current_avg_utilization, 1) || '%' as avg_utilization,
  ROUND(apr.current_peak_utilization, 1) || '%' as peak_utilization,
  ROUND(apr.current_avg_throughput, 0) as avg_throughput_ops_per_sec,
  apr.current_performance_score as performance_score,

  -- Problem severity assessment
  CASE 
    WHEN apr.critical_periods > 0 THEN 'Critical Issues Detected'
    WHEN apr.high_priority_periods > 0 THEN 'High Priority Issues Detected'
    WHEN apr.medium_priority_periods > 0 THEN 'Moderate Issues Detected'
    ELSE 'Operating Normally'
  END as overall_status,

  -- Optimization recommendations
  apr.recommended_sizing_action,
  apr.recommended_config_action,
  apr.recommended_resource_action,

  -- Adaptive pooling suggestions
  apr.adaptive_pool_config->>'min_pool_size' as recommended_min_pool_size,
  apr.adaptive_pool_config->>'max_pool_size' as recommended_max_pool_size,
  apr.adaptive_pool_config->>'adjustment_reason' as adjustment_rationale,

  -- Implementation priority and impact
  CASE 
    WHEN apr.critical_periods > 0 THEN 'Immediate'
    WHEN apr.high_priority_periods > 0 THEN 'Within 24 hours'
    WHEN apr.medium_priority_periods > 0 THEN 'Within 1 week'
    ELSE 'Monitor and evaluate'
  END as implementation_timeline,

  apr.expected_impact,

  -- Detailed action plan
  CASE 
    WHEN apr.recommended_sizing_action = 'increase_pool_size' THEN 
      ARRAY[
        'Increase max pool size to handle higher concurrent load',
        'Monitor utilization after adjustment',
        'Evaluate memory and CPU impact of larger pool',
        'Set up alerting for new utilization thresholds'
      ]
    WHEN apr.recommended_sizing_action = 'decrease_pool_size' THEN
      ARRAY[
        'Gradually reduce pool size to optimize resource usage',
        'Monitor for any performance degradation',
        'Adjust monitoring thresholds for new pool size',
        'Document resource savings achieved'
      ]
    WHEN apr.recommended_config_action = 'investigate_connection_issues' THEN
      ARRAY[
        'Review connection error logs for patterns',
        'Check network connectivity and latency',
        'Validate MongoDB server health and capacity',
        'Consider connection timeout optimization'
      ]
    ELSE 
      ARRAY['Continue monitoring current configuration', 'Review performance trends weekly']
  END as action_items,

  -- Configuration details for implementation
  JSON_BUILD_OBJECT(
    'current_configuration', JSON_BUILD_OBJECT(
      'min_pool_size', cpc.min_pool_size,
      'max_pool_size', cpc.max_pool_size,
      'max_idle_time_ms', cpc.max_idle_time_ms,
      'wait_timeout_ms', cpc.wait_queue_timeout_ms,
      'enable_adaptive_pooling', cpc.enable_adaptive_pooling
    ),
    'recommended_configuration', JSON_BUILD_OBJECT(
      'min_pool_size', (apr.adaptive_pool_config->>'min_pool_size')::integer,
      'max_pool_size', (apr.adaptive_pool_config->>'max_pool_size')::integer,
      'optimization_enabled', true,
      'monitoring_enhanced', true
    ),
    'expected_changes', JSON_BUILD_OBJECT(
      'utilization_improvement', CASE 
        WHEN apr.current_avg_utilization > 80 THEN 'Reduced peak utilization'
        WHEN apr.current_avg_utilization < 50 THEN 'Improved resource efficiency'
        ELSE 'Maintained optimal utilization'
      END,
      'performance_improvement', apr.expected_impact,
      'resource_impact', CASE 
        WHEN (apr.adaptive_pool_config->>'max_pool_size')::integer > cpc.max_pool_size THEN 'Increased memory usage'
        WHEN (apr.adaptive_pool_config->>'max_pool_size')::integer < cpc.max_pool_size THEN 'Reduced memory usage'
        ELSE 'No significant resource change'
      END
    )
  ) as configuration_details

FROM adaptive_pooling_recommendations apr
CROSS JOIN connection_pool_configuration cpc
ORDER BY apr.critical_periods DESC, apr.high_priority_periods DESC;

-- QueryLeaf provides comprehensive connection pooling capabilities:
-- 1. SQL-familiar connection pool configuration with advanced optimization settings
-- 2. Real-time monitoring and analytics for connection performance and utilization
-- 3. Intelligent pool sizing recommendations based on workload patterns and performance
-- 4. Adaptive pooling algorithms that automatically adjust to application requirements  
-- 5. Comprehensive error handling and retry mechanisms for connection reliability
-- 6. Advanced troubleshooting and optimization guidance for production environments
-- 7. Integration with MongoDB's native connection pooling features and optimizations
-- 8. Enterprise-scale monitoring with detailed metrics and performance analytics
-- 9. Automated optimization recommendations with implementation timelines and priorities
-- 10. SQL-style syntax for complex connection management workflows and configurations

Best Practices for Production Connection Pooling Implementation

Performance Architecture and Scaling Strategies

Essential principles for effective MongoDB connection pooling deployment:

  1. Pool Sizing Strategy: Configure optimal pool sizes based on application concurrency patterns and server capacity
  2. Performance Monitoring: Implement comprehensive monitoring for connection utilization, latency, and error rates (see the monitoring sketch after this list)
  3. Adaptive Management: Use intelligent pooling algorithms that adjust to changing workload patterns
  4. Error Handling: Design robust error handling with retry mechanisms and circuit breaker patterns
  5. Resource Optimization: Balance connection pool sizes with memory usage and server resource constraints
  6. Operational Excellence: Create monitoring dashboards and alerting for proactive pool management
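
The monitoring item above can be started with very little code. A minimal sketch using the Node.js driver's connection pool (CMAP) events; the event names come from the driver's monitoring specification, while the thresholds and the instrumentPool helper are illustrative:

const { MongoClient } = require('mongodb');

// Track pool utilization and checkout failures via CMAP events emitted by the client.
function instrumentPool(client, { maxPoolSize = 100, maxUtilization = 0.9 } = {}) {
  let checkedOut = 0;

  client.on('connectionCheckedOut', () => {
    checkedOut++;
    if (checkedOut / maxPoolSize > maxUtilization) {
      console.warn(`Pool utilization above ${maxUtilization * 100}%: ${checkedOut}/${maxPoolSize}`);
    }
  });

  client.on('connectionCheckedIn', () => {
    checkedOut = Math.max(0, checkedOut - 1);
  });

  client.on('connectionCheckOutFailed', event => {
    // event.reason is typically 'timeout', 'connectionError', or 'poolClosed'
    console.error(`Connection checkout failed: ${event.reason}`);
  });

  client.on('connectionPoolCleared', () => {
    console.warn('Connection pool cleared - expect a burst of new connections');
  });
}

// Usage sketch:
//   const client = new MongoClient(uri, { maxPoolSize: 100 });
//   instrumentPool(client, { maxPoolSize: 100 });
//   await client.connect();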

Scalability and Production Deployment

Optimize connection pooling for enterprise-scale requirements:

  1. Distributed Architecture: Design connection pooling strategies that work effectively across microservices
  2. High Availability: Implement connection pooling with automatic failover and recovery capabilities
  3. Performance Tuning: Optimize pool configurations based on application patterns and MongoDB cluster topology
  4. Monitoring Integration: Integrate connection pool monitoring with enterprise observability platforms
  5. Capacity Planning: Plan connection pool capacity based on expected growth and peak load scenarios (a quick capacity check follows this list)
  6. Security Considerations: Implement secure connection management with proper authentication and encryption
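
For the capacity planning item, a useful first check is that the aggregate number of connections the whole application fleet can open stays below the cluster's connection limit, with headroom for administrative and monitoring clients. A minimal sketch with illustrative numbers:

// Rough capacity check: total connections opened by the fleet must fit the server limit.
function checkConnectionCapacity({ appInstances, maxPoolSize, serverConnectionLimit, headroom = 0.2 }) {
  const peakConnections = appInstances * maxPoolSize;
  const budget = serverConnectionLimit * (1 - headroom); // keep headroom for admin/monitoring clients

  return {
    peakConnections,
    budget,
    withinBudget: peakConnections <= budget,
    maxSafePoolSize: Math.floor(budget / appInstances)
  };
}

// Example: 20 service instances with maxPoolSize 100 against a 5000-connection limit
// -> 2000 peak connections against a 4000-connection budget, so the fleet fits comfortably.
console.log(checkConnectionCapacity({
  appInstances: 20,
  maxPoolSize: 100,
  serverConnectionLimit: 5000
}));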

Conclusion

MongoDB connection pooling provides high-performance connection management that lets applications handle concurrent operations, variable workloads, and peak traffic while keeping resource utilization and operational reliability under control. Combined with the monitoring, adaptive sizing, and error-handling strategies shown above, it gives enterprise deployments detailed visibility into connection usage and the tuning levers to act on it.

Key MongoDB connection pooling benefits include:

  • Intelligent Connection Management: Automatic connection lifecycle management with optimized pooling strategies
  • High Performance: Minimal connection overhead with intelligent connection reuse and resource optimization
  • Adaptive Optimization: Dynamic pool sizing based on application patterns and performance requirements
  • Comprehensive Monitoring: Real-time visibility into connection usage, performance, and health metrics
  • Enterprise Reliability: Robust error handling with automatic recovery and failover capabilities
  • Production Scalability: Distributed connection management that scales with application requirements

Whether you're building high-traffic web applications, real-time analytics platforms, microservices architectures, or any application requiring efficient database connectivity, MongoDB connection pooling with QueryLeaf's familiar SQL interface provides the foundation for scalable and reliable database connection management.

QueryLeaf Integration: QueryLeaf automatically optimizes MongoDB connection pooling while providing SQL-familiar syntax for connection management and monitoring. Advanced pooling patterns, performance optimization strategies, and enterprise monitoring capabilities are seamlessly handled through familiar SQL constructs, making sophisticated connection management accessible to SQL-oriented development teams.

Combining MongoDB's robust connection pooling with SQL-style management operations makes it a strong platform for modern applications that need both high-performance database connectivity and familiar management patterns, so your connection pooling setup can scale efficiently while remaining straightforward to operate.