MongoDB Query Optimization and Explain Plans: Advanced Performance Analysis for High-Performance Database Operations
Database performance optimization is critical for applications that demand fast response times and efficient resource utilization. Poor query performance can lead to degraded user experience, increased infrastructure costs, and system bottlenecks that become increasingly problematic as data volumes and user loads grow.
MongoDB's query optimizer and explain plan system provide detailed insight into query execution strategies, enabling developers and database administrators to identify performance bottlenecks, optimize index usage, and fine-tune queries for maximum efficiency. Compared with the often terse analysis tools of traditional database systems, MongoDB's explain functionality exposes detailed execution statistics and index usage patterns that support both development and production performance tuning, and that serve as the raw material for the automated optimization recommendations built later in this article.
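Before looking at how other systems approach the problem, it helps to see the raw material this article works with. The sketch below, using the Node.js driver against an illustrative orders collection (connection string, collection, and field names are placeholders rather than part of any specific application), runs a single query through explain and prints the headline execution statistics that later sections analyze programmatically.
const { MongoClient } = require('mongodb');

async function quickExplain() {
  const client = new MongoClient('mongodb://localhost:27017');
  try {
    const orders = client.db('ecommerce_analytics').collection('orders');

    // Ask the server to explain the query instead of returning its results
    const explain = await orders
      .find({ status: 'completed', total_amount: { $gte: 100 } })
      .sort({ created_at: -1 })
      .limit(50)
      .explain('executionStats');

    const stats = explain.executionStats;
    console.log('execution time (ms):', stats.executionTimeMillis);
    console.log('keys examined:      ', stats.totalKeysExamined);
    console.log('docs examined:      ', stats.totalDocsExamined);
    console.log('docs returned:      ', stats.nReturned);
    console.log('winning plan stage: ', explain.queryPlanner.winningPlan.stage);
  } finally {
    await client.close();
  }
}

quickExplain().catch(console.error);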
The Traditional Query Analysis Challenge
Conventional database systems often provide limited query analysis capabilities that make performance optimization difficult:
-- Traditional PostgreSQL query analysis with limited optimization insights
-- Basic EXPLAIN output with limited actionable information
EXPLAIN ANALYZE
SELECT
u.user_id,
u.email,
u.first_name,
u.last_name,
u.created_at,
COUNT(o.order_id) as order_count,
SUM(o.total_amount) as total_spent,
AVG(o.total_amount) as avg_order_value,
MAX(o.created_at) as last_order_date
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
WHERE u.status = 'active'
AND u.country IN ('US', 'CA', 'UK')
AND u.created_at >= '2023-01-01'
AND (o.status = 'completed' OR o.status IS NULL)
GROUP BY u.user_id, u.email, u.first_name, u.last_name, u.created_at
HAVING COUNT(o.order_id) > 0 OR u.created_at >= '2024-01-01'
ORDER BY total_spent DESC, order_count DESC
LIMIT 100;
-- PostgreSQL EXPLAIN output (simplified representation):
--
-- Limit (cost=15234.45..15234.70 rows=100 width=64) (actual time=245.123..245.167 rows=100 loops=1)
-- -> Sort (cost=15234.45..15489.78 rows=102133 width=64) (actual time=245.121..245.138 rows=100 loops=1)
-- Sort Key: (sum(o.total_amount)) DESC, (count(o.order_id)) DESC
-- Sort Method: top-N heapsort Memory: 40kB
-- -> HashAggregate (cost=11234.56..12456.89 rows=102133 width=64) (actual time=198.456..223.789 rows=45678 loops=1)
-- Group Key: u.user_id, u.email, u.first_name, u.last_name, u.created_at
-- -> Hash Left Join (cost=2345.67..8901.23 rows=345678 width=48) (actual time=12.456..89.123 rows=123456 loops=1)
-- Hash Cond: (u.user_id = o.user_id)
-- -> Bitmap Heap Scan on users u (cost=234.56..1789.45 rows=12345 width=32) (actual time=3.456..15.789 rows=8901 loops=1)
-- Recheck Cond: ((status = 'active'::text) AND (country = ANY ('{US,CA,UK}'::text[])) AND (created_at >= '2023-01-01'::date))
-- Heap Blocks: exact=234
-- -> BitmapOr (cost=234.56..234.56 rows=12345 width=0) (actual time=2.890..2.891 rows=0 loops=1)
-- -> Bitmap Index Scan on idx_users_status (cost=0.00..78.12 rows=4567 width=0) (actual time=0.890..0.890 rows=3456 loops=1)
-- Index Cond: (status = 'active'::text)
-- -> Hash (cost=1890.45..1890.45 rows=17890 width=24) (actual time=8.567..8.567 rows=14567 loops=1)
-- Buckets: 32768 Batches: 1 Memory Usage: 798kB
-- -> Seq Scan on orders o (cost=0.00..1890.45 rows=17890 width=24) (actual time=0.123..5.456 rows=14567 loops=1)
-- Filter: ((status = 'completed'::text) OR (status IS NULL))
-- Rows Removed by Filter: 3456
-- Planning Time: 2.456 ms
-- Execution Time: 245.678 ms
-- Problems with traditional PostgreSQL EXPLAIN:
-- 1. Complex output format that's difficult to interpret quickly
-- 2. Limited insights into index selection reasoning and alternatives
-- 3. No built-in recommendations for performance improvements
-- 4. Difficult to compare execution plans across different query variations
-- 5. Buffer, I/O, and memory details require extra options (e.g. BUFFERS) and remain hard to correlate with application behavior
-- 6. No integration with query optimization recommendations or automated tuning
-- 7. Verbose output that makes it hard to identify key performance bottlenecks
-- 8. Limited historical explain plan tracking and performance trend analysis
-- Alternative PostgreSQL analysis approaches
-- Using pg_stat_statements for query analysis (requires extension)
SELECT
query,
calls,
total_time,
mean_time,
rows,
100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
FROM pg_stat_statements
WHERE query LIKE '%users%orders%'
ORDER BY mean_time DESC
LIMIT 10;
-- Problems with pg_stat_statements:
-- - Requires additional configuration and extensions
-- - Limited detail about specific execution patterns
-- - No real-time optimization recommendations
-- - Difficult correlation between query patterns and index usage
-- - Limited integration with application performance monitoring
-- MySQL approach (even more limited)
EXPLAIN FORMAT=JSON
SELECT u.user_id, u.email, COUNT(o.order_id) as orders
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
WHERE u.status = 'active'
GROUP BY u.user_id, u.email;
-- MySQL EXPLAIN limitations:
-- {
-- "query_block": {
-- "select_id": 1,
-- "cost_info": {
-- "query_cost": "1234.56"
-- },
-- "grouping_operation": {
-- "using_filesort": false,
-- "nested_loop": [
-- {
-- "table": {
-- "table_name": "u",
-- "access_type": "range",
-- "possible_keys": ["idx_status"],
-- "key": "idx_status",
-- "used_key_parts": ["status"],
-- "key_length": "767",
-- "rows_examined_per_scan": 1000,
-- "rows_produced_per_join": 1000,
-- "cost_info": {
-- "read_cost": "200.00",
-- "eval_cost": "100.00",
-- "prefix_cost": "300.00",
-- "data_read_per_join": "64K"
-- }
-- }
-- }
-- ]
-- }
-- }
-- }
-- MySQL EXPLAIN problems:
-- - Very basic cost model with limited accuracy
-- - No detailed execution statistics or actual vs estimated comparisons
-- - Limited index optimization recommendations
-- - Basic JSON format that's difficult to analyze programmatically
-- - No integration with performance monitoring or automated optimization
-- - Limited support for complex query patterns and aggregations
-- - Minimal historical performance tracking capabilities
MongoDB provides comprehensive query analysis and optimization tools:
// MongoDB Advanced Query Optimization - comprehensive explain plans and performance analysis
const { MongoClient } = require('mongodb');
const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('ecommerce_analytics');
// Advanced query optimization and explain plan analysis system
class MongoQueryOptimizer {
constructor(db) {
this.db = db;
this.collections = {
users: db.collection('users'),
orders: db.collection('orders'),
products: db.collection('products'),
analytics: db.collection('analytics')
};
// Performance analysis configuration
this.performanceTargets = {
maxExecutionTimeMs: 100,
maxDocsExamined: 10000,
minIndexHitRate: 0.95,
maxMemoryUsageMB: 32
};
this.optimizationStrategies = new Map();
this.explainCache = new Map();
}
async analyzeQueryPerformance(collection, pipeline, options = {}) {
console.log('Analyzing query performance with comprehensive explain plans...');
const {
verbosity = 'executionStats', // 'queryPlanner', 'executionStats', 'allPlansExecution'
includeRecommendations = true,
compareAlternatives = true,
trackMetrics = true
} = options;
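// Verbosity levels: 'queryPlanner' reports plan selection without executing the query,
// 'executionStats' executes the query and reports statistics for the winning plan, and
// 'allPlansExecution' additionally reports trial statistics for rejected candidate plans.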
// Get the collection reference
const coll = typeof collection === 'string' ? this.collections[collection] : collection;
// Execute explain with comprehensive analysis
const explainResult = await this.performComprehensiveExplain(coll, pipeline, verbosity);
// Analyze explain plan for optimization opportunities
const analysis = this.analyzeExplainPlan(explainResult);
// Generate optimization recommendations
const recommendations = includeRecommendations ?
await this.generateOptimizationRecommendations(coll, pipeline, explainResult, analysis) : [];
// Compare with alternative query strategies
const alternatives = compareAlternatives ?
await this.generateQueryAlternatives(coll, pipeline, explainResult) : [];
// Track performance metrics for historical analysis
if (trackMetrics) {
await this.recordPerformanceMetrics(coll.collectionName, pipeline, explainResult, analysis);
}
const performanceReport = {
query: {
collection: coll.collectionName,
pipeline: pipeline,
timestamp: new Date()
},
execution: {
totalTimeMs: explainResult.executionStats?.executionTimeMillis || 0,
totalDocsExamined: explainResult.executionStats?.totalDocsExamined || 0,
totalDocsReturned: explainResult.executionStats?.nReturned || 0, // explain reports returned documents as nReturned
executionSuccess: explainResult.executionStats?.executionSuccess || false,
indexesUsed: this.extractIndexesUsed(explainResult),
memoryUsage: this.calculateMemoryUsage(explainResult)
},
performance: {
efficiency: this.calculateQueryEfficiency(explainResult),
indexHitRate: this.calculateIndexHitRate(explainResult),
selectivity: this.calculateSelectivity(explainResult),
performanceGrade: this.assignPerformanceGrade(explainResult),
bottlenecks: analysis.bottlenecks,
strengths: analysis.strengths
},
optimization: {
recommendations: recommendations,
alternatives: alternatives,
estimatedImprovement: this.estimateOptimizationImpact(recommendations),
prioritizedActions: this.prioritizeOptimizations(recommendations)
},
explainDetails: explainResult
};
console.log(`Query analysis completed - Performance Grade: ${performanceReport.performance.performanceGrade}`);
console.log(`Execution Time: ${performanceReport.execution.totalTimeMs}ms`);
console.log(`Documents Examined: ${performanceReport.execution.totalDocsExamined}`);
console.log(`Documents Returned: ${performanceReport.execution.totalDocsReturned}`);
console.log(`Index Hit Rate: ${(performanceReport.performance.indexHitRate * 100).toFixed(1)}%`);
return performanceReport;
}
async performComprehensiveExplain(collection, pipeline, verbosity) {
console.log(`Executing explain with verbosity: ${verbosity}`);
try {
// Handle different query types
if (Array.isArray(pipeline)) {
// Aggregation pipeline
const cursor = collection.aggregate(pipeline);
return await cursor.explain(verbosity);
} else if (typeof pipeline === 'object' && pipeline.find) {
// Find query
const cursor = collection.find(pipeline.find, pipeline.options || {});
if (pipeline.sort) cursor.sort(pipeline.sort);
if (pipeline.limit) cursor.limit(pipeline.limit);
if (pipeline.skip) cursor.skip(pipeline.skip);
return await cursor.explain(verbosity);
} else {
// Simple find query
const cursor = collection.find(pipeline);
return await cursor.explain(verbosity);
}
} catch (error) {
console.error('Explain execution failed:', error);
return {
error: error.message,
executionSuccess: false,
executionTimeMillis: 0
};
}
}
analyzeExplainPlan(explainResult) {
console.log('Analyzing explain plan for performance insights...');
const analysis = {
queryType: this.identifyQueryType(explainResult),
executionPattern: this.analyzeExecutionPattern(explainResult),
indexUsage: this.analyzeIndexUsage(explainResult),
bottlenecks: [],
strengths: [],
riskFactors: [],
optimizationOpportunities: []
};
// Identify performance bottlenecks
analysis.bottlenecks = this.identifyBottlenecks(explainResult);
// Identify query strengths
analysis.strengths = this.identifyStrengths(explainResult);
// Identify risk factors
analysis.riskFactors = this.identifyRiskFactors(explainResult);
// Identify optimization opportunities
analysis.optimizationOpportunities = this.identifyOptimizationOpportunities(explainResult);
return analysis;
}
identifyBottlenecks(explainResult) {
const bottlenecks = [];
const stats = explainResult.executionStats;
if (!stats) return bottlenecks;
// Collection scan bottleneck
if (this.hasCollectionScan(explainResult)) {
bottlenecks.push({
type: 'COLLECTION_SCAN',
severity: 'HIGH',
description: 'Query performs collection scan instead of using index',
impact: 'High CPU and I/O usage, poor scalability',
docsExamined: stats.totalDocsExamined
});
}
// Poor index selectivity
const selectivity = this.calculateSelectivity(explainResult);
if (selectivity < 0.1) {
bottlenecks.push({
type: 'POOR_SELECTIVITY',
severity: 'MEDIUM',
description: 'Index selectivity is poor, examining many unnecessary documents',
impact: 'Increased I/O and processing time',
selectivity: selectivity,
docsExamined: stats.totalDocsExamined,
docsReturned: stats.nReturned
});
}
// High execution time
if (stats.executionTimeMillis > this.performanceTargets.maxExecutionTimeMs) {
bottlenecks.push({
type: 'HIGH_EXECUTION_TIME',
severity: 'HIGH',
description: 'Query execution time exceeds performance target',
impact: 'User experience degradation, resource contention',
executionTime: stats.executionTimeMillis,
target: this.performanceTargets.maxExecutionTimeMs
});
}
// Sort without index
if (this.hasSortWithoutIndex(explainResult)) {
bottlenecks.push({
type: 'SORT_WITHOUT_INDEX',
severity: 'MEDIUM',
description: 'Sort operation performed in memory without index support',
impact: 'High memory usage, slower sort performance',
memoryUsage: this.calculateSortMemoryUsage(explainResult)
});
}
// Large result set without limit
if (stats.nReturned > 1000 && !this.hasLimit(explainResult)) {
bottlenecks.push({
type: 'LARGE_RESULT_SET',
severity: 'MEDIUM',
description: 'Query returns large number of documents without limit',
impact: 'High memory usage, network overhead',
docsReturned: stats.nReturned
});
}
return bottlenecks;
}
identifyStrengths(explainResult) {
const strengths = [];
const stats = explainResult.executionStats;
if (!stats) return strengths;
// Efficient index usage
if (this.hasEfficientIndexUsage(explainResult)) {
strengths.push({
type: 'EFFICIENT_INDEX_USAGE',
description: 'Query uses indexes efficiently with good selectivity',
indexesUsed: this.extractIndexesUsed(explainResult),
selectivity: this.calculateSelectivity(explainResult)
});
}
// Fast execution time
if (stats.executionTimeMillis < this.performanceTargets.maxExecutionTimeMs * 0.5) {
strengths.push({
type: 'FAST_EXECUTION',
description: 'Query executes well below performance targets',
executionTime: stats.executionTimeMillis,
target: this.performanceTargets.maxExecutionTimeMs
});
}
// Covered query
if (this.isCoveredQuery(explainResult)) {
strengths.push({
type: 'COVERED_QUERY',
description: 'Query is covered entirely by index, no document retrieval needed',
indexesUsed: this.extractIndexesUsed(explainResult)
});
}
// Good result set size management
if (stats.nReturned < 100 || this.hasLimit(explainResult)) {
strengths.push({
type: 'APPROPRIATE_RESULT_SIZE',
description: 'Query returns appropriate number of documents',
docsReturned: stats.nReturned,
hasLimit: this.hasLimit(explainResult)
});
}
return strengths;
}
async generateOptimizationRecommendations(collection, pipeline, explainResult, analysis) {
console.log('Generating optimization recommendations...');
const recommendations = [];
// Index recommendations based on bottlenecks
for (const bottleneck of analysis.bottlenecks) {
switch (bottleneck.type) {
case 'COLLECTION_SCAN':
recommendations.push({
type: 'CREATE_INDEX',
priority: 'HIGH',
description: 'Create index to eliminate collection scan',
action: await this.suggestIndexForQuery(collection, pipeline, explainResult),
estimatedImprovement: '80-95% reduction in execution time',
implementation: 'Create compound index on filtered and sorted fields'
});
break;
case 'POOR_SELECTIVITY':
recommendations.push({
type: 'IMPROVE_INDEX_SELECTIVITY',
priority: 'MEDIUM',
description: 'Improve index selectivity with partial index or compound index',
action: await this.suggestSelectivityImprovement(collection, pipeline, explainResult),
estimatedImprovement: '30-60% reduction in documents examined',
implementation: 'Add partial filter or reorganize compound index field order'
});
break;
case 'SORT_WITHOUT_INDEX':
recommendations.push({
type: 'INDEX_FOR_SORT',
priority: 'MEDIUM',
description: 'Create or modify index to support sort operation',
action: await this.suggestSortIndex(collection, pipeline, explainResult),
estimatedImprovement: '50-80% reduction in memory usage and sort time',
implementation: 'Include sort fields in compound index following ESR pattern'
});
break;
case 'LARGE_RESULT_SET':
recommendations.push({
type: 'LIMIT_RESULT_SET',
priority: 'LOW',
description: 'Add pagination or result limiting to reduce memory usage',
action: 'Add $limit stage or implement pagination',
estimatedImprovement: 'Reduced memory usage and network overhead',
implementation: 'Implement cursor-based pagination or reasonable limits'
});
break;
}
}
// Query restructuring recommendations
const structuralRecs = await this.suggestQueryRestructuring(collection, pipeline, explainResult);
recommendations.push(...structuralRecs);
// Aggregation pipeline optimization
if (Array.isArray(pipeline)) {
const pipelineRecs = await this.suggestPipelineOptimizations(pipeline, explainResult);
recommendations.push(...pipelineRecs);
}
return recommendations;
}
async generateQueryAlternatives(collection, pipeline, explainResult) {
console.log('Generating alternative query strategies...');
const alternatives = [];
// Test different index hints
const indexAlternatives = await this.testIndexAlternatives(collection, pipeline);
alternatives.push(...indexAlternatives);
// Test different aggregation pipeline orders
if (Array.isArray(pipeline)) {
const pipelineAlternatives = await this.testPipelineAlternatives(collection, pipeline);
alternatives.push(...pipelineAlternatives);
}
// Test query restructuring alternatives
const structuralAlternatives = await this.testStructuralAlternatives(collection, pipeline);
alternatives.push(...structuralAlternatives);
return alternatives;
}
async suggestIndexForQuery(collection, pipeline, explainResult) {
// Analyze query pattern to suggest optimal index
const queryFields = this.extractQueryFields(pipeline);
const sortFields = this.extractSortFields(pipeline);
const indexSuggestion = {
fields: {},
options: {}
};
// Apply ESR (Equality, Sort, Range) pattern
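// Illustrative example: for a filter { status: 'active', total_amount: { $gte: 100 } }
// with sort { created_at: -1 }, this ordering yields { status: 1, created_at: -1, total_amount: 1 }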
const equalityFields = queryFields.equality || [];
const rangeFields = queryFields.range || [];
// Add equality fields first
equalityFields.forEach(field => {
indexSuggestion.fields[field] = 1;
});
// Add sort fields
if (sortFields) {
Object.entries(sortFields).forEach(([field, direction]) => {
indexSuggestion.fields[field] = direction;
});
}
// Add range fields last
rangeFields.forEach(field => {
if (!indexSuggestion.fields[field]) {
indexSuggestion.fields[field] = 1;
}
});
// Suggest partial index if selective filters present
if (queryFields.selective && queryFields.selective.length > 0) {
indexSuggestion.options.partialFilterExpression = this.buildPartialFilter(queryFields.selective);
}
return {
indexSpec: indexSuggestion.fields,
indexOptions: indexSuggestion.options,
createCommand: `db.${collection.collectionName}.createIndex(${JSON.stringify(indexSuggestion.fields)}, ${JSON.stringify(indexSuggestion.options)})`,
explanation: this.explainIndexSuggestion(indexSuggestion, queryFields, sortFields)
};
}
calculateQueryEfficiency(explainResult) {
const stats = explainResult.executionStats;
if (!stats) return 0;
const docsExamined = stats.totalDocsExamined || 0;
const docsReturned = stats.nReturned || 0;
if (docsExamined === 0) return 1;
return Math.min(1, docsReturned / docsExamined);
}
calculateIndexHitRate(explainResult) {
if (this.hasCollectionScan(explainResult)) return 0;
const indexUsage = this.analyzeIndexUsage(explainResult);
return indexUsage.effectiveness || 0.5;
}
calculateSelectivity(explainResult) {
const stats = explainResult.executionStats;
if (!stats) return 0;
const docsExamined = stats.totalDocsExamined || 0;
const docsReturned = stats.nReturned || 0;
if (docsExamined === 0) return 1;
return docsReturned / docsExamined;
}
assignPerformanceGrade(explainResult) {
const efficiency = this.calculateQueryEfficiency(explainResult);
const indexHitRate = this.calculateIndexHitRate(explainResult);
const stats = explainResult.executionStats;
const executionTime = stats?.executionTimeMillis || 0;
let score = 0;
// Efficiency scoring (40% weight)
if (efficiency >= 0.9) score += 40;
else if (efficiency >= 0.7) score += 30;
else if (efficiency >= 0.5) score += 20;
else if (efficiency >= 0.2) score += 10;
// Index usage scoring (35% weight)
if (indexHitRate >= 0.95) score += 35;
else if (indexHitRate >= 0.8) score += 25;
else if (indexHitRate >= 0.5) score += 15;
else if (indexHitRate >= 0.2) score += 5;
// Execution time scoring (25% weight)
if (executionTime <= 50) score += 25;
else if (executionTime <= 100) score += 20;
else if (executionTime <= 250) score += 15;
else if (executionTime <= 500) score += 10;
else if (executionTime <= 1000) score += 5;
// Convert to letter grade
if (score >= 85) return 'A';
else if (score >= 75) return 'B';
else if (score >= 65) return 'C';
else if (score >= 50) return 'D';
else return 'F';
}
// Helper methods for detailed analysis
hasCollectionScan(explainResult) {
return this.findStageInPlan(explainResult, 'COLLSCAN') !== null;
}
hasSortWithoutIndex(explainResult) {
// A SORT stage only appears when the sort order cannot be satisfied by an index,
// so its presence indicates an in-memory (blocking) sort
return this.findStageInPlan(explainResult, 'SORT') !== null;
}
hasLimit(explainResult) {
return this.findStageInPlan(explainResult, 'LIMIT') !== null;
}
isCoveredQuery(explainResult) {
// Check if query is covered by examining projection and index keys
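// Example: find({ status: 'active' }, { projection: { _id: 0, status: 1 } }) with an
// index on { status: 1 } can be answered from the index alone and shows a
// PROJECTION_COVERED stage in the plan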
const projectionStage = this.findStageInPlan(explainResult, 'PROJECTION_COVERED');
return projectionStage !== null;
}
hasEfficientIndexUsage(explainResult) {
const selectivity = this.calculateSelectivity(explainResult);
const indexHitRate = this.calculateIndexHitRate(explainResult);
return selectivity > 0.1 && indexHitRate > 0.8;
}
findStageInPlan(explainResult, stageName) {
// Recursively search through execution plan for specific stage
const searchStage = (stage) => {
if (!stage) return null;
if (stage.stage === stageName) return stage;
if (stage.inputStage) {
const result = searchStage(stage.inputStage);
if (result) return result;
}
if (stage.inputStages) {
for (const inputStage of stage.inputStages) {
const result = searchStage(inputStage);
if (result) return result;
}
}
return null;
};
const executionStats = explainResult.executionStats;
if (executionStats?.executionStages) {
return searchStage(executionStats.executionStages);
}
return null;
}
extractIndexesUsed(explainResult) {
const indexes = new Set();
const findIndexes = (stage) => {
if (!stage) return;
if (stage.indexName) {
indexes.add(stage.indexName);
}
if (stage.inputStage) {
findIndexes(stage.inputStage);
}
if (stage.inputStages) {
stage.inputStages.forEach(inputStage => findIndexes(inputStage));
}
};
const executionStats = explainResult.executionStats;
if (executionStats?.executionStages) {
findIndexes(executionStats.executionStages);
}
return Array.from(indexes);
}
extractQueryFields(pipeline) {
// Extract fields used in query conditions
const fields = {
equality: [],
range: [],
selective: []
};
if (Array.isArray(pipeline)) {
// Aggregation pipeline
pipeline.forEach(stage => {
if (stage.$match) {
this.extractFieldsFromMatch(stage.$match, fields);
}
});
} else if (typeof pipeline === 'object') {
// Find query
if (pipeline.find) {
this.extractFieldsFromMatch(pipeline.find, fields);
} else {
this.extractFieldsFromMatch(pipeline, fields);
}
}
return fields;
}
extractFieldsFromMatch(matchStage, fields) {
Object.entries(matchStage).forEach(([field, condition]) => {
if (field.startsWith('$')) return; // Skip operators
if (typeof condition === 'object' && condition !== null) {
const operators = Object.keys(condition);
if (operators.some(op => ['$gt', '$gte', '$lt', '$lte'].includes(op))) {
fields.range.push(field);
} else if (operators.includes('$in')) {
if (condition.$in.length <= 5) {
fields.selective.push(field);
} else {
fields.equality.push(field);
}
} else {
fields.equality.push(field);
}
} else {
fields.equality.push(field);
}
});
}
extractSortFields(pipeline) {
if (Array.isArray(pipeline)) {
for (const stage of pipeline) {
if (stage.$sort) {
return stage.$sort;
}
}
} else if (pipeline.sort) {
return pipeline.sort;
}
return null;
}
async recordPerformanceMetrics(collectionName, pipeline, explainResult, analysis) {
try {
const metrics = {
timestamp: new Date(),
collection: collectionName,
queryHash: this.generateQueryHash(pipeline),
pipeline: pipeline,
execution: {
timeMs: explainResult.executionStats?.executionTimeMillis || 0,
docsExamined: explainResult.executionStats?.totalDocsExamined || 0,
docsReturned: explainResult.executionStats?.nReturned || 0,
indexesUsed: this.extractIndexesUsed(explainResult),
success: explainResult.executionStats?.executionSuccess !== false
},
performance: {
efficiency: this.calculateQueryEfficiency(explainResult),
indexHitRate: this.calculateIndexHitRate(explainResult),
selectivity: this.calculateSelectivity(explainResult),
grade: this.assignPerformanceGrade(explainResult)
},
analysis: {
bottleneckCount: analysis.bottlenecks.length,
strengthCount: analysis.strengths.length,
queryType: analysis.queryType,
riskLevel: this.calculateRiskLevel(analysis.riskFactors)
}
};
await this.collections.analytics.insertOne(metrics);
} catch (error) {
console.warn('Failed to record performance metrics:', error.message);
}
}
generateQueryHash(pipeline) {
// Generate a hash for query pattern identification.
// JSON.stringify is key-order sensitive, so logically identical queries written with
// different field order hash differently; a canonical serializer would be more robust
const queryString = JSON.stringify(pipeline);
return require('crypto').createHash('md5').update(queryString).digest('hex');
}
calculateMemoryUsage(explainResult) {
// Estimate memory usage from explain plan
let memoryUsage = 0;
const sortStage = this.findStageInPlan(explainResult, 'SORT');
if (sortStage) {
// Estimate sort memory usage
memoryUsage += (explainResult.executionStats?.totalDocsExamined || 0) * 0.001; // Rough estimate
}
return memoryUsage;
}
calculateSortMemoryUsage(explainResult) {
const stats = explainResult.executionStats;
if (!stats) return 0;
// Estimate memory usage for in-memory sort
const avgDocSize = 1024; // Estimated average document size in bytes
const docsToSort = stats.totalDocsExamined || 0;
return (docsToSort * avgDocSize) / (1024 * 1024); // Convert to MB
}
async performBatchQueryAnalysis(queries) {
console.log(`Analyzing batch of ${queries.length} queries...`);
const results = [];
const batchMetrics = {
totalQueries: queries.length,
analyzedSuccessfully: 0,
averageExecutionTime: 0,
averageEfficiency: 0,
gradeDistribution: { A: 0, B: 0, C: 0, D: 0, F: 0 },
commonBottlenecks: new Map(),
recommendationFrequency: new Map()
};
for (let i = 0; i < queries.length; i++) {
const query = queries[i];
console.log(`Analyzing query ${i + 1}/${queries.length}: ${query.name || 'Unnamed'}`);
try {
const analysis = await this.analyzeQueryPerformance(query.collection, query.pipeline, query.options);
results.push({
queryIndex: i,
queryName: query.name || `Query_${i + 1}`,
analysis: analysis,
success: true
});
// Update batch metrics
batchMetrics.analyzedSuccessfully++;
batchMetrics.averageExecutionTime += analysis.execution.totalTimeMs;
batchMetrics.averageEfficiency += analysis.performance.efficiency;
batchMetrics.gradeDistribution[analysis.performance.performanceGrade]++;
// Track common bottlenecks
analysis.performance.bottlenecks.forEach(bottleneck => {
const count = batchMetrics.commonBottlenecks.get(bottleneck.type) || 0;
batchMetrics.commonBottlenecks.set(bottleneck.type, count + 1);
});
// Track recommendation frequency
analysis.optimization.recommendations.forEach(rec => {
const count = batchMetrics.recommendationFrequency.get(rec.type) || 0;
batchMetrics.recommendationFrequency.set(rec.type, count + 1);
});
} catch (error) {
console.error(`Query ${i + 1} analysis failed:`, error.message);
results.push({
queryIndex: i,
queryName: query.name || `Query_${i + 1}`,
error: error.message,
success: false
});
}
}
// Calculate final batch metrics
if (batchMetrics.analyzedSuccessfully > 0) {
batchMetrics.averageExecutionTime /= batchMetrics.analyzedSuccessfully;
batchMetrics.averageEfficiency /= batchMetrics.analyzedSuccessfully;
}
// Convert Maps to Objects for JSON serialization
batchMetrics.commonBottlenecks = Object.fromEntries(batchMetrics.commonBottlenecks);
batchMetrics.recommendationFrequency = Object.fromEntries(batchMetrics.recommendationFrequency);
console.log(`Batch analysis completed: ${batchMetrics.analyzedSuccessfully}/${batchMetrics.totalQueries} queries analyzed successfully`);
console.log(`Average execution time: ${batchMetrics.averageExecutionTime.toFixed(2)}ms`);
console.log(`Average efficiency: ${(batchMetrics.averageEfficiency * 100).toFixed(1)}%`);
return {
results: results,
batchMetrics: batchMetrics,
summary: {
totalAnalyzed: batchMetrics.analyzedSuccessfully,
averagePerformance: batchMetrics.averageEfficiency,
mostCommonBottleneck: this.getMostCommon(batchMetrics.commonBottlenecks),
mostCommonRecommendation: this.getMostCommon(batchMetrics.recommendationFrequency),
performanceDistribution: batchMetrics.gradeDistribution
}
};
}
getMostCommon(frequency) {
let maxCount = 0;
let mostCommon = null;
Object.entries(frequency).forEach(([key, count]) => {
if (count > maxCount) {
maxCount = count;
mostCommon = key;
}
});
return { type: mostCommon, count: maxCount };
}
// Additional helper methods for comprehensive analysis...
identifyQueryType(explainResult) {
if (this.findStageInPlan(explainResult, 'GROUP')) return 'aggregation';
if (this.findStageInPlan(explainResult, 'SORT')) return 'sorted_query';
if (this.hasLimit(explainResult)) return 'limited_query';
return 'simple_query';
}
analyzeExecutionPattern(explainResult) {
const pattern = {
hasIndexScan: this.findStageInPlan(explainResult, 'IXSCAN') !== null,
hasCollectionScan: this.hasCollectionScan(explainResult),
hasSort: this.findStageInPlan(explainResult, 'SORT') !== null,
hasGroup: this.findStageInPlan(explainResult, 'GROUP') !== null,
hasLimit: this.hasLimit(explainResult)
};
return pattern;
}
analyzeIndexUsage(explainResult) {
const indexesUsed = this.extractIndexesUsed(explainResult);
const hasCollScan = this.hasCollectionScan(explainResult);
return {
indexCount: indexesUsed.length,
indexes: indexesUsed,
hasCollectionScan: hasCollScan,
effectiveness: hasCollScan ? 0 : Math.min(1, this.calculateSelectivity(explainResult))
};
}
identifyRiskFactors(explainResult) {
const risks = [];
const stats = explainResult.executionStats;
if (stats?.totalDocsExamined > 100000) {
risks.push({
type: 'HIGH_DOCUMENT_EXAMINATION',
description: 'Query examines very large number of documents',
impact: 'Scalability concerns, resource intensive'
});
}
if (this.hasCollectionScan(explainResult)) {
risks.push({
type: 'COLLECTION_SCAN_SCALING',
description: 'Collection scan will degrade with data growth',
impact: 'Linear performance degradation as data grows'
});
}
return risks;
}
identifyOptimizationOpportunities(explainResult) {
const opportunities = [];
if (this.hasCollectionScan(explainResult)) {
opportunities.push({
type: 'INDEX_CREATION',
description: 'Create appropriate indexes to eliminate collection scans',
impact: 'Significant performance improvement'
});
}
if (this.hasSortWithoutIndex(explainResult)) {
opportunities.push({
type: 'SORT_OPTIMIZATION',
description: 'Optimize index to support sort operations',
impact: 'Reduced memory usage and faster sorting'
});
}
return opportunities;
}
calculateRiskLevel(riskFactors) {
if (riskFactors.length === 0) return 'LOW';
if (riskFactors.some(r => r.type.includes('HIGH') || r.type.includes('CRITICAL'))) return 'HIGH';
if (riskFactors.length > 2) return 'MEDIUM';
return 'LOW';
}
}
// Benefits of MongoDB Query Optimization and Explain Plans:
// - Comprehensive execution plan analysis with detailed performance metrics
// - Automatic bottleneck identification and optimization recommendations
// - Advanced index usage analysis and index suggestion algorithms
// - Real-time query performance monitoring and historical trending
// - Intelligent query alternative generation and comparative analysis
// - Integration with aggregation pipeline optimization techniques
// - Detailed memory usage analysis and resource consumption tracking
// - Batch query analysis capabilities for application-wide performance review
// - Automated performance grading and risk assessment
// - Production-ready performance monitoring and alerting integration
module.exports = {
MongoQueryOptimizer
};
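To show how the optimizer class above might be driven from application code, here is a minimal usage sketch. It is illustrative rather than production-ready: it assumes the helper methods that are referenced but not shown above (such as suggestQueryRestructuring, suggestSelectivityImprovement, prioritizeOptimizations, and estimateOptimizationImpact) have been implemented, and it disables alternative comparison and metric tracking to keep the example self-contained.
const { MongoClient } = require('mongodb');

async function runAnalysisExample() {
  const client = new MongoClient('mongodb://localhost:27017');
  try {
    const db = client.db('ecommerce_analytics');
    const optimizer = new MongoQueryOptimizer(db);

    // Analyze a find-style query against the orders collection
    const report = await optimizer.analyzeQueryPerformance('orders', {
      find: { status: 'completed', total_amount: { $gte: 100 } },
      sort: { created_at: -1 },
      limit: 50
    }, {
      verbosity: 'executionStats',
      includeRecommendations: true,
      compareAlternatives: false, // keep the sketch self-contained
      trackMetrics: false         // avoid writing to the analytics collection
    });

    console.log('Performance grade:', report.performance.performanceGrade);
    console.log('Bottlenecks:', report.performance.bottlenecks.map(b => b.type));
    report.optimization.recommendations.forEach(rec => {
      console.log(`${rec.priority}: ${rec.description}`);
    });
  } finally {
    await client.close();
  }
}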
Understanding MongoDB Query Optimization Architecture
Advanced Query Analysis Techniques and Performance Tuning
Implement sophisticated query analysis patterns for production optimization:
// Advanced query optimization patterns and performance monitoring
class AdvancedQueryAnalyzer {
constructor(db) {
this.db = db;
this.performanceHistory = new Map();
this.optimizationRules = new Map();
this.alertThresholds = {
executionTimeMs: 1000,
docsExaminedRatio: 10,
indexHitRate: 0.8
};
}
async implementRealTimePerformanceMonitoring(collections) {
console.log('Setting up real-time query performance monitoring...');
// Enable database profiling for detailed query analysis.
// Level 2 profiles every operation (expensive in production); with level 1 only
// operations slower than slowms are profiled, sampled at sampleRate.
await this.db.runCommand({
profile: 2, // 0 = off, 1 = slow operations only, 2 = all operations
slowms: 100, // threshold (ms) defining a "slow" operation
sampleRate: 0.1 // fraction of slow operations profiled at level 1
});
// Create performance monitoring aggregation pipeline
const monitoringPipeline = [
{
$match: {
ts: { $gte: new Date(Date.now() - 60000) }, // Last minute
ns: { $in: collections.map(col => `${this.db.databaseName}.${col}`) },
command: { $exists: true }
}
},
{
$addFields: {
queryType: {
$switch: {
branches: [
{ case: { $ne: ['$command.find', null] }, then: 'find' },
{ case: { $ne: ['$command.aggregate', null] }, then: 'aggregate' },
{ case: { $ne: ['$command.update', null] }, then: 'update' },
{ case: { $ne: ['$command.delete', null] }, then: 'delete' }
],
default: 'other'
}
},
// Extract query shape for pattern analysis
queryShape: {
$switch: {
branches: [
{
case: { $ne: ['$command.find', null] },
then: { $objectToArray: { $ifNull: ['$command.filter', {}] } }
},
{
case: { $ne: ['$command.aggregate', null] },
then: { $arrayElemAt: ['$command.pipeline', 0] }
}
],
default: {}
}
},
// Performance metrics calculation
efficiency: {
$cond: {
if: { $gt: ['$docsExamined', 0] },
then: { $divide: ['$nreturned', '$docsExamined'] },
else: 1
}
},
// Index usage assessment
indexUsed: {
$cond: {
if: { $ne: ['$planSummary', null] },
then: { $not: { $regexMatch: { input: '$planSummary', regex: 'COLLSCAN' } } },
else: false
}
}
}
},
{
$group: {
_id: {
collection: { $arrayElemAt: [{ $split: ['$ns', '.'] }, 1] },
queryType: '$queryType',
queryShape: '$queryShape'
},
// Aggregated performance metrics
avgExecutionTime: { $avg: '$millis' },
maxExecutionTime: { $max: '$millis' },
totalQueries: { $sum: 1 },
avgEfficiency: { $avg: '$efficiency' },
avgDocsExamined: { $avg: '$docsExamined' },
avgDocsReturned: { $avg: '$nreturned' },
indexUsageRate: { $avg: { $cond: ['$indexUsed', 1, 0] } },
// Query examples for further analysis
sampleQueries: { $push: { command: '$command', millis: '$millis' } }
}
},
{
$match: {
$or: [
{ avgExecutionTime: { $gt: this.alertThresholds.executionTimeMs } },
{ avgEfficiency: { $lt: 0.1 } },
{ indexUsageRate: { $lt: this.alertThresholds.indexHitRate } }
]
}
},
{
$sort: { avgExecutionTime: -1 }
}
];
try {
const performanceIssues = await this.db.collection('system.profile')
.aggregate(monitoringPipeline).toArray();
// Process identified performance issues
for (const issue of performanceIssues) {
await this.processPerformanceIssue(issue);
}
console.log(`Performance monitoring identified ${performanceIssues.length} potential issues`);
return performanceIssues;
} catch (error) {
console.error('Performance monitoring failed:', error);
return [];
}
}
async processPerformanceIssue(issue) {
const issueSignature = this.generateIssueSignature(issue);
// Check if this issue has been seen before
if (this.performanceHistory.has(issueSignature)) {
const history = this.performanceHistory.get(issueSignature);
history.occurrences++;
history.lastSeen = new Date();
// Escalate if recurring issue
if (history.occurrences > 5) {
await this.escalatePerformanceIssue(issue, history);
}
} else {
// New issue, add to tracking
this.performanceHistory.set(issueSignature, {
firstSeen: new Date(),
lastSeen: new Date(),
occurrences: 1,
issue: issue
});
}
// Generate optimization recommendations
const recommendations = await this.generateRealtimeRecommendations(issue);
// Log performance alert
await this.logPerformanceAlert({
timestamp: new Date(),
collection: issue._id.collection,
queryType: issue._id.queryType,
severity: this.calculateSeverity(issue),
metrics: {
avgExecutionTime: issue.avgExecutionTime,
avgEfficiency: issue.avgEfficiency,
indexUsageRate: issue.indexUsageRate,
totalQueries: issue.totalQueries
},
recommendations: recommendations,
issueSignature: issueSignature
});
}
async generateRealtimeRecommendations(issue) {
const recommendations = [];
// Low index usage rate
if (issue.indexUsageRate < this.alertThresholds.indexHitRate) {
recommendations.push({
type: 'INDEX_OPTIMIZATION',
priority: 'HIGH',
description: `Collection ${issue._id.collection} has low index usage rate (${(issue.indexUsageRate * 100).toFixed(1)}%)`,
action: 'Analyze query patterns and create appropriate indexes',
queryType: issue._id.queryType
});
}
// High execution time
if (issue.avgExecutionTime > this.alertThresholds.executionTimeMs) {
recommendations.push({
type: 'PERFORMANCE_OPTIMIZATION',
priority: 'HIGH',
description: `Queries on ${issue._id.collection} averaging ${issue.avgExecutionTime.toFixed(2)}ms execution time`,
action: 'Review query structure and index strategy',
queryType: issue._id.queryType
});
}
// Poor efficiency
if (issue.avgEfficiency < 0.1) {
recommendations.push({
type: 'SELECTIVITY_IMPROVEMENT',
priority: 'MEDIUM',
description: `Poor query selectivity detected (${(issue.avgEfficiency * 100).toFixed(1)}% efficiency)`,
action: 'Implement more selective query filters or partial indexes',
queryType: issue._id.queryType
});
}
return recommendations;
}
async performHistoricalPerformanceAnalysis(timeRange = '7d') {
console.log(`Performing historical performance analysis for ${timeRange}...`);
const timeRangeMs = this.parseTimeRange(timeRange);
const startDate = new Date(Date.now() - timeRangeMs);
const historicalAnalysis = await this.db.collection('system.profile').aggregate([
{
$match: {
ts: { $gte: startDate },
command: { $exists: true },
millis: { $exists: true }
}
},
{
$addFields: {
hour: { $dateToString: { format: '%Y-%m-%d-%H', date: '$ts' } },
collection: { $arrayElemAt: [{ $split: ['$ns', '.'] }, 1] },
queryType: {
$switch: {
branches: [
{ case: { $ne: ['$command.find', null] }, then: 'find' },
{ case: { $ne: ['$command.aggregate', null] }, then: 'aggregate' },
{ case: { $ne: ['$command.update', null] }, then: 'update' }
],
default: 'other'
}
}
}
},
{
$group: {
_id: {
hour: '$hour',
collection: '$collection',
queryType: '$queryType'
},
// Time-based metrics
queryCount: { $sum: 1 },
avgLatency: { $avg: '$millis' },
maxLatency: { $max: '$millis' },
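// Note: the $percentile accumulator requires MongoDB 7.0 or newer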
p95Latency: {
$percentile: {
input: '$millis',
p: [0.95],
method: 'approximate'
}
},
// Efficiency metrics
totalDocsExamined: { $sum: '$docsExamined' },
totalDocsReturned: { $sum: '$nreturned' },
avgEfficiency: {
$avg: {
$cond: {
if: { $gt: ['$docsExamined', 0] },
then: { $divide: ['$nreturned', '$docsExamined'] },
else: 1
}
}
},
// Index usage tracking
collectionScans: {
$sum: {
$cond: [
{ $regexMatch: { input: { $ifNull: ['$planSummary', ''] }, regex: 'COLLSCAN' } },
1,
0
]
}
}
}
},
{
$addFields: {
indexUsageRate: {
$subtract: [1, { $divide: ['$collectionScans', '$queryCount'] }]
},
// Performance trend calculation
performanceScore: {
$add: [
{ $multiply: [{ $min: [1, { $divide: [1000, '$avgLatency'] }] }, 0.4] },
{ $multiply: ['$avgEfficiency', 0.3] },
{ $multiply: ['$indexUsageRate', 0.3] }
]
}
}
},
{
$sort: { '_id.hour': 1, performanceScore: 1 }
}
]).toArray();
// Analyze trends and patterns
const trendAnalysis = this.analyzePerformanceTrends(historicalAnalysis);
const recommendations = this.generateHistoricalRecommendations(trendAnalysis);
return {
timeRange: timeRange,
analysis: historicalAnalysis,
trends: trendAnalysis,
recommendations: recommendations,
summary: {
totalHours: new Set(historicalAnalysis.map(h => h._id.hour)).size,
collectionsAnalyzed: new Set(historicalAnalysis.map(h => h._id.collection)).size,
avgPerformanceScore: historicalAnalysis.length > 0 ?
historicalAnalysis.reduce((sum, h) => sum + h.performanceScore, 0) / historicalAnalysis.length : 0,
worstPerformingHour: historicalAnalysis.reduce((min, h) =>
(!min || h.performanceScore < min.performanceScore) ? h : min, null),
bestPerformingHour: historicalAnalysis.reduce((max, h) =>
(!max || h.performanceScore > max.performanceScore) ? h : max, null)
}
};
}
analyzePerformanceTrends(historicalData) {
const trends = {
latencyTrend: this.calculateTrend(historicalData, 'avgLatency'),
throughputTrend: this.calculateTrend(historicalData, 'queryCount'),
efficiencyTrend: this.calculateTrend(historicalData, 'avgEfficiency'),
indexUsageTrend: this.calculateTrend(historicalData, 'indexUsageRate'),
// Peak usage analysis
peakHours: this.identifyPeakHours(historicalData),
// Performance degradation detection
degradationPeriods: this.identifyDegradationPeriods(historicalData),
// Collection-specific trends
collectionTrends: this.analyzeCollectionTrends(historicalData)
};
return trends;
}
calculateTrend(data, metric) {
if (data.length < 2) return { direction: 'stable', magnitude: 0 };
const values = data.map(d => d[metric]).filter(v => v != null);
const n = values.length;
if (n < 2) return { direction: 'stable', magnitude: 0 };
// Simple linear regression for trend calculation
const xSum = (n * (n + 1)) / 2;
const ySum = values.reduce((sum, val) => sum + val, 0);
const xySum = values.reduce((sum, val, i) => sum + val * (i + 1), 0);
const x2Sum = (n * (n + 1) * (2 * n + 1)) / 6;
const slope = (n * xySum - xSum * ySum) / (n * x2Sum - xSum * xSum);
const magnitude = Math.abs(slope);
// Treat slopes smaller than 1% of the metric's mean (per step) as stable
const threshold = Math.abs(ySum / n) * 0.01;
let direction = 'stable';
if (slope > threshold) direction = 'improving';
else if (slope < -threshold) direction = 'degrading';
return { direction, magnitude, slope };
}
async implementAutomatedOptimization(collectionName, optimizationRules) {
console.log(`Implementing automated optimization for ${collectionName}...`);
const collection = this.db.collection(collectionName);
const optimizationResults = [];
for (const rule of optimizationRules) {
try {
switch (rule.type) {
case 'AUTO_INDEX_CREATION':
const indexResult = await this.createOptimizedIndex(collection, rule);
optimizationResults.push(indexResult);
break;
case 'QUERY_REWRITE':
const rewriteResult = await this.implementQueryRewrite(collection, rule);
optimizationResults.push(rewriteResult);
break;
case 'AGGREGATION_OPTIMIZATION':
const aggResult = await this.optimizeAggregationPipeline(collection, rule);
optimizationResults.push(aggResult);
break;
default:
console.warn(`Unknown optimization rule type: ${rule.type}`);
}
} catch (error) {
console.error(`Optimization rule ${rule.type} failed:`, error);
optimizationResults.push({
rule: rule.type,
success: false,
error: error.message
});
}
}
// Validate optimization effectiveness
const validationResults = await this.validateOptimizations(collection, optimizationResults);
return {
collection: collectionName,
optimizationsApplied: optimizationResults,
validation: validationResults,
summary: {
totalRules: optimizationRules.length,
successful: optimizationResults.filter(r => r.success).length,
failed: optimizationResults.filter(r => !r.success).length
}
};
}
async createOptimizedIndex(collection, rule) {
console.log(`Creating optimized index: ${rule.indexName}`);
try {
const indexSpec = rule.indexSpec;
const indexOptions = rule.indexOptions || {};
// Note: the background option is ignored on MongoDB 4.2+, where index builds use an
// optimized build process; it is kept only for compatibility with older deployments
indexOptions.background = true;
await collection.createIndex(indexSpec, {
name: rule.indexName,
...indexOptions
});
// Test index effectiveness
const testResult = await this.testIndexEffectiveness(collection, rule);
return {
rule: 'AUTO_INDEX_CREATION',
indexName: rule.indexName,
indexSpec: indexSpec,
success: true,
effectiveness: testResult,
message: `Index ${rule.indexName} created successfully`
};
} catch (error) {
return {
rule: 'AUTO_INDEX_CREATION',
indexName: rule.indexName,
success: false,
error: error.message
};
}
}
async testIndexEffectiveness(collection, rule) {
if (!rule.testQuery) return { tested: false };
try {
// Execute test query with explain
const explainResult = await collection.find(rule.testQuery).explain('executionStats');
const effectiveness = {
tested: true,
indexUsed: !this.hasCollectionScan(explainResult),
executionTimeMs: explainResult.executionStats?.executionTimeMillis || 0,
docsExamined: explainResult.executionStats?.totalDocsExamined || 0,
docsReturned: explainResult.executionStats?.nReturned || 0,
efficiency: this.calculateQueryEfficiency(explainResult)
};
return effectiveness;
} catch (error) {
return {
tested: false,
error: error.message
};
}
}
// Additional helper methods...
generateIssueSignature(issue) {
const key = JSON.stringify({
collection: issue._id.collection,
queryType: issue._id.queryType,
queryShape: issue._id.queryShape
});
return require('crypto').createHash('md5').update(key).digest('hex');
}
calculateSeverity(issue) {
let score = 0;
if (issue.avgExecutionTime > 2000) score += 3;
else if (issue.avgExecutionTime > 1000) score += 2;
else if (issue.avgExecutionTime > 500) score += 1;
if (issue.avgEfficiency < 0.05) score += 3;
else if (issue.avgEfficiency < 0.1) score += 2;
else if (issue.avgEfficiency < 0.2) score += 1;
if (issue.indexUsageRate < 0.5) score += 2;
else if (issue.indexUsageRate < 0.8) score += 1;
if (score >= 6) return 'CRITICAL';
else if (score >= 4) return 'HIGH';
else if (score >= 2) return 'MEDIUM';
else return 'LOW';
}
parseTimeRange(timeRange) {
const units = {
'd': 24 * 60 * 60 * 1000,
'h': 60 * 60 * 1000,
'm': 60 * 1000
};
const match = timeRange.match(/(\d+)([dhm])/);
if (!match) return 7 * 24 * 60 * 60 * 1000; // Default 7 days
const [, amount, unit] = match;
return parseInt(amount) * units[unit];
}
async logPerformanceAlert(alert) {
try {
await this.db.collection('performance_alerts').insertOne(alert);
} catch (error) {
console.warn('Failed to log performance alert:', error.message);
}
}
}
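The monitoring class can be wired up in much the same way. The sketch below is illustrative: it assumes database profiling has already populated system.profile, that the helper methods referenced but not shown above (for example identifyPeakHours, identifyDegradationPeriods, analyzeCollectionTrends, generateHistoricalRecommendations, and escalatePerformanceIssue) exist, and that the collection names are placeholders.
const { MongoClient } = require('mongodb');

async function runMonitoringExample() {
  const client = new MongoClient('mongodb://localhost:27017');
  try {
    const db = client.db('ecommerce_analytics');
    const analyzer = new AdvancedQueryAnalyzer(db);

    // Scan the last minute of profiler data for slow or index-starved query shapes
    const issues = await analyzer.implementRealTimePerformanceMonitoring(['users', 'orders']);
    for (const issue of issues) {
      console.log(
        `${issue._id.collection}/${issue._id.queryType}: ` +
        `${issue.avgExecutionTime.toFixed(1)}ms avg, ` +
        `${(issue.indexUsageRate * 100).toFixed(0)}% index usage`
      );
    }

    // Review the last 24 hours of profiler history for trends
    const history = await analyzer.performHistoricalPerformanceAnalysis('24h');
    console.log('Hours analyzed:', history.summary.totalHours);
    console.log('Latency trend:', history.trends.latencyTrend.direction);
  } finally {
    await client.close();
  }
}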
SQL-Style Query Analysis with QueryLeaf
QueryLeaf provides familiar SQL syntax for MongoDB query optimization and explain plan analysis:
-- QueryLeaf query optimization with SQL-familiar EXPLAIN syntax
-- Basic query explain with performance analysis
EXPLAIN (ANALYZE true, BUFFERS true, TIMING true)
SELECT
user_id,
email,
first_name,
last_name,
status,
created_at
FROM users
WHERE status = 'active'
AND country IN ('US', 'CA', 'UK')
AND created_at >= CURRENT_DATE - INTERVAL '1 year'
ORDER BY created_at DESC
LIMIT 100;
-- Advanced aggregation explain with optimization recommendations
EXPLAIN (ANALYZE true, COSTS true, VERBOSE true, FORMAT JSON)
WITH user_activity_summary AS (
SELECT
u.user_id,
u.email,
u.first_name,
u.last_name,
u.country,
u.status,
COUNT(o.order_id) as order_count,
SUM(o.total_amount) as total_spent,
AVG(o.total_amount) as avg_order_value,
MAX(o.created_at) as last_order_date,
-- Customer value segmentation
CASE
WHEN SUM(o.total_amount) > 1000 THEN 'high_value'
WHEN SUM(o.total_amount) > 100 THEN 'medium_value'
ELSE 'low_value'
END as value_segment,
-- Activity recency scoring
CASE
WHEN MAX(o.created_at) >= CURRENT_DATE - INTERVAL '30 days' THEN 'recent'
WHEN MAX(o.created_at) >= CURRENT_DATE - INTERVAL '90 days' THEN 'moderate'
WHEN MAX(o.created_at) >= CURRENT_DATE - INTERVAL '1 year' THEN 'old'
ELSE 'inactive'
END as activity_segment
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
WHERE u.status = 'active'
AND u.country IN ('US', 'CA', 'UK', 'AU', 'DE')
AND u.created_at >= CURRENT_DATE - INTERVAL '2 years'
AND (o.status = 'completed' OR o.status IS NULL)
GROUP BY u.user_id, u.email, u.first_name, u.last_name, u.country, u.status
HAVING COUNT(o.order_id) > 0 OR u.created_at >= CURRENT_DATE - INTERVAL '6 months'
),
customer_insights AS (
SELECT
country,
value_segment,
activity_segment,
COUNT(*) as customer_count,
AVG(total_spent) as avg_customer_value,
SUM(order_count) as total_orders,
-- Geographic performance metrics
AVG(order_count) as avg_orders_per_customer,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY total_spent) as median_customer_value,
STDDEV(total_spent) as customer_value_stddev,
-- Customer concentration analysis
COUNT(*) / SUM(COUNT(*)) OVER (PARTITION BY country) as segment_concentration,
-- Activity trend indicators
COUNT(*) FILTER (WHERE activity_segment = 'recent') as recent_active_customers,
COUNT(*) FILTER (WHERE activity_segment IN ('moderate', 'old')) as declining_customers
FROM user_activity_summary
GROUP BY country, value_segment, activity_segment
)
SELECT
country,
value_segment,
activity_segment,
customer_count,
ROUND(avg_customer_value::numeric, 2) as avg_customer_ltv,
total_orders,
ROUND(avg_orders_per_customer::numeric, 1) as avg_orders_per_customer,
ROUND(median_customer_value::numeric, 2) as median_ltv,
ROUND(segment_concentration::numeric, 4) as market_concentration,
-- Performance indicators
CASE
WHEN recent_active_customers > declining_customers THEN 'growing'
WHEN recent_active_customers < declining_customers * 0.5 THEN 'declining'
ELSE 'stable'
END as segment_trend,
-- Business intelligence insights
CASE
WHEN value_segment = 'high_value' AND activity_segment = 'recent' THEN 'premium_active'
WHEN value_segment = 'high_value' AND activity_segment != 'recent' THEN 'at_risk_premium'
WHEN value_segment != 'low_value' AND activity_segment = 'recent' THEN 'growth_opportunity'
WHEN activity_segment = 'inactive' THEN 'reactivation_target'
ELSE 'standard_segment'
END as strategic_priority,
-- Ranking within country
ROW_NUMBER() OVER (
PARTITION BY country
ORDER BY avg_customer_value DESC, customer_count DESC
) as country_segment_rank
FROM customer_insights
WHERE customer_count >= 10 -- Filter small segments
ORDER BY country, avg_customer_value DESC, customer_count DESC;
-- QueryLeaf EXPLAIN output with optimization insights:
-- {
-- "queryType": "aggregation",
-- "executionTimeMillis": 245,
-- "totalDocsExamined": 45678,
-- "totalDocsReturned": 1245,
-- "efficiency": 0.027,
-- "indexUsage": {
-- "indexes": ["users_status_country_idx", "orders_user_status_idx"],
-- "effectiveness": 0.78,
-- "missingIndexes": ["users_created_at_idx", "orders_completed_date_idx"]
-- },
-- "stages": [
-- {
-- "stage": "$match",
-- "inputStage": "IXSCAN",
-- "indexName": "users_status_country_idx",
-- "keysExamined": 12456,
-- "docsExamined": 8901,
-- "executionTimeMillis": 45,
-- "optimization": "GOOD - Using compound index efficiently"
-- },
-- {
-- "stage": "$lookup",
-- "inputStage": "IXSCAN",
-- "indexName": "orders_user_status_idx",
-- "executionTimeMillis": 156,
-- "optimization": "NEEDS_IMPROVEMENT - Consider creating index on (user_id, status, created_at)"
-- },
-- {
-- "stage": "$group",
-- "executionTimeMillis": 34,
-- "memoryUsageMB": 12.3,
-- "spilledToDisk": false,
-- "optimization": "GOOD - Group operation within memory limits"
-- },
-- {
-- "stage": "$sort",
-- "executionTimeMillis": 10,
-- "memoryUsageMB": 2.1,
-- "optimization": "EXCELLENT - Sort using index order"
-- }
-- ],
-- "recommendations": [
-- {
-- "type": "CREATE_INDEX",
-- "priority": "HIGH",
-- "description": "Create compound index to improve JOIN performance",
-- "suggestedIndex": "CREATE INDEX orders_user_status_date_idx ON orders (user_id, status, created_at DESC)",
-- "estimatedImprovement": "60-80% reduction in lookup time"
-- },
-- {
-- "type": "QUERY_RESTRUCTURE",
-- "priority": "MEDIUM",
-- "description": "Consider splitting complex aggregation into smaller stages",
-- "estimatedImprovement": "20-40% better resource utilization"
-- }
-- ],
-- "performanceGrade": "C+",
-- "bottlenecks": [
-- {
-- "stage": "$lookup",
-- "issue": "Examining too many documents in joined collection",
-- "impact": "63% of total execution time"
-- }
-- ]
-- }
-- Performance monitoring and optimization tracking
WITH query_performance_analysis AS (
SELECT
DATE_TRUNC('hour', execution_timestamp) as hour_bucket,
collection_name,
query_type,
-- Performance metrics
COUNT(*) as query_count,
AVG(execution_time_ms) as avg_execution_time,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY execution_time_ms) as p95_execution_time,
MAX(execution_time_ms) as max_execution_time,
-- Resource utilization
AVG(docs_examined) as avg_docs_examined,
AVG(docs_returned) as avg_docs_returned,
AVG(docs_examined::float / GREATEST(docs_returned, 1)) as avg_scan_ratio,
-- Index effectiveness
COUNT(*) FILTER (WHERE index_used = true) as queries_with_index,
AVG(CASE WHEN index_used THEN 1.0 ELSE 0.0 END) as index_hit_rate,
STRING_AGG(DISTINCT index_name, ', ') as indexes_used,
-- Error tracking
COUNT(*) FILTER (WHERE execution_success = false) as failed_queries,
STRING_AGG(DISTINCT error_type, '; ') FILTER (WHERE error_type IS NOT NULL) as error_types,
-- Memory and I/O metrics
AVG(memory_usage_mb) as avg_memory_usage,
MAX(memory_usage_mb) as peak_memory_usage,
COUNT(*) FILTER (WHERE spilled_to_disk = true) as queries_spilled_to_disk
FROM query_execution_log
WHERE execution_timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
AND collection_name IN ('users', 'orders', 'products', 'analytics')
GROUP BY DATE_TRUNC('hour', execution_timestamp), collection_name, query_type
),
performance_scoring AS (
SELECT
*,
-- Performance score calculation (0-100)
LEAST(100, GREATEST(0,
-- Execution time score (40% weight)
(CASE
WHEN avg_execution_time <= 50 THEN 40
WHEN avg_execution_time <= 100 THEN 30
WHEN avg_execution_time <= 250 THEN 20
WHEN avg_execution_time <= 500 THEN 10
ELSE 0
END) +
-- Index usage score (35% weight)
(index_hit_rate * 35) +
-- Scan efficiency score (25% weight)
(CASE
WHEN avg_scan_ratio <= 1.1 THEN 25
WHEN avg_scan_ratio <= 2.0 THEN 20
WHEN avg_scan_ratio <= 5.0 THEN 15
WHEN avg_scan_ratio <= 10.0 THEN 10
ELSE 0
END)
)) as performance_score,
-- Performance grade assignment
CASE
WHEN avg_execution_time <= 50 AND index_hit_rate >= 0.9 AND avg_scan_ratio <= 1.5 THEN 'A'
WHEN avg_execution_time <= 100 AND index_hit_rate >= 0.8 AND avg_scan_ratio <= 3.0 THEN 'B'
WHEN avg_execution_time <= 250 AND index_hit_rate >= 0.6 AND avg_scan_ratio <= 10.0 THEN 'C'
WHEN avg_execution_time <= 500 AND index_hit_rate >= 0.4 THEN 'D'
ELSE 'F'
END as performance_grade,
-- Trend analysis (comparing with previous period)
LAG(avg_execution_time) OVER (
PARTITION BY collection_name, query_type
ORDER BY hour_bucket
) as prev_avg_execution_time,
LAG(index_hit_rate) OVER (
PARTITION BY collection_name, query_type
ORDER BY hour_bucket
) as prev_index_hit_rate,
LAG(performance_score) OVER (
PARTITION BY collection_name, query_type
ORDER BY hour_bucket
) as prev_performance_score
FROM query_performance_analysis
),
optimization_recommendations AS (
SELECT
collection_name,
query_type,
hour_bucket,
performance_grade,
performance_score,
-- Performance trend indicators
CASE
WHEN prev_performance_score IS NOT NULL THEN
CASE
WHEN performance_score > prev_performance_score + 10 THEN 'IMPROVING'
WHEN performance_score < prev_performance_score - 10 THEN 'DEGRADING'
ELSE 'STABLE'
END
ELSE 'NEW'
END as performance_trend,
-- Specific optimization recommendations
ARRAY_REMOVE(ARRAY[
CASE
WHEN index_hit_rate < 0.8 THEN 'CREATE_MISSING_INDEXES'
ELSE NULL
END,
CASE
WHEN avg_scan_ratio > 10 THEN 'IMPROVE_QUERY_SELECTIVITY'
ELSE NULL
END,
CASE
WHEN avg_execution_time > 500 THEN 'OPTIMIZE_QUERY_STRUCTURE'
ELSE NULL
END,
CASE
WHEN failed_queries > query_count * 0.05 THEN 'INVESTIGATE_QUERY_FAILURES'
ELSE NULL
END,
CASE
WHEN queries_spilled_to_disk > 0 THEN 'REDUCE_MEMORY_USAGE'
ELSE NULL
END
], NULL) as optimization_actions,
-- Priority calculation
CASE
WHEN performance_grade IN ('D', 'F') AND query_count > 100 THEN 'CRITICAL'
WHEN performance_grade = 'C' AND query_count > 500 THEN 'HIGH'
WHEN performance_grade IN ('C', 'D') AND query_count > 50 THEN 'MEDIUM'
ELSE 'LOW'
END as optimization_priority,
-- Detailed metrics for analysis
query_count,
avg_execution_time,
p95_execution_time,
index_hit_rate,
avg_scan_ratio,
failed_queries,
indexes_used,
error_types
FROM performance_scoring
WHERE query_count >= 5 -- Filter low-volume queries
)
SELECT
collection_name,
query_type,
performance_grade,
ROUND(performance_score::numeric, 1) as performance_score,
performance_trend,
optimization_priority,
-- Key performance indicators
query_count as hourly_query_count,
ROUND(avg_execution_time::numeric, 2) as avg_latency_ms,
ROUND(p95_execution_time::numeric, 2) as p95_latency_ms,
ROUND((index_hit_rate * 100)::numeric, 1) as index_hit_rate_pct,
ROUND(avg_scan_ratio::numeric, 2) as avg_selectivity_ratio,
-- Optimization guidance
CASE
WHEN ARRAY_LENGTH(optimization_actions, 1) > 0 THEN
'Recommended actions: ' || ARRAY_TO_STRING(optimization_actions, ', ')
ELSE 'Performance within acceptable parameters'
END as optimization_guidance,
-- Resource impact assessment
CASE
WHEN query_count > 1000 AND performance_grade IN ('D', 'F') THEN 'HIGH_IMPACT'
WHEN query_count > 500 AND performance_grade = 'C' THEN 'MEDIUM_IMPACT'
ELSE 'LOW_IMPACT'
END as resource_impact,
-- Technical details
indexes_used,
error_types,
hour_bucket as analysis_hour
FROM optimization_recommendations
WHERE optimization_priority IN ('CRITICAL', 'HIGH', 'MEDIUM')
OR performance_trend = 'DEGRADING'
ORDER BY
CASE optimization_priority
WHEN 'CRITICAL' THEN 1
WHEN 'HIGH' THEN 2
WHEN 'MEDIUM' THEN 3
ELSE 4
END,
performance_score ASC,
query_count DESC;
-- Real-time query optimization with automated recommendations
CREATE OR REPLACE VIEW query_optimization_dashboard AS
WITH current_performance AS (
SELECT
collection_name,
query_hash,
query_pattern,
-- Recent performance metrics (last hour)
COUNT(*) as recent_executions,
AVG(execution_time_ms) as current_avg_time,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY execution_time_ms) as current_p95_time,
AVG(docs_examined::float / GREATEST(docs_returned, 1)) as current_scan_ratio,
-- Index usage analysis
BOOL_AND(index_used) as all_queries_use_index,
COUNT(DISTINCT index_name) as unique_indexes_used,
MODE() WITHIN GROUP (ORDER BY index_name) as most_common_index,
-- Error rate tracking
AVG(CASE WHEN execution_success THEN 1.0 ELSE 0.0 END) as success_rate
FROM query_execution_log
WHERE execution_timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
GROUP BY collection_name, query_hash, query_pattern
HAVING COUNT(*) >= 5 -- Minimum threshold for analysis
),
historical_baseline AS (
SELECT
collection_name,
query_hash,
-- Historical baseline metrics (previous 24 hours, excluding last hour)
AVG(execution_time_ms) as baseline_avg_time,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY execution_time_ms) as baseline_p95_time,
AVG(docs_examined::float / GREATEST(docs_returned, 1)) as baseline_scan_ratio,
AVG(CASE WHEN execution_success THEN 1.0 ELSE 0.0 END) as baseline_success_rate
FROM query_execution_log
WHERE execution_timestamp >= CURRENT_TIMESTAMP - INTERVAL '25 hours'
AND execution_timestamp < CURRENT_TIMESTAMP - INTERVAL '1 hour'
GROUP BY collection_name, query_hash
HAVING COUNT(*) >= 20 -- Sufficient historical data
)
SELECT
cp.collection_name,
cp.query_pattern,
cp.recent_executions,
-- Performance comparison
ROUND(cp.current_avg_time::numeric, 2) as current_avg_latency_ms,
ROUND(hb.baseline_avg_time::numeric, 2) as baseline_avg_latency_ms,
ROUND(((cp.current_avg_time - hb.baseline_avg_time) / NULLIF(hb.baseline_avg_time, 0) * 100)::numeric, 1) as latency_change_pct,
-- Performance status classification
CASE
WHEN cp.current_avg_time > hb.baseline_avg_time * 1.5 THEN 'DEGRADED'
WHEN cp.current_avg_time > hb.baseline_avg_time * 1.2 THEN 'SLOWER'
WHEN cp.current_avg_time < hb.baseline_avg_time * 0.8 THEN 'IMPROVED'
ELSE 'STABLE'
END as performance_status,
-- Index utilization
cp.all_queries_use_index,
cp.unique_indexes_used,
cp.most_common_index,
-- Scan efficiency
ROUND(cp.current_scan_ratio::numeric, 2) as current_scan_ratio,
ROUND(hb.baseline_scan_ratio::numeric, 2) as baseline_scan_ratio,
-- Reliability metrics
ROUND((cp.success_rate * 100)::numeric, 2) as success_rate_pct,
ROUND((hb.baseline_success_rate * 100)::numeric, 2) as baseline_success_rate_pct,
-- Automated optimization recommendations
CASE
WHEN NOT cp.all_queries_use_index THEN 'CRITICAL: Create missing indexes for consistent performance'
WHEN cp.current_avg_time > hb.baseline_avg_time * 2 THEN 'HIGH: Investigate severe performance regression'
WHEN cp.current_scan_ratio > hb.baseline_scan_ratio * 2 THEN 'MEDIUM: Review query selectivity and filters'
WHEN cp.success_rate < 0.95 THEN 'MEDIUM: Address query reliability issues'
WHEN cp.current_avg_time > hb.baseline_avg_time * 1.2 THEN 'LOW: Monitor for continued degradation'
ELSE 'No immediate action required'
END as recommended_action,
-- Alert priority
CASE
WHEN NOT cp.all_queries_use_index OR cp.current_avg_time > hb.baseline_avg_time * 2 THEN 'ALERT'
WHEN cp.current_avg_time > hb.baseline_avg_time * 1.5 OR cp.success_rate < 0.9 THEN 'WARNING'
ELSE 'INFO'
END as alert_level
FROM current_performance cp
LEFT JOIN historical_baseline hb ON cp.collection_name = hb.collection_name
AND cp.query_hash = hb.query_hash
ORDER BY
CASE
WHEN NOT cp.all_queries_use_index OR cp.current_avg_time > COALESCE(hb.baseline_avg_time * 2, 1000) THEN 1
WHEN cp.current_avg_time > COALESCE(hb.baseline_avg_time * 1.5, 500) THEN 2
ELSE 3
END,
cp.recent_executions DESC;
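-- Example usage of the dashboard view (thresholds and LIMIT are illustrative):
-- surface the highest-priority regressions for an on-call review or alerting job.
SELECT
collection_name,
query_pattern,
alert_level,
recommended_action,
current_avg_latency_ms,
latency_change_pct
FROM query_optimization_dashboard
WHERE alert_level IN ('ALERT', 'WARNING')
ORDER BY alert_level, latency_change_pct DESC NULLS LAST
LIMIT 25;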
-- QueryLeaf provides comprehensive query optimization capabilities:
-- 1. SQL-familiar EXPLAIN syntax with detailed execution plan analysis
-- 2. Advanced performance monitoring with historical trend analysis
-- 3. Automated index recommendations based on query patterns
-- 4. Real-time performance alerts and degradation detection
-- 5. Comprehensive bottleneck identification and optimization guidance
-- 6. Resource usage tracking and capacity planning insights
-- 7. Query efficiency scoring and performance grading systems
-- 8. Integration with MongoDB's native explain plan functionality
-- 9. Batch query analysis for application-wide performance review
-- 10. Production-ready monitoring dashboards and optimization workflows
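The same scoring model can also be applied directly to MongoDB explain output in application code. The Python sketch below is a minimal illustration rather than QueryLeaf's implementation: it assumes an executionStats-level explain document and reuses the weightings from the SQL above (40% latency, 35% index usage, 25% scan efficiency), with grade cut-offs simplified to ranges of the combined score.
# Minimal scoring sketch; the weights mirror the monitoring SQL above
def score_explain(execution_stats):
    """Grade one query from an executionStats explain document (illustrative only)."""
    time_ms = execution_stats.get("executionTimeMillis", 0)
    examined = execution_stats.get("totalDocsExamined", 0)
    keys_examined = execution_stats.get("totalKeysExamined", 0)
    returned = max(execution_stats.get("nReturned", 0), 1)
    scan_ratio = examined / returned
    # Latency component (0-40), same buckets as the SQL CASE expression
    if time_ms <= 50:
        latency_score = 40
    elif time_ms <= 100:
        latency_score = 30
    elif time_ms <= 250:
        latency_score = 20
    elif time_ms <= 500:
        latency_score = 10
    else:
        latency_score = 0
    # Index usage component (0-35): totalKeysExamined > 0 implies an index was used
    index_score = 35 if keys_examined > 0 else 0
    # Scan efficiency component (0-25), same buckets as the SQL CASE expression
    if scan_ratio <= 1.1:
        scan_score = 25
    elif scan_ratio <= 2.0:
        scan_score = 20
    elif scan_ratio <= 5.0:
        scan_score = 15
    elif scan_ratio <= 10.0:
        scan_score = 10
    else:
        scan_score = 0
    score = latency_score + index_score + scan_score
    grade = "A" if score >= 90 else "B" if score >= 75 else "C" if score >= 55 else "D" if score >= 35 else "F"
    return score, grade
A score computed this way could be logged alongside each slow query and fed into the same query_execution_log data that the monitoring SQL reads from.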
Best Practices for Query Optimization Implementation
Query Analysis Strategy
Essential principles for effective MongoDB query optimization:
- Regular Monitoring: Implement continuous query performance monitoring and alerting
- Index Strategy: Design indexes based on actual query patterns and performance data
- Explain Plan Analysis: Use comprehensive explain plan analysis to identify bottlenecks (a PyMongo sketch follows this list)
- Historical Tracking: Maintain historical performance data to identify trends and regressions
- Automated Optimization: Implement automated optimization recommendations and validation
- Production Safety: Test all optimizations thoroughly before applying to production systems
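As referenced in the explain plan analysis item above, the sketch below shows one way to retrieve an executionStats explain through PyMongo and flag common bottleneck signals (collection scans, blocking in-memory sorts, and poor selectivity). The connection string, database, collection, and filter are hypothetical placeholders; drivers expose explain in more than one way, and the db.command("explain", ...) form is used here purely for illustration.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed connection string
db = client["ecommerce"]                           # hypothetical database

# Request an executionStats-level explain for a representative query shape
explain = db.command(
    "explain",
    {"find": "orders", "filter": {"status": "completed", "customer_id": 12345}},
    verbosity="executionStats",
)

stats = explain["executionStats"]
plan = explain["queryPlanner"]["winningPlan"]
plan = plan.get("queryPlan", plan)  # newer servers nest the classic stage tree here

def stage_names(node):
    """Walk the winning plan tree and collect every stage name."""
    names = [node.get("stage", "")]
    children = [node["inputStage"]] if "inputStage" in node else node.get("inputStages", [])
    for child in children:
        names.extend(stage_names(child))
    return names

names = stage_names(plan)
scan_ratio = stats["totalDocsExamined"] / max(stats["nReturned"], 1)

if "COLLSCAN" in names:
    print("Bottleneck: collection scan; consider an index covering the filter fields")
if "SORT" in names:
    print("Bottleneck: in-memory sort; consider an index that provides the sort order")
if scan_ratio > 10:
    print(f"Bottleneck: poor selectivity ({scan_ratio:.1f} docs examined per doc returned)")
print(f"Execution time: {stats['executionTimeMillis']} ms, keys examined: {stats['totalKeysExamined']}")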
Performance Tuning Workflow
Optimize MongoDB queries systematically:
- Performance Baseline: Establish performance baselines and targets for all critical queries
- Bottleneck Identification: Use explain plans to identify specific performance bottlenecks
- Optimization Implementation: Apply optimizations following proven patterns and best practices
- Validation Testing: Validate optimization effectiveness with comprehensive testing (a minimal before/after timing sketch follows this list)
- Monitoring Setup: Implement ongoing monitoring to track optimization impact
- Continuous Improvement: Regular review and refinement of optimization strategies
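As referenced in the validation testing item above, one lightweight way to make the baseline and validation steps concrete is to time a representative query shape before and after an index change and compare the result against the agreed target. The sketch below is illustrative only: the connection string, collection, filter, index keys, and the 50 ms target are assumptions, and client-side wall-clock timing includes network latency, so production baselines are better taken from server-side metrics or explain output.
import time
from pymongo import MongoClient, ASCENDING, DESCENDING

client = MongoClient("mongodb://localhost:27017")   # assumed connection string
orders = client["ecommerce"]["orders"]              # hypothetical collection

def avg_latency_ms(filter_doc, runs=20):
    """Average wall-clock latency of a representative query shape."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        list(orders.find(filter_doc).sort("created_at", DESCENDING).limit(50))
        total += (time.perf_counter() - start) * 1000
    return total / runs

query = {"status": "completed", "customer_id": 12345}
baseline = avg_latency_ms(query)  # establish the baseline before changing anything

# Candidate optimization: compound index matching the filter fields and sort order
orders.create_index([("customer_id", ASCENDING), ("status", ASCENDING), ("created_at", DESCENDING)])

optimized = avg_latency_ms(query)  # validate the change against the target
target_ms = 50                     # hypothetical per-query latency target
print(f"baseline {baseline:.1f} ms -> optimized {optimized:.1f} ms")
if optimized > target_ms:
    print("Target not met; revisit the index definition or the query shape")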
Conclusion
MongoDB's query optimizer and explain plan system provide comprehensive tools for identifying performance bottlenecks, analyzing query execution patterns, and implementing effective optimization strategies. The explain functionality offers detailed insights that support performance tuning in both development and production, especially when combined with automated recommendations and historical analysis.
Key MongoDB Query Optimization benefits include:
- Comprehensive Analysis: Detailed execution plan analysis with performance metrics and bottleneck identification
- Automated Recommendations: Intelligent optimization suggestions based on query patterns and performance data
- Real-time Monitoring: Continuous performance monitoring with alerting and trend analysis
- Production-Ready Tools: Sophisticated analysis tools designed for production database optimization
- Historical Intelligence: Performance trend analysis and regression detection capabilities
- Integration-Friendly: Seamless integration with existing monitoring and alerting infrastructure
Whether you're optimizing application queries, managing database performance, or implementing automated optimization workflows, MongoDB's query optimization tools with QueryLeaf's familiar SQL interface provide the foundation for high-performance database operations.
QueryLeaf Integration: QueryLeaf automatically manages MongoDB query optimization while providing SQL-familiar explain plan syntax, performance analysis functions, and optimization recommendations. Advanced query analysis patterns, automated optimization workflows, and comprehensive performance monitoring are seamlessly handled through familiar SQL constructs, making sophisticated database optimization both powerful and accessible to SQL-oriented development teams.
The combination of comprehensive query analysis capabilities with SQL-style operations makes MongoDB an ideal platform for applications that require both high-performance queries and familiar optimization patterns, keeping performance and maintainability intact as data volumes and workloads grow.