MongoDB Performance Monitoring and Diagnostics: Advanced Optimization Techniques for Production Database Management
Production MongoDB deployments require comprehensive performance monitoring and optimization strategies to maintain optimal query response times, efficient resource utilization, and predictable application performance under varying workload conditions. Traditional database monitoring approaches often struggle with MongoDB's document-oriented structure, dynamic schema capabilities, and distributed architecture patterns, making specialized monitoring tools and techniques essential for effective performance management.
MongoDB provides sophisticated built-in performance monitoring capabilities including query profiling, execution statistics, index utilization analysis, and comprehensive metrics collection that enable deep insights into database performance characteristics. Unlike relational databases that rely primarily on table-level statistics, MongoDB's monitoring encompasses collection-level metrics, document-level analysis, aggregation pipeline performance, and shard-level resource utilization patterns.
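For example, the profiler and per-query execution statistics are reachable directly from application code. Below is a minimal sketch using the Node.js driver; the connection string, the `app` database, and the `orders` collection are placeholder assumptions, not part of any particular deployment:

// Enable the profiler for slow operations and inspect a query's execution stats
const { MongoClient } = require('mongodb');

async function inspectQueryPerformance() {
  const client = new MongoClient('mongodb://localhost:27017'); // placeholder URI
  await client.connect();
  try {
    const db = client.db('app'); // hypothetical database
    // Profile operations slower than 100ms (level 1 = slow operations only)
    await db.command({ profile: 1, slowms: 100 });

    // Capture execution statistics for a representative query
    const plan = await db.collection('orders') // hypothetical collection
      .find({ status: 'pending' })
      .explain('executionStats');
    console.log('docsExamined:', plan.executionStats.totalDocsExamined,
      'returned:', plan.executionStats.nReturned);

    // Review the most recent slow operations captured by the profiler
    const slowOps = await db.collection('system.profile')
      .find({ millis: { $gt: 100 } })
      .sort({ ts: -1 })
      .limit(5)
      .toArray();
    console.log('recent slow ops:', slowOps.length);
  } finally {
    await client.close();
  }
}

inspectQueryPerformance().catch(console.error);

The ratio of totalDocsExamined to nReturned is the quickest signal of a missing or ineffective index, and the same fields drive the analysis pipeline developed later in this article.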
The Traditional Database Monitoring Challenge
Conventional database monitoring approaches often lack the granularity and flexibility needed for MongoDB environments:
-- Traditional PostgreSQL performance monitoring - limited insight into document-level operations
-- Basic query performance analysis with limited MongoDB-style insights
SELECT
schemaname,
tablename,
attname,
n_distinct,
correlation,
most_common_vals,
most_common_freqs,
-- Basic statistics available in PostgreSQL
pg_stat_get_live_tuples(c.oid) as live_tuples,
pg_stat_get_dead_tuples(c.oid) as dead_tuples,
pg_stat_get_tuples_inserted(c.oid) as tuples_inserted,
pg_stat_get_tuples_updated(c.oid) as tuples_updated,
pg_stat_get_tuples_deleted(c.oid) as tuples_deleted,
-- Table scan statistics
pg_stat_get_numscans(c.oid) as table_scans,
pg_stat_get_tuples_returned(c.oid) as tuples_returned,
pg_stat_get_tuples_fetched(c.oid) as tuples_fetched,
-- Index usage statistics (limited compared to MongoDB index insights)
pg_stat_get_blocks_fetched(c.oid) as blocks_fetched,
pg_stat_get_blocks_hit(c.oid) as blocks_hit
FROM pg_stats ps
JOIN pg_class c ON ps.tablename = c.relname
JOIN pg_namespace n ON c.relnamespace = n.oid AND n.nspname = ps.schemaname
WHERE ps.schemaname = 'public'
ORDER BY pg_stat_get_live_tuples(c.oid) DESC;
-- Query performance analysis with limited flexibility for document operations
WITH slow_queries AS (
SELECT
query,
calls,
total_time,
mean_time,
stddev_time,
min_time,
max_time,
rows,
-- Limited insight into query complexity and document operations
100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent,
-- Basic classification limited to SQL operations
CASE
WHEN query LIKE 'SELECT%' THEN 'read'
WHEN query LIKE 'INSERT%' THEN 'write'
WHEN query LIKE 'UPDATE%' THEN 'update'
WHEN query LIKE 'DELETE%' THEN 'delete'
ELSE 'other'
END as query_type
FROM pg_stat_statements
WHERE calls > 100 -- Focus on frequently executed queries
)
SELECT
query_type,
COUNT(*) as query_count,
SUM(calls) as total_calls,
AVG(mean_time) as avg_response_time,
SUM(total_time) as total_execution_time,
AVG(hit_percent) as avg_cache_hit_rate,
-- Limited aggregation capabilities compared to MongoDB aggregation insights
percentile_cont(0.95) WITHIN GROUP (ORDER BY mean_time) as p95_response_time,
percentile_cont(0.99) WITHIN GROUP (ORDER BY mean_time) as p99_response_time
FROM slow_queries
GROUP BY query_type
ORDER BY total_execution_time DESC;
-- Problems with traditional monitoring approaches:
-- 1. Limited understanding of document-level operations and nested field access
-- 2. No insight into aggregation pipeline performance and optimization
-- 3. Lack of collection-level and field-level usage statistics
-- 4. No support for analyzing dynamic schema evolution and performance impact
-- 5. Limited index utilization analysis for compound and sparse indexes
-- 6. No understanding of MongoDB-specific operations like upserts and bulk operations
-- 7. Inability to analyze shard key distribution and query routing efficiency
-- 8. No support for analyzing replica set read preference impact on performance
-- 9. Limited insight into connection pooling and driver-level optimization opportunities
-- 10. No understanding of MongoDB-specific caching behavior and working set analysis
-- Manual index analysis with limited insights into MongoDB index strategies
SELECT
schemaname,
tablename,
indexname,
idx_tup_read,
idx_tup_fetch,
idx_blks_read,
idx_blks_hit,
-- Basic index efficiency calculation (limited compared to MongoDB index metrics)
CASE
WHEN idx_tup_read > 0 THEN
ROUND(100.0 * idx_tup_fetch / idx_tup_read, 2)
ELSE 0
END as index_efficiency_percent,
-- Cache hit ratio (basic compared to MongoDB's comprehensive cache analysis)
CASE
WHEN (idx_blks_read + idx_blks_hit) > 0 THEN
ROUND(100.0 * idx_blks_hit / (idx_blks_read + idx_blks_hit), 2)
ELSE 0
END as cache_hit_percent
FROM pg_stat_user_indexes
ORDER BY idx_tup_read DESC;
-- Limitations of traditional approaches:
-- 1. No understanding of MongoDB's document structure impact on performance
-- 2. Limited aggregation pipeline analysis and optimization insights
-- 3. No collection-level sharding and distribution analysis
-- 4. Lack of real-time profiling capabilities for individual operations
-- 5. No support for analyzing GridFS performance and large document handling
-- 6. Limited understanding of MongoDB's memory management and working set optimization
-- 7. No insight into oplog performance and replica set optimization
-- 8. Inability to analyze change streams and real-time operation performance
-- 9. Limited connection and driver optimization analysis
-- 10. No support for analyzing MongoDB Atlas-specific performance metrics
MongoDB provides comprehensive performance monitoring and optimization capabilities:
// MongoDB Advanced Performance Monitoring and Optimization System
const { MongoClient } = require('mongodb');
const client = new MongoClient('mongodb://localhost:27017');
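// Assumes client.connect() is awaited before the monitor below is constructed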
const db = client.db('production_performance_monitoring');
// Comprehensive MongoDB Performance Monitoring and Diagnostics Manager
class AdvancedMongoPerformanceMonitor {
constructor(db, config = {}) {
this.db = db;
this.adminDb = db.admin();
this.collections = {
performanceMetrics: db.collection('performance_metrics'),
slowQueries: db.collection('slow_queries'),
indexAnalysis: db.collection('index_analysis'),
collectionStats: db.collection('collection_stats'),
profilingData: db.collection('profiling_data'),
optimizationRecommendations: db.collection('optimization_recommendations')
};
// Advanced monitoring configuration
this.config = {
profilingLevel: config.profilingLevel || 2, // Profile all operations
slowOperationThreshold: config.slowOperationThreshold || 100, // 100ms
samplingRate: config.samplingRate || 1.0, // Sample all operations
metricsCollectionInterval: config.metricsCollectionInterval || 60000, // 1 minute
indexAnalysisInterval: config.indexAnalysisInterval || 300000, // 5 minutes
performanceReportInterval: config.performanceReportInterval || 900000, // 15 minutes
// Advanced monitoring features
enableOperationProfiling: config.enableOperationProfiling !== false,
enableIndexAnalysis: config.enableIndexAnalysis !== false,
enableCollectionStats: config.enableCollectionStats !== false,
enableQueryOptimization: config.enableQueryOptimization !== false,
enableRealTimeAlerts: config.enableRealTimeAlerts !== false,
enablePerformanceBaseline: config.enablePerformanceBaseline !== false,
// Alerting thresholds
alertThresholds: {
avgResponseTime: config.alertThresholds?.avgResponseTime || 500, // 500ms
connectionCount: config.alertThresholds?.connectionCount || 1000,
indexHitRatio: config.alertThresholds?.indexHitRatio || 0.95,
replicationLag: config.alertThresholds?.replicationLag || 5000, // 5 seconds
diskUtilization: config.alertThresholds?.diskUtilization || 0.8, // 80%
memoryUtilization: config.alertThresholds?.memoryUtilization || 0.85 // 85%
},
// Optimization settings
optimizationRules: {
enableAutoIndexSuggestions: true,
enableQueryRewriting: false,
enableCollectionCompaction: false,
enableShardKeyAnalysis: true
}
};
// Performance metrics storage
this.metrics = {
operationCounts: new Map(),
responseTimes: new Map(),
indexUsage: new Map(),
collectionMetrics: new Map()
};
// Initialize monitoring systems
this.initializePerformanceMonitoring();
this.setupRealTimeProfiler();
this.startPerformanceCollection();
}
async initializePerformanceMonitoring() {
console.log('Initializing comprehensive MongoDB performance monitoring...');
try {
// Enable database profiling with advanced configuration
await this.enableAdvancedProfiling();
// Setup performance metrics collection
await this.setupMetricsCollection();
// Initialize index analysis
await this.initializeIndexAnalysis();
// Setup collection statistics monitoring
await this.setupCollectionStatsMonitoring();
// Initialize performance baseline
if (this.config.enablePerformanceBaseline) {
await this.initializePerformanceBaseline();
}
console.log('Performance monitoring system initialized successfully');
} catch (error) {
console.error('Error initializing performance monitoring:', error);
throw error;
}
}
async enableAdvancedProfiling() {
console.log('Enabling advanced database profiling...');
try {
// Enable profiling for all operations with detailed analysis
const profilingResult = await this.db.command({
profile: this.config.profilingLevel,
slowms: this.config.slowOperationThreshold,
sampleRate: this.config.samplingRate,
// Advanced profiling options: MongoDB 4.4.2+ accepts a filter expression
// evaluated against each would-be profiler entry
filter: {
$or: [
{ millis: { $gte: this.config.slowOperationThreshold } }, // Slow operations
{ planSummary: { $regex: 'COLLSCAN' } } // Collection scans
]
}
});
console.log('Database profiling enabled:', profilingResult);
// Configure profiler collection size for optimal performance
await this.configureProfilerCollection();
} catch (error) {
console.error('Error enabling profiling:', error);
throw error;
}
}
async configureProfilerCollection() {
try {
// Ensure profiler collection is appropriately sized
const profilerCollStats = await this.db.collection('system.profile').stats();
if (profilerCollStats.capped && profilerCollStats.maxSize < 100 * 1024 * 1024) {
console.log('Recreating profiler collection with larger size...');
// Profiling must be disabled before system.profile can be dropped
await this.db.command({ profile: 0 });
await this.db.collection('system.profile').drop();
await this.db.createCollection('system.profile', {
capped: true,
size: 100 * 1024 * 1024, // 100MB
max: 1000000 // 1M documents
});
// Re-enable profiling at the configured level
await this.db.command({
profile: this.config.profilingLevel,
slowms: this.config.slowOperationThreshold
});
}
} catch (error) {
console.warn('Could not configure profiler collection:', error.message);
}
}
async collectComprehensivePerformanceMetrics() {
console.log('Collecting comprehensive performance metrics...');
try {
const startTime = Date.now();
// Collect server status metrics
const serverStatus = await this.adminDb.command({ serverStatus: 1 });
// Collect database statistics
const dbStats = await this.db.stats();
// Collect profiling data
const profilingData = await this.analyzeProfilingData();
// Collect index usage statistics
const indexStats = await this.analyzeIndexUsage();
// Collect collection-level metrics
const collectionMetrics = await this.collectCollectionMetrics();
// Collect operation metrics
const operationMetrics = await this.analyzeOperationMetrics();
// Collect connection metrics
const connectionMetrics = this.extractConnectionMetrics(serverStatus);
// Collect memory and resource metrics
const resourceMetrics = this.extractResourceMetrics(serverStatus);
// Collect replication metrics (if applicable)
const replicationMetrics = await this.collectReplicationMetrics();
// Collect sharding metrics (if applicable)
const shardingMetrics = await this.collectShardingMetrics();
// Assemble comprehensive performance report
const performanceReport = {
timestamp: new Date(),
collectionTime: Date.now() - startTime,
// Core performance metrics
serverStatus: {
uptime: serverStatus.uptime,
version: serverStatus.version,
process: serverStatus.process,
pid: serverStatus.pid,
host: serverStatus.host
},
// Database-level metrics
database: {
collections: dbStats.collections,
objects: dbStats.objects,
avgObjSize: dbStats.avgObjSize,
dataSize: dbStats.dataSize,
storageSize: dbStats.storageSize,
indexes: dbStats.indexes,
indexSize: dbStats.indexSize,
// Efficiency metrics
dataToIndexRatio: dbStats.indexSize > 0 ? dbStats.dataSize / dbStats.indexSize : 0,
storageEfficiency: dbStats.storageSize > 0 ? dbStats.dataSize / dbStats.storageSize : 0,
avgDocumentSize: dbStats.avgObjSize
},
// Operation performance metrics
operations: operationMetrics,
// Query performance analysis
queryPerformance: profilingData,
// Index performance analysis
indexPerformance: indexStats,
// Collection-level metrics
collections: collectionMetrics,
// Connection and concurrency metrics
connections: connectionMetrics,
// Resource utilization metrics
resources: resourceMetrics,
// Replication metrics
replication: replicationMetrics,
// Sharding metrics (if applicable)
sharding: shardingMetrics,
// Performance analysis
analysis: await this.generatePerformanceAnalysis({
serverStatus,
dbStats,
profilingData,
indexStats,
collectionMetrics,
operationMetrics,
connectionMetrics,
resourceMetrics
}),
// Optimization recommendations
recommendations: await this.generateOptimizationRecommendations({
profilingData,
indexStats,
collectionMetrics,
operationMetrics
})
};
// Store performance metrics
await this.collections.performanceMetrics.insertOne(performanceReport);
// Update real-time metrics
this.updateRealTimeMetrics(performanceReport);
// Check for performance alerts
await this.checkPerformanceAlerts(performanceReport);
return performanceReport;
} catch (error) {
console.error('Error collecting performance metrics:', error);
throw error;
}
}
async analyzeProfilingData(timeWindow = 300000) {
console.log('Analyzing profiling data for query performance insights...');
try {
const cutoffTime = new Date(Date.now() - timeWindow);
// Aggregate profiling data with comprehensive analysis
const profilingAnalysis = await this.db.collection('system.profile').aggregate([
{
$match: {
ts: { $gte: cutoffTime },
ns: { $regex: `^${this.db.databaseName}\\.` } // Current database only (escaped dot)
}
},
{
$addFields: {
// Categorize operations
operationType: {
$switch: {
branches: [
{ case: { $ne: ['$command.find', null] }, then: 'find' },
{ case: { $ne: ['$command.aggregate', null] }, then: 'aggregate' },
{ case: { $ne: ['$command.insert', null] }, then: 'insert' },
{ case: { $ne: ['$command.update', null] }, then: 'update' },
{ case: { $ne: ['$command.delete', null] }, then: 'delete' },
{ case: { $ne: ['$command.count', null] }, then: 'count' },
{ case: { $ne: ['$command.distinct', null] }, then: 'distinct' }
],
default: 'other'
}
},
// Analyze execution efficiency
executionEfficiency: {
$cond: {
if: { $and: [{ $gt: ['$docsExamined', 0] }, { $gt: ['$nreturned', 0] }] },
then: { $divide: ['$nreturned', '$docsExamined'] },
else: 0
}
},
// Categorize response times
responseTimeCategory: {
$switch: {
branches: [
{ case: { $lt: ['$millis', 10] }, then: 'very_fast' },
{ case: { $lt: ['$millis', 100] }, then: 'fast' },
{ case: { $lt: ['$millis', 500] }, then: 'moderate' },
{ case: { $lt: ['$millis', 2000] }, then: 'slow' }
],
default: 'very_slow'
}
},
// Index usage analysis
indexUsageType: {
$cond: {
if: { $regexMatch: { input: { $ifNull: ['$planSummary', ''] }, regex: 'IXSCAN' } },
then: 'index_scan',
else: {
$cond: {
if: { $regexMatch: { input: { $ifNull: ['$planSummary', ''] }, regex: 'COLLSCAN' } },
then: 'collection_scan',
else: 'other'
}
}
}
}
}
},
{
$group: {
_id: {
collection: { $arrayElemAt: [{ $split: ['$ns', '.'] }, -1] },
operationType: '$operationType',
indexUsageType: '$indexUsageType'
},
// Performance statistics
totalOperations: { $sum: 1 },
avgResponseTime: { $avg: '$millis' },
minResponseTime: { $min: '$millis' },
maxResponseTime: { $max: '$millis' },
p95ResponseTime: { $percentile: { input: '$millis', p: [0.95], method: 'approximate' } }, // requires MongoDB 7.0+
p99ResponseTime: { $percentile: { input: '$millis', p: [0.99], method: 'approximate' } },
// Document examination efficiency
totalDocsExamined: { $sum: { $ifNull: ['$docsExamined', 0] } },
totalDocsReturned: { $sum: { $ifNull: ['$nreturned', 0] } },
avgExecutionEfficiency: { $avg: '$executionEfficiency' },
// Response time distribution
veryFastOps: { $sum: { $cond: [{ $eq: ['$responseTimeCategory', 'very_fast'] }, 1, 0] } },
fastOps: { $sum: { $cond: [{ $eq: ['$responseTimeCategory', 'fast'] }, 1, 0] } },
moderateOps: { $sum: { $cond: [{ $eq: ['$responseTimeCategory', 'moderate'] }, 1, 0] } },
slowOps: { $sum: { $cond: [{ $eq: ['$responseTimeCategory', 'slow'] }, 1, 0] } },
verySlowOps: { $sum: { $cond: [{ $eq: ['$responseTimeCategory', 'very_slow'] }, 1, 0] } },
// Sample queries for analysis
sampleQueries: { $push: {
command: '$command',
millis: '$millis',
planSummary: '$planSummary',
ts: '$ts'
} }
}
},
{
$addFields: {
// Calculate efficiency metrics
overallEfficiency: {
$cond: {
if: { $gt: ['$totalDocsExamined', 0] },
then: { $divide: ['$totalDocsReturned', '$totalDocsExamined'] },
else: 1
}
},
// Calculate performance score
performanceScore: {
$multiply: [
// Response time component (lower is better)
{ $subtract: [1, { $min: [{ $divide: ['$avgResponseTime', 2000] }, 1] }] },
// Efficiency component (higher is better)
{ $multiply: ['$avgExecutionEfficiency', 100] }
]
}
}
},
{
$addFields: {
// Performance classification (a separate stage, since fields defined in
// the same $addFields stage cannot reference each other)
performanceClass: {
$switch: {
branches: [
{ case: { $gte: ['$performanceScore', 80] }, then: 'excellent' },
{ case: { $gte: ['$performanceScore', 60] }, then: 'good' },
{ case: { $gte: ['$performanceScore', 40] }, then: 'fair' },
{ case: { $gte: ['$performanceScore', 20] }, then: 'poor' }
],
default: 'critical'
}
}
}
},
{
$project: {
collection: '$_id.collection',
operationType: '$_id.operationType',
indexUsageType: '$_id.indexUsageType',
// Core metrics
totalOperations: 1,
avgResponseTime: { $round: ['$avgResponseTime', 2] },
minResponseTime: 1,
maxResponseTime: 1,
p95ResponseTime: { $round: [{ $arrayElemAt: ['$p95ResponseTime', 0] }, 2] },
p99ResponseTime: { $round: [{ $arrayElemAt: ['$p99ResponseTime', 0] }, 2] },
// Efficiency metrics
totalDocsExamined: 1,
totalDocsReturned: 1,
overallEfficiency: { $round: ['$overallEfficiency', 4] },
avgExecutionEfficiency: { $round: ['$avgExecutionEfficiency', 4] },
// Performance distribution
responseTimeDistribution: {
veryFast: '$veryFastOps',
fast: '$fastOps',
moderate: '$moderateOps',
slow: '$slowOps',
verySlow: '$verySlowOps'
},
// Performance scoring
performanceScore: { $round: ['$performanceScore', 2] },
performanceClass: 1,
// Sample queries (limit to 3 most recent)
sampleQueries: { $slice: [{ $sortArray: { input: '$sampleQueries', sortBy: { ts: -1 } } }, 3] }
}
},
{ $sort: { avgResponseTime: -1 } }
]).toArray();
return {
analysisTimeWindow: timeWindow,
totalProfiledOperations: profilingAnalysis.reduce((sum, item) => sum + item.totalOperations, 0),
collections: profilingAnalysis,
// Summary statistics
summary: {
avgResponseTimeOverall: profilingAnalysis.reduce((sum, item) => sum + (item.avgResponseTime * item.totalOperations), 0) /
Math.max(profilingAnalysis.reduce((sum, item) => sum + item.totalOperations, 0), 1),
slowOperationsCount: profilingAnalysis.reduce((sum, item) => sum + item.responseTimeDistribution.slow + item.responseTimeDistribution.verySlow, 0),
collectionScansCount: profilingAnalysis.filter(item => item.indexUsageType === 'collection_scan')
.reduce((sum, item) => sum + item.totalOperations, 0),
inefficientOperationsCount: profilingAnalysis.filter(item => item.overallEfficiency < 0.1)
.reduce((sum, item) => sum + item.totalOperations, 0)
}
};
} catch (error) {
console.error('Error analyzing profiling data:', error);
return { error: error.message, collections: [] };
}
}
async analyzeIndexUsage() {
console.log('Analyzing index usage and efficiency...');
try {
const collections = await this.db.listCollections().toArray();
const indexAnalysis = [];
for (const collInfo of collections) {
const collection = this.db.collection(collInfo.name);
try {
// Get index statistics
const indexStats = await collection.aggregate([
{ $indexStats: {} }
]).toArray();
// Get collection statistics for context
const collStats = await collection.stats();
// Analyze each index
for (const index of indexStats) {
// $indexStats exposes the key pattern and full spec; per-index sizes
// come from the collStats indexSizes map
const keyPattern = index.key || (index.spec && index.spec.key) || {};
const indexSize = (collStats.indexSizes && collStats.indexSizes[index.name]) || 0;
const indexAnalysisItem = {
collection: collInfo.name,
indexName: index.name,
indexSpec: keyPattern,
// Usage statistics
accesses: {
ops: index.accesses?.ops || 0,
since: index.accesses?.since || new Date()
},
// Index characteristics
indexSize: indexSize,
isUnique: !!(index.spec && index.spec.unique),
isSparse: !!(index.spec && index.spec.sparse),
isPartial: !!(index.spec && index.spec.partialFilterExpression),
isCompound: Object.keys(keyPattern).length > 1,
// Calculate index efficiency metrics
collectionDocuments: collStats.count,
collectionSize: collStats.size,
indexToCollectionRatio: collStats.size > 0 ? indexSize / collStats.size : 0,
// Usage analysis
usageCategory: this.categorizeIndexUsage(index.accesses?.ops || 0, collStats.count),
// Performance metrics
avgDocumentSize: collStats.avgObjSize || 0,
indexSelectivity: this.estimateIndexSelectivity(keyPattern, collStats.count)
};
indexAnalysis.push(indexAnalysisItem);
}
} catch (collError) {
console.warn(`Error analyzing indexes for collection ${collInfo.name}:`, collError.message);
}
}
// Generate index usage report
return {
totalIndexes: indexAnalysis.length,
indexes: indexAnalysis,
// Index usage summary
usageSummary: {
highUsage: indexAnalysis.filter(idx => idx.usageCategory === 'high').length,
mediumUsage: indexAnalysis.filter(idx => idx.usageCategory === 'medium').length,
lowUsage: indexAnalysis.filter(idx => idx.usageCategory === 'low').length,
unused: indexAnalysis.filter(idx => idx.usageCategory === 'unused').length
},
// Index type distribution
typeDistribution: {
simple: indexAnalysis.filter(idx => !idx.isCompound).length,
compound: indexAnalysis.filter(idx => idx.isCompound).length,
unique: indexAnalysis.filter(idx => idx.isUnique).length,
sparse: indexAnalysis.filter(idx => idx.isSparse).length,
partial: indexAnalysis.filter(idx => idx.isPartial).length
},
// Performance insights
performanceInsights: {
totalIndexSize: indexAnalysis.reduce((sum, idx) => sum + idx.indexSize, 0),
avgIndexToCollectionRatio: indexAnalysis.reduce((sum, idx) => sum + idx.indexToCollectionRatio, 0) / Math.max(indexAnalysis.length, 1),
potentiallyRedundantIndexes: indexAnalysis.filter(idx => idx.usageCategory === 'unused' && idx.indexName !== '_id_'),
oversizedIndexes: indexAnalysis.filter(idx => idx.indexToCollectionRatio > 0.5)
}
};
} catch (error) {
console.error('Error analyzing index usage:', error);
return { error: error.message, indexes: [] };
}
}
categorizeIndexUsage(accessCount, collectionDocuments) {
if (accessCount === 0) return 'unused';
if (accessCount < collectionDocuments * 0.01) return 'low';
if (accessCount < collectionDocuments * 0.1) return 'medium';
return 'high';
}
estimateIndexSelectivity(indexSpec, collectionDocuments) {
// Simple estimation - in practice, would need sampling
if (!indexSpec || collectionDocuments === 0) return 1;
// Compound indexes generally more selective
if (Object.keys(indexSpec).length > 1) return 0.1;
// Simple heuristic based on field types
return 0.5; // Default moderate selectivity
}
async collectCollectionMetrics() {
console.log('Collecting detailed collection-level metrics...');
try {
const collections = await this.db.listCollections().toArray();
const collectionMetrics = [];
for (const collInfo of collections) {
try {
const collection = this.db.collection(collInfo.name);
const stats = await collection.stats();
// Calculate additional metrics
const avgDocSize = stats.avgObjSize || 0;
const storageEfficiency = stats.size > 0 ? stats.size / stats.storageSize : 0;
const indexOverhead = stats.size > 0 ? stats.totalIndexSize / stats.size : 0;
const collectionMetric = {
name: collInfo.name,
type: collInfo.type,
// Core statistics
documentCount: stats.count,
dataSize: stats.size,
storageSize: stats.storageSize,
avgDocumentSize: avgDocSize,
// Index statistics
indexCount: stats.nindexes,
totalIndexSize: stats.totalIndexSize,
// Efficiency metrics
storageEfficiency: storageEfficiency,
indexOverhead: indexOverhead,
fragmentationRatio: stats.storageSize > 0 ? 1 - (stats.size / stats.storageSize) : 0,
// Performance characteristics
performanceCategory: this.categorizeCollectionPerformance({
documentCount: stats.count,
avgDocumentSize: avgDocSize,
indexOverhead: indexOverhead,
storageEfficiency: storageEfficiency
}),
// Optimization opportunities
optimizationFlags: {
highFragmentation: (1 - storageEfficiency) > 0.3,
excessiveIndexing: indexOverhead > 1.0,
largeDocs: avgDocSize > 16384, // 16KB
noIndexes: stats.nindexes <= 1 // Only _id index
},
timestamp: new Date()
};
collectionMetrics.push(collectionMetric);
} catch (collError) {
console.warn(`Error collecting stats for collection ${collInfo.name}:`, collError.message);
}
}
return {
collections: collectionMetrics,
summary: {
totalCollections: collectionMetrics.length,
totalDocuments: collectionMetrics.reduce((sum, c) => sum + c.documentCount, 0),
totalDataSize: collectionMetrics.reduce((sum, c) => sum + c.dataSize, 0),
totalStorageSize: collectionMetrics.reduce((sum, c) => sum + c.storageSize, 0),
totalIndexSize: collectionMetrics.reduce((sum, c) => sum + c.totalIndexSize, 0),
avgStorageEfficiency: collectionMetrics.reduce((sum, c) => sum + c.storageEfficiency, 0) / Math.max(collectionMetrics.length, 1)
}
};
} catch (error) {
console.error('Error collecting collection metrics:', error);
return { error: error.message, collections: [] };
}
}
categorizeCollectionPerformance({ documentCount, avgDocumentSize, indexOverhead, storageEfficiency }) {
let score = 0;
// Document count efficiency
if (documentCount < 10000) score += 10;
else if (documentCount < 1000000) score += 5;
// Document size efficiency
if (avgDocumentSize < 1024) score += 10; // < 1KB
else if (avgDocumentSize < 16384) score += 5; // < 16KB
// Index efficiency
if (indexOverhead < 0.2) score += 10;
else if (indexOverhead < 0.5) score += 5;
// Storage efficiency
if (storageEfficiency > 0.8) score += 10;
else if (storageEfficiency > 0.6) score += 5;
if (score >= 30) return 'excellent';
if (score >= 20) return 'good';
if (score >= 10) return 'fair';
return 'poor';
}
async generateOptimizationRecommendations(performanceData) {
console.log('Generating performance optimization recommendations...');
const recommendations = [];
try {
// Analyze profiling data for query optimization
if (performanceData.profilingData?.collections) {
for (const collection of performanceData.profilingData.collections) {
// Recommend indexes for collection scans
if (collection.indexUsageType === 'collection_scan' && collection.totalOperations > 100) {
recommendations.push({
type: 'index_recommendation',
priority: 'high',
collection: collection.collection,
title: 'Add index to eliminate collection scans',
description: `Collection "${collection.collection}" has ${collection.totalOperations} collection scans with average response time of ${collection.avgResponseTime}ms`,
recommendation: `Consider adding an index on frequently queried fields for ${collection.operationType} operations`,
impact: 'high',
effort: 'medium',
estimatedImprovement: '60-90% response time reduction'
});
}
// Recommend query optimization for slow operations
if (collection.avgResponseTime > 1000) {
recommendations.push({
type: 'query_optimization',
priority: 'high',
collection: collection.collection,
title: 'Optimize slow queries',
description: `Queries on "${collection.collection}" average ${collection.avgResponseTime}ms response time`,
recommendation: 'Review query patterns and consider compound indexes or query restructuring',
impact: 'high',
effort: 'medium',
estimatedImprovement: '40-70% response time reduction'
});
}
// Recommend efficiency improvements
if (collection.overallEfficiency < 0.1) {
recommendations.push({
type: 'efficiency_improvement',
priority: 'medium',
collection: collection.collection,
title: 'Improve query efficiency',
description: `Queries examine ${collection.totalDocsExamined} documents but return only ${collection.totalDocsReturned} (${Math.round(collection.overallEfficiency * 100)}% efficiency)`,
recommendation: 'Add more selective indexes or modify query patterns to reduce document examination',
impact: 'medium',
effort: 'medium',
estimatedImprovement: '30-50% efficiency improvement'
});
}
}
}
// Analyze index usage for recommendations
if (performanceData.indexStats?.indexes) {
for (const index of performanceData.indexStats.indexes) {
// Recommend removing unused indexes
if (index.usageCategory === 'unused' && index.indexName !== '_id_') {
recommendations.push({
type: 'index_removal',
priority: 'low',
collection: index.collection,
title: 'Remove unused index',
description: `Index "${index.indexName}" on collection "${index.collection}" is unused`,
recommendation: 'Consider removing this index to reduce storage overhead and improve write performance',
impact: 'low',
effort: 'low',
estimatedImprovement: 'Reduced storage usage and faster writes'
});
}
// Recommend index optimization for oversized indexes
if (index.indexToCollectionRatio > 0.5) {
recommendations.push({
type: 'index_optimization',
priority: 'medium',
collection: index.collection,
title: 'Optimize oversized index',
description: `Index "${index.indexName}" size is ${Math.round(index.indexToCollectionRatio * 100)}% of collection size`,
recommendation: 'Review index design and consider using sparse or partial indexes',
impact: 'medium',
effort: 'medium',
estimatedImprovement: '20-40% storage reduction'
});
}
}
}
// Analyze collection metrics for recommendations
if (performanceData.collectionMetrics?.collections) {
for (const collection of performanceData.collectionMetrics.collections) {
// Recommend addressing fragmentation
if (collection.optimizationFlags.highFragmentation) {
recommendations.push({
type: 'storage_optimization',
priority: 'medium',
collection: collection.name,
title: 'Address storage fragmentation',
description: `Collection "${collection.name}" has ${Math.round(collection.fragmentationRatio * 100)}% fragmentation`,
recommendation: 'Consider running compact command or rebuilding indexes during maintenance window',
impact: 'medium',
effort: 'high',
estimatedImprovement: '15-30% storage efficiency improvement'
});
}
// Recommend index strategy for collections with no custom indexes
if (collection.optimizationFlags.noIndexes && collection.documentCount > 1000) {
recommendations.push({
type: 'index_strategy',
priority: 'medium',
collection: collection.name,
title: 'Implement indexing strategy',
description: `Collection "${collection.name}" has ${collection.documentCount} documents but no custom indexes`,
recommendation: 'Analyze query patterns and add appropriate indexes for common queries',
impact: 'high',
effort: 'medium',
estimatedImprovement: '50-80% query performance improvement'
});
}
}
}
// Sort recommendations by priority and impact
recommendations.sort((a, b) => {
const priorityOrder = { high: 3, medium: 2, low: 1 };
const impactOrder = { high: 3, medium: 2, low: 1 };
const priorityDiff = priorityOrder[b.priority] - priorityOrder[a.priority];
if (priorityDiff !== 0) return priorityDiff;
return impactOrder[b.impact] - impactOrder[a.impact];
});
return {
totalRecommendations: recommendations.length,
recommendations: recommendations,
// Summary by type
summaryByType: {
indexRecommendations: recommendations.filter(r => r.type.includes('index')).length,
queryOptimizations: recommendations.filter(r => r.type === 'query_optimization').length,
storageOptimizations: recommendations.filter(r => r.type === 'storage_optimization').length,
efficiencyImprovements: recommendations.filter(r => r.type === 'efficiency_improvement').length
},
// Priority distribution
priorityDistribution: {
high: recommendations.filter(r => r.priority === 'high').length,
medium: recommendations.filter(r => r.priority === 'medium').length,
low: recommendations.filter(r => r.priority === 'low').length
},
generatedAt: new Date()
};
} catch (error) {
console.error('Error generating optimization recommendations:', error);
return { error: error.message, recommendations: [] };
}
}
async generatePerformanceReport() {
console.log('Generating comprehensive performance report...');
try {
// Collect all performance metrics
const performanceData = await this.collectComprehensivePerformanceMetrics();
// Generate executive summary
const executiveSummary = this.generateExecutiveSummary(performanceData);
// Create comprehensive report
const performanceReport = {
reportId: require('crypto').randomUUID(),
generatedAt: new Date(),
reportPeriod: {
start: new Date(Date.now() - 3600000), // Last hour
end: new Date()
},
// Executive summary
executiveSummary: executiveSummary,
// Detailed performance data
performanceData: performanceData,
// Key performance indicators
kpis: {
avgResponseTime: performanceData.queryPerformance?.summary?.avgResponseTimeOverall || 0,
slowQueriesCount: performanceData.queryPerformance?.summary?.slowOperationsCount || 0,
collectionScansCount: performanceData.queryPerformance?.summary?.collectionScansCount || 0,
indexEfficiency: this.calculateOverallIndexEfficiency(performanceData.indexPerformance),
storageEfficiency: performanceData.collections?.summary?.avgStorageEfficiency || 0,
connectionUtilization: performanceData.connections?.utilizationPercent || 0
},
// Performance trends (if baseline available)
trends: await this.calculatePerformanceTrends(),
// Optimization recommendations
recommendations: performanceData.recommendations,
// Action items
actionItems: this.generateActionItems(performanceData.recommendations),
// Health score
overallHealthScore: this.calculateOverallHealthScore(performanceData)
};
// Store report
await this.collections.performanceMetrics.insertOne(performanceReport);
return performanceReport;
} catch (error) {
console.error('Error generating performance report:', error);
throw error;
}
}
generateExecutiveSummary(performanceData) {
const issues = [];
const highlights = [];
// Identify key issues
if (performanceData.queryPerformance?.summary?.avgResponseTimeOverall > 500) {
issues.push(`Average query response time is ${Math.round(performanceData.queryPerformance.summary.avgResponseTimeOverall)}ms (target: <100ms)`);
}
if (performanceData.queryPerformance?.summary?.collectionScansCount > 0) {
issues.push(`${performanceData.queryPerformance.summary.collectionScansCount} queries are performing collection scans`);
}
if (performanceData.collections?.summary?.avgStorageEfficiency < 0.7) {
issues.push(`Storage efficiency is ${Math.round(performanceData.collections.summary.avgStorageEfficiency * 100)}% (target: >80%)`);
}
// Identify highlights
if (performanceData.queryPerformance?.summary?.avgResponseTimeOverall < 100) {
highlights.push('Query performance is excellent with average response time under 100ms');
}
if (performanceData.indexPerformance?.usageSummary?.unused < 2) {
highlights.push('Index usage is well optimized with minimal unused indexes');
}
return {
status: issues.length === 0 ? 'healthy' : issues.length < 3 ? 'warning' : 'critical',
keyIssues: issues,
highlights: highlights,
recommendationsCount: performanceData.recommendations?.totalRecommendations || 0,
criticalRecommendations: performanceData.recommendations?.priorityDistribution?.high || 0
};
}
calculateOverallIndexEfficiency(indexPerformance) {
if (!indexPerformance?.indexes || indexPerformance.indexes.length === 0) return 0;
const usedIndexes = indexPerformance.indexes.filter(idx => idx.usageCategory !== 'unused').length;
return usedIndexes / indexPerformance.indexes.length;
}
generateActionItems(recommendations) {
if (!recommendations?.recommendations) return [];
return recommendations.recommendations
.filter(rec => rec.priority === 'high')
.slice(0, 5) // Top 5 high-priority items
.map(rec => ({
title: rec.title,
collection: rec.collection,
action: rec.recommendation,
estimatedEffort: rec.effort,
expectedImpact: rec.estimatedImprovement
}));
}
calculateOverallHealthScore(performanceData) {
let score = 100;
// Query performance impact
const avgResponseTime = performanceData.queryPerformance?.summary?.avgResponseTimeOverall || 0;
if (avgResponseTime > 1000) score -= 30;
else if (avgResponseTime > 500) score -= 20;
else if (avgResponseTime > 100) score -= 10;
// Collection scans impact
const collectionScans = performanceData.queryPerformance?.summary?.collectionScansCount || 0;
if (collectionScans > 100) score -= 25;
else if (collectionScans > 10) score -= 15;
else if (collectionScans > 0) score -= 5;
// Storage efficiency impact
const storageEfficiency = performanceData.collections?.summary?.avgStorageEfficiency || 1;
if (storageEfficiency < 0.5) score -= 20;
else if (storageEfficiency < 0.7) score -= 10;
// Index efficiency impact
const indexEfficiency = this.calculateOverallIndexEfficiency(performanceData.indexPerformance);
if (indexEfficiency < 0.7) score -= 15;
else if (indexEfficiency < 0.9) score -= 5;
return Math.max(0, score);
}
// Additional helper methods for comprehensive monitoring
extractConnectionMetrics(serverStatus) {
const connections = serverStatus.connections || {};
const network = serverStatus.network || {};
return {
current: connections.current || 0,
available: connections.available || 0,
totalCreated: connections.totalCreated || 0,
utilizationPercent: connections.available > 0 ?
(connections.current / (connections.current + connections.available)) * 100 : 0,
// Network metrics
bytesIn: network.bytesIn || 0,
bytesOut: network.bytesOut || 0,
numRequests: network.numRequests || 0
};
}
extractResourceMetrics(serverStatus) {
const mem = serverStatus.mem || {};
const extra_info = serverStatus.extra_info || {};
return {
// Memory usage
residentMemoryMB: mem.resident || 0,
virtualMemoryMB: mem.virtual || 0,
mappedMemoryMB: mem.mapped || 0,
// System metrics
pageFaults: extra_info.page_faults || 0,
heapUsageMB: mem.heap_usage_bytes ? mem.heap_usage_bytes / (1024 * 1024) : 0,
// CPU and system load would require additional system commands
cpuUsagePercent: 0, // Would need external monitoring
diskIOPS: 0 // Would need external monitoring
};
}
async collectReplicationMetrics() {
try {
const replSetStatus = await this.adminDb.command({ replSetGetStatus: 1 });
if (!replSetStatus.ok) {
return { replicated: false };
}
const primary = replSetStatus.members.find(m => m.state === 1);
const secondaries = replSetStatus.members.filter(m => m.state === 2);
return {
replicated: true,
setName: replSetStatus.set,
primary: primary ? {
name: primary.name,
health: primary.health,
uptime: primary.uptime
} : null,
secondaries: secondaries.map(s => ({
name: s.name,
health: s.health,
lag: primary && s.optimeDate ? primary.optimeDate - s.optimeDate : 0,
uptime: s.uptime
})),
totalMembers: replSetStatus.members.length
};
} catch (error) {
return { replicated: false, error: error.message };
}
}
async collectShardingMetrics() {
try {
const shardingStatus = await this.adminDb.command({ isdbgrid: 1 });
if (!shardingStatus.isdbgrid) {
return { sharded: false };
}
// Reading config metadata needs a MongoClient handle; the module-level
// client declared above is reused here since the class only holds a Db
const configDB = client.db('config');
const shards = await configDB.collection('shards').find().toArray();
const chunks = await configDB.collection('chunks').find().toArray();
return {
sharded: true,
shardCount: shards.length,
totalChunks: chunks.length,
shards: shards.map(s => ({
id: s._id,
host: s.host,
state: s.state
}))
};
} catch (error) {
return { sharded: false, error: error.message };
}
}
async startPerformanceCollection() {
console.log('Starting continuous performance metrics collection...');
// Collect metrics at regular intervals
setInterval(async () => {
try {
await this.collectComprehensivePerformanceMetrics();
} catch (error) {
console.error('Error in scheduled performance collection:', error);
}
}, this.config.metricsCollectionInterval);
// Generate reports at longer intervals
setInterval(async () => {
try {
await this.generatePerformanceReport();
} catch (error) {
console.error('Error in scheduled report generation:', error);
}
}, this.config.performanceReportInterval);
}
updateRealTimeMetrics(performanceData) {
// Update in-memory metrics for real-time dashboard
this.metrics.operationCounts.set('current', performanceData.operations);
this.metrics.responseTimes.set('current', performanceData.queryPerformance);
this.metrics.indexUsage.set('current', performanceData.indexPerformance);
this.metrics.collectionMetrics.set('current', performanceData.collections);
}
async checkPerformanceAlerts(performanceData) {
const alerts = [];
// Check response time thresholds
const avgResponseTime = performanceData.queryPerformance?.summary?.avgResponseTimeOverall || 0;
if (avgResponseTime > this.config.alertThresholds.avgResponseTime) {
alerts.push({
type: 'high_response_time',
severity: 'warning',
message: `Average response time ${avgResponseTime}ms exceeds threshold ${this.config.alertThresholds.avgResponseTime}ms`
});
}
// Check collection scans
const collectionScans = performanceData.queryPerformance?.summary?.collectionScansCount || 0;
if (collectionScans > 0) {
alerts.push({
type: 'collection_scans',
severity: 'warning',
message: `${collectionScans} queries performing collection scans`
});
}
// Process alerts if any
if (alerts.length > 0 && this.config.enableRealTimeAlerts) {
await this.processPerformanceAlerts(alerts);
}
}
async processPerformanceAlerts(alerts) {
for (const alert of alerts) {
console.warn(`⚠️ Performance Alert [${alert.severity}]: ${alert.message}`);
// Store alert for historical tracking
await this.collections.performanceMetrics.insertOne({
type: 'alert',
alert: alert,
timestamp: new Date()
});
// Trigger external alerting systems here
// (email, Slack, PagerDuty, etc.)
}
}
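// --- Minimal stubs for the lifecycle hooks referenced above ---
// These are sketches: a production deployment would replace them with real
// implementations (command monitoring, TTL setup, baseline capture, etc.)
setupRealTimeProfiler() {
// Placeholder: attach command monitoring or change streams here
}
async setupMetricsCollection() {
// Placeholder: create indexes/TTLs on the metrics collections
}
async initializeIndexAnalysis() {
// Placeholder: seed baseline index statistics
}
async setupCollectionStatsMonitoring() {
// Placeholder: schedule periodic collStats sampling
}
async initializePerformanceBaseline() {
// Placeholder: persist an initial metrics snapshot for trend comparison
}
async analyzeOperationMetrics() {
// Placeholder: derive opcounter deltas from serverStatus
return {};
}
async generatePerformanceAnalysis(data) {
// Placeholder: correlate the collected metrics into a summary
return { inputsAnalyzed: Object.keys(data).length };
}
async calculatePerformanceTrends() {
// Placeholder: compare the latest report against stored baselines
return null;
}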
}
// Benefits of MongoDB Advanced Performance Monitoring:
// - Comprehensive query profiling with detailed execution analysis
// - Advanced index usage analysis and optimization recommendations
// - Collection-level performance metrics and storage efficiency tracking
// - Real-time performance monitoring with automated alerting
// - Intelligent optimization recommendations based on actual usage patterns
// - Integration with MongoDB's native profiling and statistics capabilities
// - Production-ready monitoring suitable for large-scale deployments
// - Historical performance trend analysis and baseline establishment
// - Automated performance report generation with executive summaries
// - SQL-compatible monitoring operations through QueryLeaf integration
module.exports = {
AdvancedMongoPerformanceMonitor
};
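A minimal wiring sketch for the class above follows; the module path and connection string are placeholders, and note that the constructor starts its own collection intervals as soon as it runs:

// Example usage (sketch): connect, construct the monitor, and pull one report
const { MongoClient } = require('mongodb');
// Hypothetical module path for the class defined above
const { AdvancedMongoPerformanceMonitor } = require('./advanced-mongo-performance-monitor');

async function startMonitoring() {
  const client = new MongoClient('mongodb://localhost:27017'); // placeholder URI
  await client.connect();
  const monitor = new AdvancedMongoPerformanceMonitor(
    client.db('production_performance_monitoring'),
    { slowOperationThreshold: 100, metricsCollectionInterval: 60000 }
  );
  const report = await monitor.generatePerformanceReport();
  console.log('Overall health score:', report.overallHealthScore);
}

startMonitoring().catch(console.error);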
Understanding MongoDB Performance Monitoring Architecture
Advanced Profiling and Optimization Strategies
Implement sophisticated monitoring patterns for production MongoDB deployments:
// Production-ready MongoDB performance monitoring with advanced optimization patterns
class ProductionPerformanceOptimizer extends AdvancedMongoPerformanceMonitor {
constructor(db, productionConfig) {
super(db, productionConfig);
this.productionConfig = {
...productionConfig,
enablePredictiveAnalytics: true,
enableAutomaticOptimization: false, // Require manual approval
enableCapacityPlanning: true,
enablePerformanceBaseline: true,
enableAnomalyDetection: true,
enableCostOptimization: true
};
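// Note: the initialization hooks below are outlines in this article,
// not complete implementations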
this.setupProductionOptimizations();
this.initializePredictiveAnalytics();
this.setupCapacityPlanningModels();
}
async implementAdvancedQueryOptimization(optimizationConfig) {
console.log('Implementing advanced query optimization strategies...');
const optimizationStrategies = {
// Intelligent index recommendations
indexOptimization: {
compoundIndexAnalysis: true,
partialIndexOptimization: true,
sparseIndexRecommendations: true,
indexIntersectionAnalysis: true
},
// Query pattern analysis
queryOptimization: {
aggregationPipelineOptimization: true,
queryShapeAnalysis: true,
executionPlanOptimization: true,
sortOptimization: true
},
// Schema optimization
schemaOptimization: {
documentStructureAnalysis: true,
fieldUsageAnalysis: true,
embeddingVsReferencingAnalysis: true,
denormalizationRecommendations: true
},
// Resource optimization
resourceOptimization: {
connectionPoolOptimization: true,
memoryUsageOptimization: true,
diskIOOptimization: true,
networkOptimization: true
}
};
return await this.executeOptimizationStrategies(optimizationStrategies);
}
async setupCapacityPlanningModels(planningRequirements) {
console.log('Setting up capacity planning and growth prediction models...');
const planningModels = {
// Growth prediction models
growthPrediction: {
documentGrowthRate: await this.analyzeDocumentGrowthRate(),
storageGrowthProjection: await this.projectStorageGrowth(),
queryVolumeProjection: await this.projectQueryVolumeGrowth(),
indexGrowthAnalysis: await this.analyzeIndexGrowthPatterns()
},
// Resource requirement models
resourcePlanning: {
cpuRequirements: await this.calculateCPURequirements(),
memoryRequirements: await this.calculateMemoryRequirements(),
storageRequirements: await this.calculateStorageRequirements(),
networkRequirements: await this.calculateNetworkRequirements()
},
// Scaling recommendations
scalingStrategy: {
verticalScaling: await this.analyzeVerticalScalingNeeds(),
horizontalScaling: await this.analyzeHorizontalScalingNeeds(),
shardingRecommendations: await this.analyzeShardingRequirements(),
replicaSetOptimization: await this.analyzeReplicaSetOptimization()
}
};
return await this.implementCapacityPlanningModels(planningModels);
}
async enableAnomalyDetection(detectionConfig) {
console.log('Enabling performance anomaly detection system...');
const anomalyDetectionSystem = {
// Statistical anomaly detection
statisticalDetection: {
responseTimeAnomalies: true,
queryVolumeAnomalies: true,
indexUsageAnomalies: true,
resourceUsageAnomalies: true
},
// Machine learning based detection
mlDetection: {
queryPatternAnomalies: true,
performanceDegradationPrediction: true,
capacityThresholdPrediction: true,
failurePatternRecognition: true
},
// Business logic anomalies
businessLogicDetection: {
unexpectedDataPatterns: true,
unusualApplicationBehavior: true,
securityAnomalies: true,
complianceViolations: true
}
};
return await this.implementAnomalyDetectionSystem(anomalyDetectionSystem);
}
}
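Because several strategy methods above are intentionally left as outlines, a usage sketch is limited to construction and a single optimization pass; all names below come from the class as written, and the underlying helpers (such as executeOptimizationStrategies) must be implemented before this will run:

// Sketch: construct the production optimizer and request an optimization pass
const optimizer = new ProductionPerformanceOptimizer(db, {
  slowOperationThreshold: 50,
  enableRealTimeAlerts: true
});

optimizer.implementAdvancedQueryOptimization({})
  .then(result => console.log('Optimization strategies evaluated:', result))
  .catch(console.error);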
SQL-Style Performance Monitoring with QueryLeaf
QueryLeaf provides familiar SQL syntax for MongoDB performance monitoring and optimization operations:
-- QueryLeaf advanced performance monitoring and optimization with SQL-familiar syntax
-- Enable comprehensive database profiling with advanced configuration
CONFIGURE PROFILING
SET profiling_level = 2,
slow_operation_threshold = 100,
sample_rate = 1.0,
filter_criteria = {
include_slow_ops: true,
include_collection_scans: true,
include_lock_operations: true,
include_index_analysis: true
},
collection_size = '100MB',
max_documents = 1000000;
-- Comprehensive performance metrics analysis with detailed insights
WITH performance_analysis AS (
SELECT
-- Operation characteristics
operation_type,
collection_name,
execution_time_ms,
documents_examined,
documents_returned,
index_keys_examined,
execution_plan,
-- Efficiency calculations
CASE
WHEN documents_examined > 0 THEN
CAST(documents_returned AS FLOAT) / documents_examined
ELSE 1.0
END as query_efficiency,
-- Performance categorization
CASE
WHEN execution_time_ms < 10 THEN 'very_fast'
WHEN execution_time_ms < 100 THEN 'fast'
WHEN execution_time_ms < 500 THEN 'moderate'
WHEN execution_time_ms < 2000 THEN 'slow'
ELSE 'very_slow'
END as performance_category,
-- Index usage analysis
CASE
WHEN execution_plan LIKE '%IXSCAN%' THEN 'index_scan'
WHEN execution_plan LIKE '%COLLSCAN%' THEN 'collection_scan'
ELSE 'other'
END as index_usage_type,
-- Lock analysis
locks_acquired,
lock_wait_time_ms,
-- Resource usage
cpu_time_ms,
memory_usage_bytes,
-- Timestamp for trend analysis
DATE_TRUNC('minute', operation_timestamp) as time_bucket
FROM PROFILE_DATA
WHERE operation_timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
AND database_name = CURRENT_DATABASE()
),
aggregated_metrics AS (
SELECT
collection_name,
operation_type,
index_usage_type,
time_bucket,
-- Operation volume metrics
COUNT(*) as operation_count,
-- Performance metrics
AVG(execution_time_ms) as avg_response_time,
PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY execution_time_ms) as median_response_time,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY execution_time_ms) as p95_response_time,
PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY execution_time_ms) as p99_response_time,
MIN(execution_time_ms) as min_response_time,
MAX(execution_time_ms) as max_response_time,
-- Efficiency metrics
AVG(query_efficiency) as avg_efficiency,
SUM(documents_examined) as total_docs_examined,
SUM(documents_returned) as total_docs_returned,
SUM(index_keys_examined) as total_index_keys_examined,
-- Performance distribution
COUNT(*) FILTER (WHERE performance_category = 'very_fast') as very_fast_ops,
COUNT(*) FILTER (WHERE performance_category = 'fast') as fast_ops,
COUNT(*) FILTER (WHERE performance_category = 'moderate') as moderate_ops,
COUNT(*) FILTER (WHERE performance_category = 'slow') as slow_ops,
COUNT(*) FILTER (WHERE performance_category = 'very_slow') as very_slow_ops,
-- Resource utilization
AVG(cpu_time_ms) as avg_cpu_time,
AVG(memory_usage_bytes) as avg_memory_usage,
SUM(lock_wait_time_ms) as total_lock_wait_time,
-- Index efficiency
COUNT(*) FILTER (WHERE index_usage_type = 'collection_scan') as collection_scan_count,
COUNT(*) FILTER (WHERE index_usage_type = 'index_scan') as index_scan_count,
-- Calculate performance score
(
-- Response time component (lower is better)
(1000 - LEAST(AVG(execution_time_ms), 1000)) / 1000 * 40 +
-- Efficiency component (higher is better)
AVG(query_efficiency) * 30 +
-- Index usage component (index scans preferred)
CASE
WHEN COUNT(*) FILTER (WHERE index_usage_type = 'index_scan') >
COUNT(*) FILTER (WHERE index_usage_type = 'collection_scan') THEN 20
ELSE 0
END +
-- Volume stability component
LEAST(COUNT(*) / 100.0, 1.0) * 10
) as performance_score
FROM performance_analysis
GROUP BY collection_name, operation_type, index_usage_type, time_bucket
),
performance_trends AS (
SELECT
am.*,
-- Trend analysis with window functions
LAG(avg_response_time) OVER (
PARTITION BY collection_name, operation_type, index_usage_type
ORDER BY time_bucket
) as prev_response_time,
LAG(operation_count) OVER (
PARTITION BY collection_name, operation_type, index_usage_type
ORDER BY time_bucket
) as prev_operation_count,
-- Moving averages for smoothing
AVG(avg_response_time) OVER (
PARTITION BY collection_name, operation_type, index_usage_type
ORDER BY time_bucket
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
) as moving_avg_response_time,
AVG(performance_score) OVER (
PARTITION BY collection_name, operation_type, index_usage_type
ORDER BY time_bucket
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
) as moving_avg_performance_score
FROM aggregated_metrics am
)
SELECT
collection_name,
operation_type,
index_usage_type,
time_bucket,
-- Core performance metrics
operation_count,
ROUND(avg_response_time::NUMERIC, 2) as avg_response_time_ms,
ROUND(median_response_time::NUMERIC, 2) as median_response_time_ms,
ROUND(p95_response_time::NUMERIC, 2) as p95_response_time_ms,
ROUND(p99_response_time::NUMERIC, 2) as p99_response_time_ms,
-- Efficiency metrics
ROUND((avg_efficiency * 100)::NUMERIC, 2) as efficiency_percentage,
total_docs_examined,
total_docs_returned,
-- Performance distribution
JSON_OBJECT(
'very_fast', very_fast_ops,
'fast', fast_ops,
'moderate', moderate_ops,
'slow', slow_ops,
'very_slow', very_slow_ops
) as performance_distribution,
-- Index usage analysis
collection_scan_count,
index_scan_count,
ROUND(
(index_scan_count::FLOAT / NULLIF(collection_scan_count + index_scan_count, 0) * 100)::NUMERIC,
2
) as index_usage_percentage,
-- Performance scoring
ROUND(performance_score::NUMERIC, 2) as performance_score,
CASE
WHEN performance_score >= 90 THEN 'excellent'
WHEN performance_score >= 75 THEN 'good'
WHEN performance_score >= 60 THEN 'fair'
WHEN performance_score >= 40 THEN 'poor'
ELSE 'critical'
END as performance_grade,
-- Trend analysis
CASE
WHEN prev_response_time IS NOT NULL THEN
ROUND(((avg_response_time - prev_response_time) / prev_response_time * 100)::NUMERIC, 2)
ELSE NULL
END as response_time_change_percent,
CASE
WHEN prev_operation_count IS NOT NULL THEN
ROUND(((operation_count - prev_operation_count)::FLOAT / prev_operation_count * 100)::NUMERIC, 2)
ELSE NULL
END as volume_change_percent,
-- Moving averages for trend smoothing
ROUND(moving_avg_response_time::NUMERIC, 2) as trend_response_time,
ROUND(moving_avg_performance_score::NUMERIC, 2) as trend_performance_score,
-- Resource utilization
ROUND(avg_cpu_time::NUMERIC, 2) as avg_cpu_time_ms,
ROUND((avg_memory_usage / 1024.0 / 1024)::NUMERIC, 2) as avg_memory_usage_mb,
total_lock_wait_time as total_lock_wait_ms,
-- Alert indicators
CASE
WHEN avg_response_time > 1000 THEN 'high_response_time'
WHEN collection_scan_count > index_scan_count THEN 'excessive_collection_scans'
WHEN avg_efficiency < 0.1 THEN 'low_efficiency'
WHEN total_lock_wait_time > 1000 THEN 'lock_contention'
ELSE 'normal'
END as alert_status,
CURRENT_TIMESTAMP as analysis_timestamp
FROM performance_trends
WHERE operation_count > 0 -- Filter out empty buckets
ORDER BY
performance_score ASC, -- Show problematic areas first
avg_response_time DESC,
collection_name,
operation_type;
-- Advanced index analysis and optimization recommendations
WITH index_statistics AS (
SELECT
collection_name,
index_name,
index_spec,
index_size_bytes,
-- Usage statistics
access_count,
last_access_time,
-- Index characteristics
is_unique,
is_sparse,
is_partial,
is_compound,
-- Calculate metrics
EXTRACT(DAY FROM CURRENT_TIMESTAMP - last_access_time) as days_since_access,
-- Index type classification
CASE
WHEN access_count = 0 THEN 'unused'
WHEN access_count < 100 THEN 'low_usage'
WHEN access_count < 10000 THEN 'medium_usage'
ELSE 'high_usage'
END as usage_category,
-- Get collection statistics for context
(SELECT document_count FROM COLLECTION_STATS cs WHERE cs.collection_name = idx.collection_name) as collection_doc_count,
(SELECT total_size_bytes FROM COLLECTION_STATS cs WHERE cs.collection_name = idx.collection_name) as collection_size_bytes
FROM INDEX_STATS idx
WHERE database_name = CURRENT_DATABASE()
),
index_analysis AS (
SELECT
*,
-- Calculate index efficiency metrics
CASE
WHEN collection_size_bytes > 0 THEN
CAST(index_size_bytes AS FLOAT) / collection_size_bytes
ELSE 0
END as size_ratio,
-- Usage intensity
CASE
WHEN collection_doc_count > 0 THEN
CAST(access_count AS FLOAT) / collection_doc_count
ELSE 0
END as usage_intensity,
-- ROI calculation (simplified)
CASE
WHEN index_size_bytes > 0 THEN
CAST(access_count AS FLOAT) / (index_size_bytes / 1024 / 1024) -- accesses per MB
ELSE 0
END as access_per_mb,
-- Optimization opportunity scoring (the ratio expressions are repeated
-- because a SELECT-list alias can't be referenced in the same SELECT list)
CASE
WHEN access_count = 0 AND index_name != '_id_' THEN 100 -- Remove unused
WHEN access_count < 10 AND days_since_access > 30 THEN 80 -- Consider removal
WHEN collection_size_bytes > 0
AND CAST(index_size_bytes AS FLOAT) / collection_size_bytes > 0.5 THEN 60 -- Oversized index
WHEN is_compound = false
AND collection_doc_count > 0
AND CAST(access_count AS FLOAT) / collection_doc_count < 0.01 THEN 40 -- Underutilized single field
ELSE 0
END as optimization_priority
FROM index_statistics
),
optimization_recommendations AS (
SELECT
collection_name,
index_name,
usage_category,
-- Current metrics
access_count,
ROUND((index_size_bytes / 1024.0 / 1024)::NUMERIC, 2) as index_size_mb,
ROUND((size_ratio * 100)::NUMERIC, 2) as size_ratio_percent,
days_since_access,
-- Optimization recommendations
CASE
WHEN optimization_priority >= 100 THEN
JSON_OBJECT(
'action', 'remove_index',
'reason', 'Index is unused and consuming storage',
'impact', 'Reduced storage usage and faster writes',
'priority', 'high'
)
WHEN optimization_priority >= 80 THEN
JSON_OBJECT(
'action', 'consider_removal',
'reason', 'Index has very low usage and is stale',
'impact', 'Potential storage savings with minimal risk',
'priority', 'medium'
)
WHEN optimization_priority >= 60 THEN
JSON_OBJECT(
'action', 'optimize_index',
'reason', 'Index size is disproportionately large',
'impact', 'Consider sparse or partial index options',
'priority', 'medium'
)
WHEN optimization_priority >= 40 THEN
JSON_OBJECT(
'action', 'review_usage',
'reason', 'Single field index with low utilization',
'impact', 'Evaluate if compound index would be more effective',
'priority', 'low'
)
ELSE
JSON_OBJECT(
'action', 'maintain',
'reason', 'Index appears to be well utilized',
'impact', 'No immediate action required',
'priority', 'none'
)
END as recommendation,
-- Performance impact estimation
CASE
WHEN optimization_priority >= 80 THEN
JSON_OBJECT(
'storage_savings_mb', ROUND((index_size_bytes / 1024.0 / 1024)::NUMERIC, 2),
'write_performance_improvement', '5-15%',
'query_performance_impact', 'minimal'
)
WHEN optimization_priority >= 40 THEN
JSON_OBJECT(
'storage_savings_mb', ROUND((index_size_bytes / 1024.0 / 1024 * 0.3)::NUMERIC, 2),
'write_performance_improvement', '2-8%',
'query_performance_impact', 'requires_analysis'
)
ELSE
JSON_OBJECT(
'storage_savings_mb', 0,
'write_performance_improvement', '0%',
'query_performance_impact', 'none'
)
END as impact_estimate,
optimization_priority
FROM index_analysis
WHERE optimization_priority > 0
)
SELECT
collection_name,
index_name,
usage_category,
access_count,
index_size_mb,
size_ratio_percent,
days_since_access,
-- Recommendation details
JSON_EXTRACT(recommendation, '$.action') as recommended_action,
JSON_EXTRACT(recommendation, '$.reason') as recommendation_reason,
JSON_EXTRACT(recommendation, '$.impact') as expected_impact,
JSON_EXTRACT(recommendation, '$.priority') as priority_level,
-- Impact estimation
CAST(JSON_EXTRACT(impact_estimate, '$.storage_savings_mb') AS DECIMAL(10,2)) as potential_storage_savings_mb,
JSON_EXTRACT(impact_estimate, '$.write_performance_improvement') as write_performance_gain,
JSON_EXTRACT(impact_estimate, '$.query_performance_impact') as query_impact_assessment,
-- Implementation guidance
CASE
WHEN JSON_EXTRACT(recommendation, '$.action') = 'remove_index' THEN
'DROP INDEX ' || index_name || ' ON ' || collection_name
WHEN JSON_EXTRACT(recommendation, '$.action') = 'optimize_index' THEN
'Review index definition and consider sparse/partial options'
ELSE 'Monitor usage patterns before taking action'
END as implementation_command,
optimization_priority,
CURRENT_TIMESTAMP as analysis_date
FROM optimization_recommendations
ORDER BY optimization_priority DESC, index_size_mb DESC;
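The INDEX_STATS usage figures consumed by this analysis correspond to MongoDB's $indexStats aggregation stage, which reports per-index access counts accumulated since the last mongod restart (it exposes when counting began rather than a true last-access timestamp). A minimal PyMongo sketch of the same unused-index check follows; the connection string, namespace, and 30-day staleness cutoff are assumptions:
# Minimal sketch: flag unused or rarely used indexes via $indexStats.
# Note: access counters reset on mongod restart, so interpret low counts
# against process uptime before dropping anything.
from datetime import datetime, timedelta, timezone

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017", tz_aware=True)
coll = client["app_db"]["orders"]  # hypothetical database/collection

stale_cutoff = datetime.now(timezone.utc) - timedelta(days=30)

for stat in coll.aggregate([{"$indexStats": {}}]):
    name = stat["name"]
    ops = stat["accesses"]["ops"]      # accesses since counting began
    since = stat["accesses"]["since"]  # when counting began
    if name == "_id_":
        continue  # the mandatory _id index is never a removal candidate
    if ops == 0:
        print(f"{name}: unused since {since} -- removal candidate")
    elif ops < 10 and since < stale_cutoff:
        print(f"{name}: only {ops} accesses -- review before dropping")
Because the counters are per-mongod, run the check against every replica set member before acting: an index that looks unused on the primary may be serving secondary reads.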
-- Real-time performance monitoring dashboard query
CREATE VIEW real_time_performance_dashboard AS
WITH current_metrics AS (
SELECT
-- Time-based grouping for real-time updates
DATE_TRUNC('minute', CURRENT_TIMESTAMP) as current_minute,
-- Operation volume in last minute
(SELECT COUNT(*) FROM PROFILE_DATA
WHERE operation_timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 minute') as ops_per_minute,
-- Average response time in last minute
(SELECT AVG(execution_time_ms) FROM PROFILE_DATA
WHERE operation_timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 minute') as avg_response_time_1m,
-- Collection scans in last minute
(SELECT COUNT(*) FROM PROFILE_DATA
WHERE operation_timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 minute'
AND execution_plan LIKE '%COLLSCAN%') as collection_scans_1m,
-- Slow queries in last minute (>500ms)
(SELECT COUNT(*) FROM PROFILE_DATA
WHERE operation_timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 minute'
AND execution_time_ms > 500) as slow_queries_1m,
-- Connection statistics
(SELECT current_connections FROM CONNECTION_STATS) as current_connections,
(SELECT max_connections FROM CONNECTION_STATS) as max_connections,
-- Memory usage
(SELECT resident_memory_mb FROM MEMORY_STATS) as memory_usage_mb,
(SELECT cache_hit_ratio FROM MEMORY_STATS) as cache_hit_ratio,
-- Storage metrics
(SELECT SUM(data_size_bytes) FROM COLLECTION_STATS) as total_data_size_bytes,
(SELECT SUM(storage_size_bytes) FROM COLLECTION_STATS) as total_storage_size_bytes,
(SELECT SUM(index_size_bytes) FROM COLLECTION_STATS) as total_index_size_bytes
),
health_indicators AS (
SELECT
cm.*,
-- Calculate health scores
CASE
WHEN avg_response_time_1m > 1000 THEN 'critical'
WHEN avg_response_time_1m > 500 THEN 'warning'
WHEN avg_response_time_1m > 100 THEN 'ok'
ELSE 'excellent'
END as response_time_health,
CASE
WHEN collection_scans_1m > 10 THEN 'critical'
WHEN collection_scans_1m > 5 THEN 'warning'
WHEN collection_scans_1m > 0 THEN 'ok'
ELSE 'excellent'
END as index_usage_health,
CASE
WHEN current_connections::FLOAT / NULLIF(max_connections, 0) > 0.9 THEN 'critical'
WHEN current_connections::FLOAT / NULLIF(max_connections, 0) > 0.8 THEN 'warning'
WHEN current_connections::FLOAT / NULLIF(max_connections, 0) > 0.7 THEN 'ok'
ELSE 'excellent'
END as connection_health,
CASE
WHEN cache_hit_ratio < 0.8 THEN 'critical'
WHEN cache_hit_ratio < 0.9 THEN 'warning'
WHEN cache_hit_ratio < 0.95 THEN 'ok'
ELSE 'excellent'
END as memory_health
FROM current_metrics cm
)
SELECT
current_minute,
-- Real-time performance metrics
ops_per_minute,
ROUND(avg_response_time_1m::NUMERIC, 2) as avg_response_time_ms,
collection_scans_1m,
slow_queries_1m,
-- Health indicators
response_time_health,
index_usage_health,
connection_health,
memory_health,
-- Overall health score
CASE
WHEN response_time_health = 'critical' OR index_usage_health = 'critical' OR
connection_health = 'critical' OR memory_health = 'critical' THEN 'critical'
WHEN response_time_health = 'warning' OR index_usage_health = 'warning' OR
connection_health = 'warning' OR memory_health = 'warning' THEN 'warning'
WHEN response_time_health = 'ok' OR index_usage_health = 'ok' OR
connection_health = 'ok' OR memory_health = 'ok' THEN 'ok'
ELSE 'excellent'
END as overall_health,
-- Resource utilization
current_connections,
max_connections,
ROUND((current_connections::FLOAT / NULLIF(max_connections, 0) * 100)::NUMERIC, 2) as connection_usage_percent,
memory_usage_mb,
ROUND((cache_hit_ratio * 100)::NUMERIC, 2) as cache_hit_percent,
-- Storage information
ROUND((total_data_size_bytes / 1024.0 / 1024 / 1024)::NUMERIC, 2) as total_data_gb,
ROUND((total_storage_size_bytes / 1024.0 / 1024 / 1024)::NUMERIC, 2) as total_storage_gb,
ROUND((total_index_size_bytes / 1024.0 / 1024 / 1024)::NUMERIC, 2) as total_index_gb,
-- Efficiency metrics
ROUND((total_data_size_bytes::FLOAT / NULLIF(total_storage_size_bytes, 0))::NUMERIC, 4) as storage_efficiency,
ROUND((total_index_size_bytes::FLOAT / NULLIF(total_data_size_bytes, 0))::NUMERIC, 4) as index_to_data_ratio,
-- Alert conditions
CASE
WHEN ops_per_minute = 0 THEN 'no_activity'
WHEN slow_queries_1m > ops_per_minute * 0.1 THEN 'high_slow_query_ratio'
WHEN collection_scans_1m > ops_per_minute * 0.05 THEN 'excessive_collection_scans'
ELSE 'normal'
END as alert_condition,
-- Recommendations (ARRAY_REMOVE drops the NULLs produced by healthy subsystems)
ARRAY_REMOVE(ARRAY[
CASE WHEN response_time_health IN ('critical', 'warning') THEN 'Review slow queries and indexing strategy' END,
CASE WHEN index_usage_health IN ('critical', 'warning') THEN 'Add indexes to eliminate collection scans' END,
CASE WHEN connection_health IN ('critical', 'warning') THEN 'Monitor connection pooling and usage patterns' END,
CASE WHEN memory_health IN ('critical', 'warning') THEN 'Review memory allocation and cache settings' END
]::TEXT[], NULL) as immediate_recommendations
FROM health_indicators;
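The CONNECTION_STATS and MEMORY_STATS sources behind this view map to fields of MongoDB's serverStatus command. A minimal PyMongo sketch deriving the view's connection and memory health indicators natively; the hit-ratio formula is a common approximation built from WiredTiger page counters, and the thresholds simply mirror the illustrative ones used in the view:
# Minimal sketch: derive the dashboard's connection and cache indicators
# from serverStatus; thresholds mirror the illustrative ones in the view.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
status = client.admin.command("serverStatus")

conns = status["connections"]
conn_usage = conns["current"] / (conns["current"] + conns["available"])

cache = status["wiredTiger"]["cache"]  # present under the WiredTiger engine
# Common hit-ratio approximation: share of page requests served from cache
# (both counters are cumulative since process startup).
requested = cache["pages requested from the cache"]
read_in = cache["pages read into cache"]
cache_hit_ratio = 1 - (read_in / requested) if requested else 1.0

def health(value, critical, warning, ok, invert=False):
    # Map a ratio to the view's four-level health scale; invert=True means
    # lower values are worse (as with cache hit ratio).
    if (value < critical) if invert else (value > critical):
        return "critical"
    if (value < warning) if invert else (value > warning):
        return "warning"
    if (value < ok) if invert else (value > ok):
        return "ok"
    return "excellent"

print("connection_health:", health(conn_usage, 0.9, 0.8, 0.7))
print("memory_health:", health(cache_hit_ratio, 0.8, 0.9, 0.95, invert=True))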
-- QueryLeaf provides comprehensive MongoDB performance monitoring capabilities:
-- 1. SQL-familiar syntax for MongoDB profiling configuration and analysis
-- 2. Advanced performance metrics collection with detailed execution insights
-- 3. Real-time index usage analysis and optimization recommendations
-- 4. Comprehensive query performance analysis with efficiency scoring
-- 5. Production-ready monitoring dashboards with health indicators
-- 6. Automated optimization recommendations based on actual usage patterns
-- 7. Trend analysis and performance baseline establishment
-- 8. Integration with MongoDB's native profiling and statistics systems
-- 9. Advanced alerting and anomaly detection capabilities
-- 10. Capacity planning and resource optimization insights
Best Practices for Production MongoDB Performance Monitoring
Monitoring Strategy Implementation
Essential principles for effective MongoDB performance monitoring and optimization:
- Profiling Configuration: Configure appropriate profiling levels and sampling rates to balance insight with performance impact (a minimal native-command sketch follows this list)
- Metrics Collection: Implement comprehensive metrics collection covering queries, indexes, resources, and business operations
- Baseline Establishment: Establish performance baselines to enable meaningful trend analysis and anomaly detection
- Alert Strategy: Design intelligent alerting that focuses on actionable issues rather than metric noise
- Optimization Workflow: Implement systematic optimization workflows with testing and validation procedures
- Capacity Planning: Utilize historical data and growth patterns for proactive capacity planning and scaling decisions
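For the profiling configuration point above, levels and sampling are set per database with MongoDB's profile command. A minimal PyMongo sketch follows; the 100 ms threshold and 0.5 sample rate are illustrative starting points, not universal recommendations:
# Minimal sketch: enable sampled profiling of slow operations.
# Level 1 profiles only operations slower than slowms; sampleRate keeps
# profiler overhead bounded on busy databases (values are illustrative).
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["app_db"]  # hypothetical database name

previous = db.command("profile", 1, slowms=100, sampleRate=0.5)
print("previous level:", previous["was"])

# Level -1 reads the active settings without changing them, so it is safe
# to verify configuration from automation.
print("current settings:", db.command("profile", -1))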
Production Deployment Optimization
Optimize MongoDB monitoring deployments for enterprise environments:
- Automated Analysis: Implement automated performance analysis and recommendation generation to reduce manual overhead (see the scheduling sketch after this list)
- Integration Ecosystem: Integrate monitoring with existing observability platforms and operational workflows
- Cost Optimization: Balance monitoring comprehensiveness with resource costs and performance impact
- Scalability Design: Design monitoring systems that scale effectively with database growth and complexity
- Security Integration: Ensure monitoring systems comply with security requirements and access control policies
- Documentation Standards: Maintain comprehensive documentation of monitoring configurations, thresholds, and procedures
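To make the automated-analysis point concrete, one lightweight pattern is to snapshot profiler summaries on a schedule and persist them so dashboards and alerting can trend the results. A minimal sketch; the output collection name, slow-query threshold, and 5-minute interval are assumptions:
# Minimal sketch: periodically summarize slow operations and persist the
# result for trending. Collection name and interval are illustrative.
import time
from datetime import datetime, timedelta, timezone

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017", tz_aware=True)
db = client["app_db"]           # hypothetical database
findings = db["perf_findings"]  # hypothetical output collection

def snapshot():
    window_start = datetime.now(timezone.utc) - timedelta(minutes=5)
    slow = db["system.profile"].count_documents(
        {"ts": {"$gte": window_start}, "millis": {"$gt": 500}}
    )
    findings.insert_one({
        "captured_at": datetime.now(timezone.utc),
        "slow_ops_5m": slow,
    })

while True:  # in production, prefer an external scheduler such as cron
    snapshot()
    time.sleep(300)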
Conclusion
MongoDB performance monitoring and optimization require tooling and methodologies that understand the unique characteristics of document databases, distributed architectures, and dynamic schemas. Advanced capabilities including query profiling, index analysis, resource tracking, and automated optimization recommendations enable proactive performance management that prevents issues before they impact application users.
Key MongoDB performance monitoring benefits include:
- Comprehensive Profiling: Deep insights into query execution, index usage, and resource utilization patterns
- Intelligent Optimization: Automated analysis and recommendations based on actual usage patterns and performance data
- Real-time Monitoring: Continuous performance tracking with proactive alerting and anomaly detection
- Capacity Planning: Data-driven insights for scaling decisions and resource optimization
- Production Integration: Enterprise-ready monitoring that integrates with existing operational workflows
- SQL Accessibility: Familiar SQL-style monitoring operations through QueryLeaf for accessible performance management
Whether you're managing development environments, production deployments, or large-scale distributed MongoDB systems, comprehensive performance monitoring with QueryLeaf's familiar SQL interface provides the foundation for optimal database performance and reliability.
QueryLeaf Integration: QueryLeaf automatically translates SQL-style monitoring queries into MongoDB's native profiling and statistics operations, making advanced performance analysis accessible to SQL-oriented teams. Complex profiling configurations, index analysis, and optimization recommendations are seamlessly handled through familiar SQL constructs, enabling sophisticated performance management without requiring deep MongoDB expertise.
The combination of MongoDB's robust performance monitoring capabilities with SQL-style analysis operations makes MongoDB an ideal platform for applications requiring both advanced performance optimization and familiar database management patterns, ensuring your deployments maintain optimal performance as they scale and evolve.