MongoDB Indexing Strategies and Performance Optimization: Advanced Techniques for High-Performance Database Operations
High-performance database applications depend heavily on strategic indexing to deliver fast query response times, efficient data retrieval, and optimal resource utilization. Poor indexing decisions can lead to slow queries, excessive memory usage, and degraded application performance that becomes increasingly problematic as data volumes grow.
MongoDB's flexible indexing system provides powerful capabilities for optimizing query performance across diverse data patterns and access scenarios. Unlike rigid relational indexing approaches, MongoDB indexes support complex document structures, array fields, geospatial data, and text search, enabling sophisticated optimization strategies that align with modern application requirements while maintaining query performance at scale.
Traditional Database Indexing Limitations
Conventional relational database indexing approaches have significant constraints for modern application patterns:
-- Traditional PostgreSQL indexing - rigid structure with limited flexibility
-- Basic single-column indexes with limited optimization potential
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_created_at ON users(created_at);
CREATE INDEX idx_users_status ON users(status);
CREATE INDEX idx_users_country ON users(country);
-- Simple compound index with fixed column order
CREATE INDEX idx_users_country_status_created ON users(country, status, created_at);
-- Basic partial index (PostgreSQL specific)
CREATE INDEX idx_active_users_email ON users(email)
WHERE status = 'active';
-- Limited text search capabilities
CREATE INDEX idx_users_name_fts ON users
USING GIN(to_tsvector('english', first_name || ' ' || last_name));
-- Complex query with multiple conditions
WITH user_search AS (
SELECT
user_id,
email,
first_name,
last_name,
status,
country,
created_at,
last_login_at,
preferences,
tags,
-- Multiple index usage may not be optimal
CASE
WHEN status = 'active' AND last_login_at >= CURRENT_DATE - INTERVAL '30 days' THEN 'active_recent'
WHEN status = 'active' AND last_login_at < CURRENT_DATE - INTERVAL '30 days' THEN 'active_stale'
WHEN status = 'inactive' THEN 'inactive'
ELSE 'pending'
END as user_category,
-- Basic scoring for relevance
CASE country
WHEN 'US' THEN 3
WHEN 'CA' THEN 2
WHEN 'UK' THEN 2
ELSE 1
END as priority_score
FROM users
WHERE
-- Multiple WHERE conditions that may require different indexes
status IN ('active', 'inactive')
AND country IN ('US', 'CA', 'UK', 'AU', 'DE')
AND created_at >= CURRENT_DATE - INTERVAL '2 years'
AND (
email ILIKE '%@company.com' OR
first_name ILIKE 'John%' OR
last_name ILIKE 'Smith%'
)
),
user_enrichment AS (
SELECT
us.*,
-- Subquery requiring additional index
(SELECT COUNT(*)
FROM orders o
WHERE o.user_id = us.user_id
AND o.created_at >= CURRENT_DATE - INTERVAL '1 year'
) as orders_last_year,
-- Another subquery with different access pattern
(SELECT SUM(total_amount)
FROM orders o
WHERE o.user_id = us.user_id
AND o.status = 'completed'
) as total_spent,
-- JSON field access (limited optimization)
preferences->>'theme' as preferred_theme,
preferences->>'language' as preferred_language,
-- Array field contains check (poor performance without GIN)
CASE
WHEN tags && ARRAY['premium', 'vip'] THEN true
ELSE false
END as is_premium_user
FROM user_search us
),
final_results AS (
SELECT
ue.user_id,
ue.email,
ue.first_name,
ue.last_name,
ue.status,
ue.country,
ue.user_category,
ue.priority_score,
ue.orders_last_year,
ue.total_spent,
ue.preferred_theme,
ue.preferred_language,
ue.is_premium_user,
-- Complex ranking calculation
(ue.priority_score * 0.3 +
CASE
WHEN ue.orders_last_year > 10 THEN 5
WHEN ue.orders_last_year > 5 THEN 3
WHEN ue.orders_last_year > 0 THEN 1
ELSE 0
END * 0.4 +
CASE
WHEN ue.total_spent > 1000 THEN 5
WHEN ue.total_spent > 500 THEN 3
WHEN ue.total_spent > 100 THEN 1
ELSE 0
END * 0.3
) as relevance_score,
-- Row number for pagination
ROW_NUMBER() OVER (
ORDER BY
ue.priority_score DESC,
ue.orders_last_year DESC,
ue.total_spent DESC,
ue.created_at DESC
) as row_num,
COUNT(*) OVER () as total_results
FROM user_enrichment ue
WHERE ue.orders_last_year > 0 OR ue.total_spent > 50
)
SELECT
user_id,
email,
first_name || ' ' || last_name as full_name,
status,
country,
user_category,
orders_last_year,
ROUND(total_spent::numeric, 2) as total_spent,
is_premium_user,
ROUND(relevance_score::numeric, 2) as relevance_score,
row_num,
total_results
FROM final_results
WHERE row_num BETWEEN 1 AND 50
ORDER BY relevance_score DESC, row_num ASC;
-- PostgreSQL indexing pain points in this scenario:
-- 1. Fixed column order in compound indexes limits query flexibility
-- 2. JSON field indexing requires expression or GIN indexes planned in advance
-- 3. Array containment queries need dedicated GIN indexes to perform well
-- 4. Partial index predicates are restricted to immutable expressions
-- 5. Multi-field text search requires tsvector expressions and separate GIN indexes
-- 6. Index maintenance overhead increases significantly with table size
-- 7. Each new query shape tends to require another purpose-built index
-- 8. Index definitions are decoupled from application-level document structures
-- 9. Index selection and tuning require deep database expertise
-- 10. Specialized data patterns (geo, time-series) depend on extensions and extra index types
-- Additional index requirements for above query
CREATE INDEX idx_users_compound_search ON users(status, country, created_at)
WHERE status IN ('active', 'inactive');
CREATE INDEX idx_users_email_pattern ON users(email)
WHERE email LIKE '%@company.com';
CREATE INDEX idx_users_name_pattern ON users(first_name, last_name)
WHERE first_name LIKE 'John%' OR last_name LIKE 'Smith%';
-- Partial index predicates must be immutable, so a rolling "last year" window
-- cannot use CURRENT_DATE; the cutoff has to be hard-coded and periodically rebuilt
CREATE INDEX idx_orders_user_recent ON orders(user_id, created_at)
WHERE created_at >= DATE '2024-01-01';
CREATE INDEX idx_orders_user_completed ON orders(user_id, total_amount)
WHERE status = 'completed';
-- JSON field indexing (limited capabilities)
CREATE INDEX idx_users_preferences_gin ON users USING GIN(preferences);
-- Array field indexing
CREATE INDEX idx_users_tags_gin ON users USING GIN(tags);
-- MySQL approach (even more limited)
-- Basic indexes only
CREATE INDEX idx_mysql_users_email ON mysql_users(email);
CREATE INDEX idx_mysql_users_status_country ON mysql_users(status, country);
CREATE INDEX idx_mysql_users_created ON mysql_users(created_at);
-- Limited JSON support in older versions
-- ALTER TABLE mysql_users ADD INDEX idx_preferences ((preferences->>'$.theme'));
-- Basic query with limited optimization
SELECT
user_id,
email,
first_name,
last_name,
status,
country,
created_at
FROM mysql_users
WHERE status = 'active'
AND country IN ('US', 'CA')
AND created_at >= DATE_SUB(CURDATE(), INTERVAL 1 YEAR)
ORDER BY created_at DESC
LIMIT 50;
-- MySQL limitations:
-- - Very limited JSON indexing capabilities
-- - No partial indexes or conditional indexing
-- - Basic compound index support with rigid column ordering
-- - Poor performance with complex queries and joins
-- - Limited text search capabilities without additional engines
-- - Minimal support for array operations and specialized data types
-- - Simple index optimization with limited query planner sophistication
MongoDB's advanced indexing system provides comprehensive optimization capabilities:
// MongoDB Advanced Indexing - flexible, powerful, and application-optimized
const { MongoClient } = require('mongodb');
const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('user_analytics_platform');
// Advanced MongoDB indexing strategy manager
class MongoDBIndexingManager {
constructor(db) {
this.db = db;
this.collections = {
users: db.collection('users'),
orders: db.collection('orders'),
products: db.collection('products'),
analytics: db.collection('analytics'),
indexMetrics: db.collection('index_metrics')
};
this.indexingStrategies = new Map();
this.performanceTargets = {
maxQueryTime: 100, // milliseconds
maxIndexSize: 1024, // MB
minSelectivity: 0.01 // 1% selectivity threshold
};
}
async createComprehensiveIndexingStrategy() {
console.log('Creating comprehensive MongoDB indexing strategy...');
// 1. Single field indexes for basic queries
await this.createSingleFieldIndexes();
// 2. Compound indexes for complex multi-field queries
await this.createCompoundIndexes();
// 3. Partial indexes for filtered queries
await this.createPartialIndexes();
// 4. Text indexes for search functionality
await this.createTextSearchIndexes();
// 5. Geospatial indexes for location-based queries
await this.createGeospatialIndexes();
// 6. Sparse indexes for optional fields
await this.createSparseIndexes();
// 7. TTL indexes for data expiration
await this.createTTLIndexes();
// 8. Wildcard indexes for flexible schemas
await this.createWildcardIndexes();
console.log('Comprehensive indexing strategy implemented successfully');
}
async createSingleFieldIndexes() {
console.log('Creating optimized single field indexes...');
const userIndexes = [
// High-cardinality unique fields
{ email: 1 }, // Unique identifier, high selectivity
{ username: 1 }, // Unique identifier, high selectivity
// High-frequency filter fields
{ status: 1 }, // Limited values but frequently queried
{ country: 1 }, // Geographic filtering
{ accountType: 1 }, // User segmentation
// Temporal fields for range queries
{ createdAt: 1 }, // Registration date queries
{ lastLoginAt: 1 }, // Activity-based filtering
{ subscriptionExpiresAt: 1 }, // Subscription management
// Numerical fields for range and sort operations
{ totalSpent: -1 }, // Customer value analysis (descending)
{ loyaltyPoints: -1 }, // Rewards program queries
{ riskScore: 1 } // Security and fraud detection
];
for (const indexSpec of userIndexes) {
const fieldName = Object.keys(indexSpec)[0];
const indexName = `idx_users_${fieldName}`;
try {
const indexOptions = { name: indexName, background: true };
// Attach a partial filter only when one is defined for this field; passing
// a null partialFilterExpression would be rejected by the server
const partialFilter = this.getPartialFilterForField(fieldName);
if (partialFilter) {
indexOptions.partialFilterExpression = partialFilter;
}
await this.collections.users.createIndex(indexSpec, indexOptions);
console.log(`Created single field index: ${indexName}`);
await this.recordIndexMetrics(indexName, 'single_field', indexSpec);
} catch (error) {
console.error(`Failed to create index ${indexName}:`, error);
}
}
// Order indexes for e-commerce scenarios
const orderIndexes = [
{ userId: 1 }, // Customer order lookup
{ status: 1 }, // Order status filtering
{ createdAt: -1 }, // Recent orders first
{ totalAmount: -1 }, // High-value orders
{ paymentStatus: 1 }, // Payment tracking
{ shippingMethod: 1 } // Fulfillment queries
];
for (const indexSpec of orderIndexes) {
const fieldName = Object.keys(indexSpec)[0];
const indexName = `idx_orders_${fieldName}`;
await this.collections.orders.createIndex(indexSpec, {
name: indexName,
background: true
});
console.log(`Created order index: ${indexName}`);
}
}
async createCompoundIndexes() {
console.log('Creating optimized compound indexes...');
// User compound indexes following ESR (Equality, Sort, Range) rule
const userCompoundIndexes = [
{
name: 'idx_users_country_status_created',
spec: { country: 1, status: 1, createdAt: -1 },
purpose: 'Geographic user filtering with status and recency',
queryPatterns: ['country + status filters', 'country + status + date range']
},
{
name: 'idx_users_status_activity_spent',
spec: { status: 1, lastLoginAt: -1, totalSpent: -1 },
purpose: 'Active user analysis with spending patterns',
queryPatterns: ['status + activity analysis', 'customer value segmentation']
},
{
name: 'idx_users_type_tier_points',
spec: { accountType: 1, loyaltyTier: 1, loyaltyPoints: -1 },
purpose: 'Customer segmentation and loyalty program queries',
queryPatterns: ['loyalty program analysis', 'customer tier management']
},
{
name: 'idx_users_email_verification_created',
spec: { 'verification.email': 1, 'verification.phone': 1, createdAt: -1 },
purpose: 'User verification status with registration timeline',
queryPatterns: ['verification status queries', 'onboarding analytics']
},
{
name: 'idx_users_preferences_activity',
spec: { 'preferences.marketing': 1, 'preferences.notifications': 1, lastLoginAt: -1 },
purpose: 'Marketing segmentation with activity correlation',
queryPatterns: ['marketing campaign targeting', 'notification preferences']
}
];
for (const indexConfig of userCompoundIndexes) {
try {
await this.collections.users.createIndex(indexConfig.spec, {
name: indexConfig.name,
background: true
});
console.log(`Created compound index: ${indexConfig.name}`);
console.log(` Purpose: ${indexConfig.purpose}`);
console.log(` Query patterns: ${indexConfig.queryPatterns.join(', ')}`);
await this.recordIndexMetrics(indexConfig.name, 'compound', indexConfig.spec, {
purpose: indexConfig.purpose,
queryPatterns: indexConfig.queryPatterns
});
} catch (error) {
console.error(`Failed to create compound index ${indexConfig.name}:`, error);
}
}
// Order compound indexes for e-commerce analytics
const orderCompoundIndexes = [
{
name: 'idx_orders_user_status_date',
spec: { userId: 1, status: 1, createdAt: -1 },
purpose: 'Customer order history with status filtering'
},
{
name: 'idx_orders_status_payment_amount',
spec: { status: 1, paymentStatus: 1, totalAmount: -1 },
purpose: 'Revenue analysis and payment processing queries'
},
{
name: 'idx_orders_product_date_amount',
spec: { 'items.productId': 1, createdAt: -1, totalAmount: -1 },
purpose: 'Product performance analysis with sales trends'
},
{
name: 'idx_orders_shipping_region_date',
spec: { 'shippingAddress.country': 1, 'shippingAddress.state': 1, createdAt: -1 },
purpose: 'Geographic sales analysis and shipping optimization'
}
];
for (const indexConfig of orderCompoundIndexes) {
await this.collections.orders.createIndex(indexConfig.spec, {
name: indexConfig.name,
background: true
});
console.log(`Created order compound index: ${indexConfig.name}`);
}
}
async createPartialIndexes() {
console.log('Creating partial indexes for filtered queries...');
const partialIndexes = [
{
name: 'idx_users_active_email',
collection: 'users',
spec: { email: 1 },
filter: { status: 'active' },
purpose: 'Active user email lookups (reduces index size by ~70%)'
},
{
name: 'idx_users_premium_spending',
collection: 'users',
spec: { totalSpent: -1, loyaltyPoints: -1 },
filter: { accountType: 'premium' },
purpose: 'Premium customer analysis and loyalty tracking'
},
{
name: 'idx_users_recent_active',
collection: 'users',
spec: { lastLoginAt: -1, country: 1 },
filter: {
status: 'active',
lastLoginAt: { $gte: new Date(Date.now() - 30 * 24 * 60 * 60 * 1000) }
},
purpose: 'Recently active users for engagement campaigns'
},
{
name: 'idx_orders_high_value_completed',
collection: 'orders',
spec: { totalAmount: -1, createdAt: -1 },
filter: {
status: 'completed',
totalAmount: { $gte: 500 }
},
purpose: 'High-value completed orders for VIP customer analysis'
},
{
name: 'idx_orders_pending_payment',
collection: 'orders',
spec: { createdAt: 1, userId: 1 },
filter: {
status: { $in: ['pending', 'processing'] }, // $in in partial filter expressions requires MongoDB 6.0+
paymentStatus: 'pending'
},
purpose: 'Orders requiring payment processing attention'
},
{
name: 'idx_users_verification_required',
collection: 'users',
spec: { createdAt: 1, riskScore: -1 },
filter: {
$or: [ // top-level $or in partial filter expressions requires a recent MongoDB release
{ 'verification.email': false },
{ 'verification.phone': false },
{ 'verification.identity': false }
]
},
purpose: 'Users requiring additional verification steps'
}
];
for (const partialIndex of partialIndexes) {
try {
const collection = this.collections[partialIndex.collection];
await collection.createIndex(partialIndex.spec, {
name: partialIndex.name,
partialFilterExpression: partialIndex.filter,
background: true
});
console.log(`Created partial index: ${partialIndex.name}`);
console.log(` Filter: ${JSON.stringify(partialIndex.filter)}`);
console.log(` Purpose: ${partialIndex.purpose}`);
// Estimate index size reduction (estimateIndexSize samples the users
// collection, so treat this as a rough approximation)
const fullIndexStats = await this.estimateIndexSize(partialIndex.spec);
const partialIndexStats = await collection.aggregate([
{ $match: partialIndex.filter },
{ $count: "documentCount" }
]).toArray();
const reductionPercent = ((1 - (partialIndexStats[0]?.documentCount || 0) / Math.max(fullIndexStats.documentCount, 1)) * 100).toFixed(1);
console.log(` Index size reduction: ~${reductionPercent}%`);
} catch (error) {
console.error(`Failed to create partial index ${partialIndex.name}:`, error);
}
}
}
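  // Illustrative sketch (an addition, not part of the strategy above): a query
  // can only use a partial index when its predicate guarantees the
  // partialFilterExpression. The hypothetical helper below contrasts an
  // eligible lookup (constrains status to 'active', matching
  // idx_users_active_email) with one that omits the status predicate and is
  // therefore not eligible for that partial index.
  async demonstratePartialIndexEligibility(email) {
    const eligible = await this.collections.users
      .find({ status: 'active', email })
      .explain('executionStats');
    const notEligible = await this.collections.users
      .find({ email })
      .explain('executionStats');
    const indexUsed = plan => plan.inputStage?.indexName || plan.indexName || plan.stage;
    console.log('With status predicate:', indexUsed(eligible.queryPlanner.winningPlan));
    console.log('Without status predicate:', indexUsed(notEligible.queryPlanner.winningPlan));
  }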
async createTextSearchIndexes() {
console.log('Creating text search indexes for full-text search...');
const textIndexes = [
{
name: 'idx_users_fulltext_search',
collection: 'users',
spec: {
firstName: 'text',
lastName: 'text',
email: 'text',
'profile.bio': 'text',
'profile.company': 'text'
},
weights: {
firstName: 10,
lastName: 10,
email: 5,
'profile.bio': 1,
'profile.company': 3
},
purpose: 'Comprehensive user search across name, email, and profile data'
},
{
name: 'idx_products_search',
collection: 'products',
spec: {
name: 'text',
description: 'text',
brand: 'text',
'tags': 'text',
'specifications.features': 'text'
},
weights: {
name: 20,
brand: 15,
tags: 10,
description: 5,
'specifications.features': 3
},
purpose: 'Product catalog search with relevance weighting'
},
{
name: 'idx_orders_search',
collection: 'orders',
spec: {
orderNumber: 'text',
'customer.email': 'text',
'items.productName': 'text',
'shippingAddress.street': 'text',
'shippingAddress.city': 'text'
},
weights: {
orderNumber: 20,
'customer.email': 15,
'items.productName': 10,
'shippingAddress.street': 3,
'shippingAddress.city': 5
},
purpose: 'Order search by number, customer, products, or shipping details'
}
];
for (const textIndex of textIndexes) {
try {
const collection = this.collections[textIndex.collection];
await collection.createIndex(textIndex.spec, {
name: textIndex.name,
weights: textIndex.weights,
background: true,
// Configure text search options
default_language: 'english',
language_override: 'language' // Field name for document language
});
console.log(`Created text search index: ${textIndex.name}`);
console.log(` Purpose: ${textIndex.purpose}`);
console.log(` Weighted fields: ${Object.keys(textIndex.weights).join(', ')}`);
} catch (error) {
console.error(`Failed to create text index ${textIndex.name}:`, error);
}
}
}
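  // Illustrative sketch of querying the weighted text index created above:
  // $text uses the collection's (single) text index, and results can be
  // ranked by the computed relevance score via the textScore metadata.
  async searchUsersByText(searchTerm, limit = 20) {
    return this.collections.users
      .find(
        { $text: { $search: searchTerm } },
        { projection: { firstName: 1, lastName: 1, email: 1, score: { $meta: 'textScore' } } }
      )
      .sort({ score: { $meta: 'textScore' } })
      .limit(limit)
      .toArray();
  }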
async createGeospatialIndexes() {
console.log('Creating geospatial indexes for location-based queries...');
const geoIndexes = [
{
name: 'idx_users_location_2dsphere',
collection: 'users',
spec: { 'location.coordinates': '2dsphere' },
purpose: 'User location queries for proximity and regional analysis'
},
{
name: 'idx_orders_shipping_location',
collection: 'orders',
spec: { 'shippingAddress.coordinates': '2dsphere' },
purpose: 'Shipping destination analysis and route optimization'
},
{
name: 'idx_stores_location_2dsphere',
collection: 'stores',
spec: { 'address.coordinates': '2dsphere' },
purpose: 'Store locator and catchment area analysis'
}
];
for (const geoIndex of geoIndexes) {
try {
const collection = this.collections[geoIndex.collection] || this.db.collection(geoIndex.collection);
await collection.createIndex(geoIndex.spec, {
name: geoIndex.name,
background: true,
// 2dsphere specific options
'2dsphereIndexVersion': 3 // Use latest version
});
console.log(`Created geospatial index: ${geoIndex.name}`);
console.log(` Purpose: ${geoIndex.purpose}`);
} catch (error) {
console.error(`Failed to create geo index ${geoIndex.name}:`, error);
}
}
}
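  // Illustrative sketch of a proximity query served by the 2dsphere index
  // above; it assumes 'location.coordinates' holds GeoJSON Points stored as
  // [longitude, latitude].
  async findUsersNearPoint(longitude, latitude, maxDistanceMeters = 5000) {
    return this.collections.users
      .find({
        'location.coordinates': {
          $near: {
            $geometry: { type: 'Point', coordinates: [longitude, latitude] },
            $maxDistance: maxDistanceMeters
          }
        }
      })
      .limit(50)
      .toArray();
  }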
async createSparseIndexes() {
console.log('Creating sparse indexes for optional fields...');
const sparseIndexes = [
{
name: 'idx_users_social_profiles_sparse',
collection: 'users',
spec: { 'socialProfiles.twitter': 1, 'socialProfiles.linkedin': 1 },
purpose: 'Social media profile lookups (only for users with social profiles)'
},
{
name: 'idx_users_subscription_sparse',
collection: 'users',
spec: { 'subscription.planId': 1, 'subscription.renewsAt': 1 },
purpose: 'Subscription management (only for subscribed users)'
},
{
name: 'idx_users_referral_sparse',
collection: 'users',
spec: { 'referral.code': 1, 'referral.referredBy': 1 },
purpose: 'Referral program tracking (only for users in referral program)'
},
{
name: 'idx_orders_tracking_sparse',
collection: 'orders',
spec: { 'shipping.trackingNumber': 1, 'shipping.carrier': 1 },
purpose: 'Package tracking (only for shipped orders)'
}
];
for (const sparseIndex of sparseIndexes) {
try {
const collection = this.collections[sparseIndex.collection];
await collection.createIndex(sparseIndex.spec, {
name: sparseIndex.name,
sparse: true, // Skip documents where indexed fields are missing
background: true
});
console.log(`Created sparse index: ${sparseIndex.name}`);
console.log(` Purpose: ${sparseIndex.purpose}`);
} catch (error) {
console.error(`Failed to create sparse index ${sparseIndex.name}:`, error);
}
}
}
async createTTLIndexes() {
console.log('Creating TTL indexes for automatic data expiration...');
const ttlIndexes = [
{
name: 'idx_analytics_events_ttl',
collection: 'analytics',
spec: { createdAt: 1 },
expireAfterSeconds: 30 * 24 * 60 * 60, // 30 days
purpose: 'Automatic cleanup of analytics events after 30 days'
},
{
name: 'idx_user_sessions_ttl',
collection: 'userSessions',
spec: { lastActivity: 1 },
expireAfterSeconds: 7 * 24 * 60 * 60, // 7 days
purpose: 'Session cleanup after 7 days of inactivity'
},
{
name: 'idx_password_resets_ttl',
collection: 'passwordResets',
spec: { createdAt: 1 },
expireAfterSeconds: 24 * 60 * 60, // 24 hours
purpose: 'Password reset token expiration after 24 hours'
},
{
name: 'idx_email_verification_ttl',
collection: 'emailVerifications',
spec: { createdAt: 1 },
expireAfterSeconds: 7 * 24 * 60 * 60, // 7 days
purpose: 'Email verification token cleanup after 7 days'
}
];
for (const ttlIndex of ttlIndexes) {
try {
const collection = this.db.collection(ttlIndex.collection);
await collection.createIndex(ttlIndex.spec, {
name: ttlIndex.name,
expireAfterSeconds: ttlIndex.expireAfterSeconds,
background: true
});
const expireDays = Math.round(ttlIndex.expireAfterSeconds / (24 * 60 * 60));
console.log(`Created TTL index: ${ttlIndex.name} (expires after ${expireDays} days)`);
console.log(` Purpose: ${ttlIndex.purpose}`);
} catch (error) {
console.error(`Failed to create TTL index ${ttlIndex.name}:`, error);
}
}
}
async createWildcardIndexes() {
console.log('Creating wildcard indexes for flexible schema queries...');
const wildcardIndexes = [
{
name: 'idx_users_metadata_wildcard',
collection: 'users',
spec: { 'metadata.$**': 1 },
purpose: 'Flexible querying of user metadata fields with varying schemas'
},
{
name: 'idx_products_attributes_wildcard',
collection: 'products',
spec: { 'attributes.$**': 1 },
purpose: 'Dynamic product attribute queries for catalog flexibility'
},
{
name: 'idx_orders_customFields_wildcard',
collection: 'orders',
spec: { 'customFields.$**': 1 },
purpose: 'Custom order fields for different business verticals'
}
];
for (const wildcardIndex of wildcardIndexes) {
try {
const collection = this.collections[wildcardIndex.collection] || this.db.collection(wildcardIndex.collection);
await collection.createIndex(wildcardIndex.spec, {
name: wildcardIndex.name,
background: true
// Note: wildcardProjection is only valid on a full { "$**": 1 } wildcard
// index, so it is omitted for these path-scoped wildcard indexes
});
console.log(`Created wildcard index: ${wildcardIndex.name}`);
console.log(` Purpose: ${wildcardIndex.purpose}`);
} catch (error) {
console.error(`Failed to create wildcard index ${wildcardIndex.name}:`, error);
}
}
}
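  // Illustrative sketch: the wildcard index above lets arbitrary metadata
  // paths be queried without declaring each field ahead of time. The field
  // path and value are hypothetical examples, e.g.
  // findUsersByMetadataField('metadata.campaignSource', 'newsletter').
  async findUsersByMetadataField(fieldPath, value) {
    return this.collections.users.find({ [fieldPath]: value }).limit(100).toArray();
  }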
async performQueryOptimizationAnalysis() {
console.log('Performing comprehensive query optimization analysis...');
const analysisResults = {
slowQueries: [],
indexUsage: [],
recommendedIndexes: [],
performanceMetrics: {}
};
// 1. Analyze slow queries from profiler data
// (requires database profiling to be enabled, e.g. db.setProfilingLevel(1))
const slowQueries = await this.db.collection('system.profile').find({
ts: { $gte: new Date(Date.now() - 24 * 60 * 60 * 1000) }, // Last 24 hours
millis: { $gte: 100 } // Queries taking > 100ms
}).sort({ millis: -1 }).limit(50).toArray();
analysisResults.slowQueries = slowQueries.map(query => ({
namespace: query.ns,
duration: query.millis,
command: query.command,
executionStats: query.execStats,
timestamp: query.ts,
recommendation: this.generateOptimizationRecommendation(query)
}));
// 2. Analyze index usage statistics
for (const collectionName of Object.keys(this.collections)) {
const collection = this.collections[collectionName];
try {
const indexStats = await collection.aggregate([
{ $indexStats: {} }
]).toArray();
const indexUsage = indexStats.map(stat => ({
collection: collectionName,
indexName: stat.name,
usageCount: stat.accesses.ops,
lastUsed: stat.accesses.since,
// $indexStats does not report size; merge in collStats.indexSizes if size-based analysis is needed
size: stat.size,
efficiency: this.calculateIndexEfficiency(stat)
}));
analysisResults.indexUsage.push(...indexUsage);
} catch (error) {
console.warn(`Could not get index stats for ${collectionName}:`, error.message);
}
}
// 3. Generate index recommendations
analysisResults.recommendedIndexes = await this.generateIndexRecommendations(analysisResults.slowQueries);
// 4. Calculate performance metrics
analysisResults.performanceMetrics = await this.calculatePerformanceMetrics();
console.log('Query optimization analysis completed');
// Store analysis results for historical tracking
await this.collections.indexMetrics.insertOne({
analysisType: 'query_optimization',
timestamp: new Date(),
results: analysisResults
});
return analysisResults;
}
generateOptimizationRecommendation(slowQuery) {
const recommendations = [];
// slowQuery is a raw system.profile document, so use its top-level summary
// fields (planSummary, keysExamined, nreturned, hasSortStage)
// Check for missing indexes based on query pattern
if (slowQuery.planSummary === 'COLLSCAN') {
recommendations.push('Query requires collection scan - consider adding index');
}
if (slowQuery.planSummary?.startsWith('IXSCAN') &&
(slowQuery.keysExamined || 0) > (slowQuery.nreturned || 0) * 10) {
recommendations.push('Index selectivity is poor - consider compound index or partial index');
}
// Check for sort optimization (an in-memory sort stage means the sort was not satisfied by an index)
if (slowQuery.command?.sort && slowQuery.hasSortStage) {
recommendations.push('Sort operation not using index - add sort fields to index');
}
// Check for projection optimization
if (slowQuery.command?.projection && Object.keys(slowQuery.command.projection).length < 5) {
recommendations.push('Consider covered query with projection fields in index');
}
return recommendations.length > 0 ? recommendations : ['Query performance acceptable'];
}
calculateIndexEfficiency(indexStat) {
// Calculate index efficiency based on usage patterns.
// $indexStats does not expose a creation time; accesses.since marks when
// usage counting began (typically index creation or the last server restart)
const size = indexStat.size || 0;
const usage = indexStat.accesses?.ops || 0;
const since = indexStat.accesses?.since ? new Date(indexStat.accesses.since).getTime() : Date.now();
const daysSinceCounting = (Date.now() - since) / (24 * 60 * 60 * 1000);
// Efficiency metric: usage per day per MB
const efficiency = usage / Math.max(daysSinceCounting, 1) / Math.max(size / (1024 * 1024), 1);
return Math.round(efficiency * 100) / 100;
}
async generateIndexRecommendations(slowQueries) {
const recommendations = [];
const queryPatterns = new Map();
// Analyze query patterns to suggest indexes
for (const query of slowQueries) {
const command = query.command;
if (!command?.find && !command?.aggregate) continue;
const collection = query.namespace.split('.')[1];
const filter = command.find ? command.filter :
command.pipeline?.[0]?.$match; // aggregate profiler entries carry their stages in command.pipeline
if (filter) {
const pattern = this.extractQueryPattern(filter);
const key = `${collection}:${pattern}`;
if (!queryPatterns.has(key)) {
queryPatterns.set(key, {
collection,
pattern,
frequency: 0,
avgDuration: 0,
queries: []
});
}
const patternData = queryPatterns.get(key);
patternData.frequency++;
patternData.avgDuration = (patternData.avgDuration * (patternData.frequency - 1) + query.duration) / patternData.frequency;
patternData.queries.push(query);
}
}
// Generate recommendations based on frequent slow patterns
for (const [key, patternData] of queryPatterns) {
if (patternData.frequency >= 3 && patternData.avgDuration >= 100) {
const recommendedIndex = this.generateIndexSpecFromPattern(patternData.pattern);
recommendations.push({
collection: patternData.collection,
recommendedIndex,
reason: `Frequent slow queries (${patternData.frequency} occurrences, avg ${patternData.avgDuration}ms)`,
queryPattern: patternData.pattern,
estimatedImprovement: this.estimatePerformanceImprovement(patternData)
});
}
}
return recommendations;
}
extractQueryPattern(filter) {
// Extract query pattern for index recommendation
const pattern = {};
for (const [field, condition] of Object.entries(filter)) {
if (field === '$and' || field === '$or') {
// Handle logical operators
pattern[field] = 'logical_operator';
} else if (typeof condition === 'object' && condition !== null) {
// Handle range/comparison queries
const operators = Object.keys(condition);
if (operators.some(op => ['$gt', '$gte', '$lt', '$lte'].includes(op))) {
pattern[field] = 'range';
} else if (operators.includes('$in')) {
pattern[field] = 'in_list';
} else if (operators.includes('$regex')) {
pattern[field] = 'regex';
} else {
pattern[field] = 'equality';
}
} else {
pattern[field] = 'equality';
}
}
return JSON.stringify(pattern);
}
generateIndexSpecFromPattern(patternStr) {
const pattern = JSON.parse(patternStr);
const indexSpec = {};
// Apply ESR (Equality, Sort, Range) rule
const equalityFields = [];
const rangeFields = [];
for (const [field, type] of Object.entries(pattern)) {
if (type === 'equality' || type === 'in_list') {
equalityFields.push(field);
} else if (type === 'range') {
rangeFields.push(field);
}
}
// Build index spec: equality fields first, then range fields
for (const field of equalityFields) {
indexSpec[field] = 1;
}
for (const field of rangeFields) {
indexSpec[field] = 1;
}
return indexSpec;
}
estimatePerformanceImprovement(patternData) {
// Estimate performance improvement based on query characteristics
const baseImprovement = 50; // Base 50% improvement assumption
// Higher improvement for collection scans
if (patternData.queries.some(q => q.executionStats?.stage === 'COLLSCAN')) {
return Math.min(90, baseImprovement + 30);
}
// Moderate improvement for index scans with poor selectivity
if (patternData.avgDuration > 500) {
return Math.min(80, baseImprovement + 20);
}
return baseImprovement;
}
async calculatePerformanceMetrics() {
const metrics = {};
try {
// Get database stats
const dbStats = await this.db.stats();
metrics.totalIndexSize = dbStats.indexSize;
metrics.totalDataSize = dbStats.dataSize;
metrics.indexToDataRatio = (dbStats.indexSize / dbStats.dataSize * 100).toFixed(1) + '%';
// Get collection-level metrics
for (const collectionName of Object.keys(this.collections)) {
const collection = this.collections[collectionName];
// collection.stats() is deprecated in recent Node drivers; the $collStats
// aggregation stage is the forward-compatible alternative
const stats = await collection.stats();
metrics[collectionName] = {
documentCount: stats.count,
avgDocumentSize: stats.avgObjSize,
indexCount: stats.nindexes,
totalIndexSize: stats.totalIndexSize,
indexSizeRatio: (stats.totalIndexSize / stats.size * 100).toFixed(1) + '%'
};
}
} catch (error) {
console.warn('Could not calculate all performance metrics:', error.message);
}
return metrics;
}
async recordIndexMetrics(indexName, indexType, indexSpec, metadata = {}) {
try {
await this.collections.indexMetrics.insertOne({
indexName,
indexType,
indexSpec,
metadata,
createdAt: new Date(),
status: 'active'
});
} catch (error) {
console.warn('Failed to record index metrics:', error.message);
}
}
getPartialFilterForField(fieldName) {
// Return appropriate partial filter expressions for common fields
const partialFilters = {
email: { email: { $exists: true } }, // $ne is not supported in partialFilterExpression
lastLoginAt: { lastLoginAt: { $exists: true } },
totalSpent: { totalSpent: { $gt: 0 } },
riskScore: { riskScore: { $exists: true } }
};
return partialFilters[fieldName] || null;
}
async estimateIndexSize(indexSpec) {
// Estimate index size based on collection statistics
try {
const collection = this.collections.users; // Default to users collection
const sampleDoc = await collection.findOne();
const stats = await collection.stats();
if (sampleDoc && stats) {
const avgDocSize = stats.avgObjSize;
const fieldSize = this.estimateFieldSize(sampleDoc, Object.keys(indexSpec));
const indexOverhead = fieldSize * 1.2; // 20% overhead for B-tree structure
return {
documentCount: stats.count,
estimatedIndexSize: indexOverhead * stats.count,
avgFieldSize: fieldSize
};
}
} catch (error) {
console.warn('Could not estimate index size:', error.message);
}
return { documentCount: 0, estimatedIndexSize: 0, avgFieldSize: 0 };
}
estimateFieldSize(document, fieldNames) {
let totalSize = 0;
for (const fieldName of fieldNames) {
const value = this.getNestedValue(document, fieldName);
totalSize += this.calculateValueSize(value);
}
return totalSize;
}
getNestedValue(obj, path) {
return path.split('.').reduce((current, key) => current?.[key], obj);
}
calculateValueSize(value) {
if (value === null || value === undefined) return 0;
if (typeof value === 'string') return value.length * 2; // UTF-8 overhead
if (typeof value === 'number') return 8; // 64-bit numbers
if (typeof value === 'boolean') return 1;
if (value instanceof Date) return 8;
if (Array.isArray(value)) return value.reduce((sum, item) => sum + this.calculateValueSize(item), 0);
if (typeof value === 'object') return Object.values(value).reduce((sum, val) => sum + this.calculateValueSize(val), 0);
return 50; // Default estimate for unknown types
}
async optimizeExistingIndexes() {
console.log('Optimizing existing indexes...');
const optimizationResults = {
rebuiltIndexes: [],
droppedIndexes: [],
recommendations: []
};
for (const collectionName of Object.keys(this.collections)) {
const collection = this.collections[collectionName];
try {
// Get current indexes
const indexes = await collection.indexes();
const indexStats = await collection.aggregate([{ $indexStats: {} }]).toArray();
for (const index of indexes) {
if (index.name === '_id_') continue; // Skip default _id index
const stat = indexStats.find(s => s.name === index.name);
const usage = stat?.accesses?.ops || 0;
const daysSinceCreated = stat ? (Date.now() - stat.accesses.since) / (24 * 60 * 60 * 1000) : 0;
// Check for unused indexes (no usage in 30 days)
if (daysSinceCreated > 30 && usage === 0) {
console.log(`Dropping unused index: ${index.name} in ${collectionName}`);
await collection.dropIndex(index.name);
optimizationResults.droppedIndexes.push({
collection: collectionName,
indexName: index.name,
reason: 'Unused for 30+ days'
});
}
// Check for low-efficiency indexes
const efficiency = stat ? this.calculateIndexEfficiency(stat) : 0;
if (efficiency < 0.1 && usage > 0) {
optimizationResults.recommendations.push({
collection: collectionName,
indexName: index.name,
recommendation: 'Low efficiency - consider redesigning or adding partial filter',
currentEfficiency: efficiency
});
}
}
} catch (error) {
console.error(`Error optimizing indexes for ${collectionName}:`, error);
}
}
console.log('Index optimization completed');
return optimizationResults;
}
}
// Benefits of MongoDB Advanced Indexing:
// - Flexible compound indexes with optimal field ordering for complex queries
// - Partial indexes that dramatically reduce index size and improve performance
// - Text search indexes with weighted relevance and language support
// - Geospatial indexes for location-based queries and proximity searches
// - Sparse indexes for optional fields that save storage and improve efficiency
// - TTL indexes for automatic data lifecycle management
// - Wildcard indexes for dynamic schema flexibility
// - Real-time index usage analysis and optimization recommendations
// - Integration with query profiler for performance bottleneck identification
// - Sophisticated index strategy management with automated optimization
module.exports = {
MongoDBIndexingManager
};
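A minimal usage sketch of the manager above might look like the following. The connection string, database name, and module path are assumptions for illustration, and the query analysis step additionally assumes that database profiling has been enabled.
const { MongoClient } = require('mongodb');
const { MongoDBIndexingManager } = require('./mongodb-indexing-manager'); // hypothetical module path

async function main() {
  const client = new MongoClient('mongodb://localhost:27017');
  try {
    await client.connect();
    const db = client.db('user_analytics_platform');
    const indexManager = new MongoDBIndexingManager(db);

    // Build the full index set, then review how the workload behaves against it
    await indexManager.createComprehensiveIndexingStrategy();
    const analysis = await indexManager.performQueryOptimizationAnalysis();
    console.log(`Slow queries analyzed: ${analysis.slowQueries.length}`);
    console.log(`Index recommendations: ${analysis.recommendedIndexes.length}`);
  } finally {
    await client.close();
  }
}

main().catch(console.error);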
Understanding MongoDB Indexing Architecture
Advanced Index Design Patterns and Strategies
Implement sophisticated indexing patterns for optimal query performance:
// Advanced indexing patterns for specialized use cases
class AdvancedIndexingPatterns {
constructor(db) {
this.db = db;
this.performanceTargets = {
maxQueryTime: 50, // milliseconds for standard queries
maxComplexQueryTime: 200, // milliseconds for complex analytical queries
maxIndexSizeRatio: 0.3 // Index size should not exceed 30% of data size
};
}
async implementCoveredQueryOptimization() {
console.log('Implementing covered query optimization patterns...');
// Covered queries that can be satisfied entirely from index
const coveredQueryIndexes = [
{
name: 'idx_user_dashboard_covered',
collection: 'users',
spec: {
status: 1,
country: 1,
email: 1,
firstName: 1,
lastName: 1,
totalSpent: 1,
loyaltyPoints: 1,
createdAt: 1
},
purpose: 'Cover user dashboard queries without document retrieval',
coveredQueries: [
'User listing with basic info and spending',
'Geographic user distribution',
'Customer segmentation queries'
]
},
{
name: 'idx_order_summary_covered',
collection: 'orders',
spec: {
userId: 1,
status: 1,
totalAmount: 1,
createdAt: 1,
paymentStatus: 1,
'shipping.method': 1
},
purpose: 'Cover order summary queries for customer service',
coveredQueries: [
'Customer order history summaries',
'Revenue reporting by status and date',
'Shipping method analysis'
]
}
];
for (const coveredIndex of coveredQueryIndexes) {
const collection = this.db.collection(coveredIndex.collection);
await collection.createIndex(coveredIndex.spec, {
name: coveredIndex.name,
background: true
});
console.log(`Created covered query index: ${coveredIndex.name}`);
console.log(` Covered queries: ${coveredIndex.coveredQueries.join(', ')}`);
}
}
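  // Illustrative sketch: verifying that a query is actually covered by the
  // idx_user_dashboard_covered index above. A covered query must project only
  // indexed fields and exclude _id; explain() then reports zero documents
  // examined because results come entirely from index keys.
  async verifyCoveredDashboardQuery() {
    const explain = await this.db.collection('users')
      .find(
        { status: 'active', country: 'US' },
        { projection: { _id: 0, email: 1, firstName: 1, lastName: 1, totalSpent: 1 } }
      )
      .explain('executionStats');
    const docsExamined = explain.executionStats.totalDocsExamined;
    console.log(docsExamined === 0
      ? '✅ Query is covered by the index (no documents fetched)'
      : `❌ Query examined ${docsExamined} documents - not fully covered`);
  }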
async implementHashedIndexingStrategy() {
console.log('Implementing hashed indexing for sharded collections...');
// Hashed indexes for even distribution across shards
const hashedIndexes = [
{
name: 'idx_users_id_hashed',
collection: 'users',
spec: { _id: 'hashed' },
purpose: 'Even distribution of users across shards'
},
{
name: 'idx_orders_customer_hashed',
collection: 'orders',
spec: { userId: 'hashed' },
purpose: 'Distribute customer orders evenly across shards'
},
{
name: 'idx_analytics_session_hashed',
collection: 'analytics',
spec: { sessionId: 'hashed' },
purpose: 'Balance analytics data across sharded cluster'
}
];
for (const hashedIndex of hashedIndexes) {
const collection = this.db.collection(hashedIndex.collection);
await collection.createIndex(hashedIndex.spec, {
name: hashedIndex.name,
background: true
});
console.log(`Created hashed index: ${hashedIndex.name}`);
}
}
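  // Illustrative sketch: hashed indexes pair naturally with hashed shard keys.
  // This assumes the client is connected through a mongos router and that
  // sharding has already been enabled for the user_analytics_platform database.
  async shardOrdersByCustomer() {
    await this.db.admin().command({
      shardCollection: 'user_analytics_platform.orders',
      key: { userId: 'hashed' }
    });
    console.log('Sharded orders collection on hashed userId');
  }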
async implementMultikeyIndexOptimization() {
console.log('Implementing multikey index optimization for arrays...');
// Optimized indexes for array fields
const multikeyIndexes = [
{
name: 'idx_users_tags_interests',
collection: 'users',
spec: { tags: 1, 'interests.category': 1 },
purpose: 'User segmentation by tags and interest categories'
},
{
name: 'idx_products_categories_brands',
collection: 'products',
spec: { categories: 1, brand: 1, status: 1 },
purpose: 'Product catalog queries with category and brand filtering'
},
{
name: 'idx_orders_product_items',
collection: 'orders',
spec: { 'items.productId': 1, 'items.category': 1, status: 1 },
purpose: 'Product performance analysis across orders'
}
];
for (const multikeyIndex of multikeyIndexes) {
const collection = this.db.collection(multikeyIndex.collection);
// Check if index involves multiple array fields (compound multikey limitation)
const sampleDoc = await collection.findOne();
const arrayFields = this.identifyArrayFields(sampleDoc, Object.keys(multikeyIndex.spec));
if (arrayFields.length > 1) {
console.warn(`Index ${multikeyIndex.name} may have compound multikey limitations`);
// Create alternative single-array indexes
for (const arrayField of arrayFields) {
const alternativeSpec = { [arrayField]: 1 };
await collection.createIndex(alternativeSpec, {
name: `${multikeyIndex.name}_${arrayField}`,
background: true
});
}
} else {
await collection.createIndex(multikeyIndex.spec, {
name: multikeyIndex.name,
background: true
});
}
console.log(`Created multikey index: ${multikeyIndex.name}`);
}
}
identifyArrayFields(document, fieldNames) {
const arrayFields = [];
for (const fieldName of fieldNames) {
const value = this.getNestedValue(document, fieldName);
if (Array.isArray(value)) {
arrayFields.push(fieldName);
}
}
return arrayFields;
}
getNestedValue(obj, path) {
return path.split('.').reduce((current, key) => current?.[key], obj);
}
async implementIndexIntersectionStrategies() {
console.log('Implementing index intersection strategies...');
// Design indexes that work well together for intersection
const intersectionIndexes = [
{
name: 'idx_users_status_single',
collection: 'users',
spec: { status: 1 },
purpose: 'Status filtering for intersection'
},
{
name: 'idx_users_country_single',
collection: 'users',
spec: { country: 1 },
purpose: 'Geographic filtering for intersection'
},
{
name: 'idx_users_activity_single',
collection: 'users',
spec: { lastLoginAt: -1 },
purpose: 'Activity-based filtering for intersection'
},
{
name: 'idx_users_spending_single',
collection: 'users',
spec: { totalSpent: -1 },
purpose: 'Spending analysis for intersection'
}
];
// Create single-field indexes that can be intersected
for (const index of intersectionIndexes) {
const collection = this.db.collection(index.collection);
await collection.createIndex(index.spec, {
name: index.name,
background: true
});
console.log(`Created intersection index: ${index.name}`);
}
// Test intersection performance
await this.testIndexIntersectionPerformance();
}
async testIndexIntersectionPerformance() {
console.log('Testing index intersection performance...');
const collection = this.db.collection('users');
// Query that should use index intersection
const intersectionQuery = {
status: 'active',
country: 'US',
lastLoginAt: { $gte: new Date(Date.now() - 30 * 24 * 60 * 60 * 1000) },
totalSpent: { $gte: 100 }
};
const explain = await collection.find(intersectionQuery).explain('executionStats');
// Intersection stages (AND_SORTED / AND_HASH) usually sit beneath a FETCH stage,
// so search the whole winning plan rather than only the root stage
const winningPlan = JSON.stringify(explain.queryPlanner.winningPlan);
if (winningPlan.includes('AND_HASH') || winningPlan.includes('AND_SORTED')) {
console.log('✅ Query successfully using index intersection');
console.log(`Execution time: ${explain.executionStats.executionTimeMillis}ms`);
} else {
console.log('❌ Query not using index intersection, consider compound index');
console.log(`Winning plan root stage: ${explain.queryPlanner.winningPlan.stage}`);
}
}
async implementTimeSeriesIndexing() {
console.log('Implementing time-series optimized indexing...');
const timeSeriesIndexes = [
{
name: 'idx_metrics_time_metric',
collection: 'metrics',
spec: { timestamp: 1, metricType: 1, value: 1 },
purpose: 'Time-series metrics queries with metric type filtering'
},
{
name: 'idx_events_time_user',
collection: 'events',
spec: { timestamp: 1, userId: 1, eventType: 1 },
purpose: 'User activity timeline and event analysis'
},
{
name: 'idx_logs_time_level',
collection: 'logs',
spec: { timestamp: 1, level: 1, service: 1 },
purpose: 'Log analysis with severity and service filtering'
}
];
for (const tsIndex of timeSeriesIndexes) {
const collection = this.db.collection(tsIndex.collection);
await collection.createIndex(tsIndex.spec, {
name: tsIndex.name,
background: true
});
console.log(`Created time-series index: ${tsIndex.name}`);
}
// Create time-based partial indexes for recent data
const recentDataIndexes = [
{
name: 'idx_metrics_recent_hot',
collection: 'metrics',
spec: { timestamp: 1, metricType: 1, userId: 1 },
filter: {
timestamp: { $gte: new Date(Date.now() - 7 * 24 * 60 * 60 * 1000) }
},
purpose: 'Hot data access for recent metrics (last 7 days)'
},
{
name: 'idx_events_recent_active',
collection: 'events',
spec: { userId: 1, eventType: 1, timestamp: -1 },
filter: {
timestamp: { $gte: new Date(Date.now() - 24 * 60 * 60 * 1000) }
},
purpose: 'Recent user activity (last 24 hours)'
}
];
for (const recentIndex of recentDataIndexes) {
const collection = this.db.collection(recentIndex.collection);
await collection.createIndex(recentIndex.spec, {
name: recentIndex.name,
partialFilterExpression: recentIndex.filter,
background: true
});
console.log(`Created recent data index: ${recentIndex.name}`);
}
}
async monitorIndexPerformanceMetrics() {
console.log('Monitoring index performance metrics...');
const performanceMetrics = {
collections: {},
globalMetrics: {},
recommendations: []
};
for (const collectionName of ['users', 'orders', 'products', 'analytics']) {
const collection = this.db.collection(collectionName);
try {
// Get collection statistics
const stats = await collection.stats();
const indexStats = await collection.aggregate([{ $indexStats: {} }]).toArray();
performanceMetrics.collections[collectionName] = {
documentCount: stats.count,
avgDocumentSize: stats.avgObjSize,
dataSize: stats.size,
indexCount: stats.nindexes,
totalIndexSize: stats.totalIndexSize,
indexSizeRatio: (stats.totalIndexSize / stats.size).toFixed(3),
indexes: indexStats.map(stat => ({
name: stat.name,
size: stat.size,
usageCount: stat.accesses?.ops || 0,
lastUsed: stat.accesses?.since,
efficiency: this.calculateIndexEfficiency(stat, stats)
}))
};
// Generate recommendations
const collectionRecommendations = this.generateCollectionIndexRecommendations(
collectionName,
performanceMetrics.collections[collectionName]
);
performanceMetrics.recommendations.push(...collectionRecommendations);
} catch (error) {
console.warn(`Could not analyze ${collectionName}:`, error.message);
}
}
// Calculate global metrics
const totalDataSize = Object.values(performanceMetrics.collections)
.reduce((sum, col) => sum + col.dataSize, 0);
const totalIndexSize = Object.values(performanceMetrics.collections)
.reduce((sum, col) => sum + col.totalIndexSize, 0);
performanceMetrics.globalMetrics = {
totalDataSize,
totalIndexSize,
globalIndexRatio: (totalIndexSize / totalDataSize).toFixed(3),
totalIndexCount: Object.values(performanceMetrics.collections)
.reduce((sum, col) => sum + col.indexCount, 0),
avgIndexEfficiency: this.calculateAverageIndexEfficiency(performanceMetrics.collections)
};
console.log('Index performance monitoring completed');
console.log(`Global index ratio: ${performanceMetrics.globalMetrics.globalIndexRatio}`);
console.log(`Total indexes: ${performanceMetrics.globalMetrics.totalIndexCount}`);
console.log(`Recommendations generated: ${performanceMetrics.recommendations.length}`);
return performanceMetrics;
}
calculateIndexEfficiency(indexStat, collectionStats) {
// Default size to 0 because $indexStats does not report index size
const usagePerMB = (indexStat.accesses?.ops || 0) / Math.max((indexStat.size || 0) / (1024 * 1024), 0.1);
const sizeRatio = (indexStat.size || 0) / collectionStats.size;
const daysSinceLastUse = indexStat.accesses?.since ?
(Date.now() - indexStat.accesses.since) / (24 * 60 * 60 * 1000) : 999;
// Efficiency score: usage frequency weighted by size efficiency and recency
const efficiencyScore = (usagePerMB * 0.5) +
((1 - sizeRatio) * 50 * 0.3) +
(Math.max(0, 30 - daysSinceLastUse) * 0.2);
return Math.round(efficiencyScore * 100) / 100;
}
calculateAverageIndexEfficiency(collections) {
let totalEfficiency = 0;
let indexCount = 0;
for (const collection of Object.values(collections)) {
for (const index of collection.indexes) {
if (index.name !== '_id_') { // Exclude default _id index
totalEfficiency += index.efficiency;
indexCount++;
}
}
}
return indexCount > 0 ? (totalEfficiency / indexCount).toFixed(2) : 0;
}
generateCollectionIndexRecommendations(collectionName, collectionData) {
const recommendations = [];
// Check for high index-to-data ratio
if (parseFloat(collectionData.indexSizeRatio) > this.performanceTargets.maxIndexSizeRatio) {
recommendations.push({
collection: collectionName,
type: 'SIZE_WARNING',
message: `Index size ratio (${collectionData.indexSizeRatio}) exceeds recommended threshold`,
suggestion: 'Review index necessity and consider partial indexes'
});
}
// Check for unused indexes
const unusedIndexes = collectionData.indexes.filter(idx =>
idx.name !== '_id_' && idx.usageCount === 0
);
if (unusedIndexes.length > 0) {
recommendations.push({
collection: collectionName,
type: 'UNUSED_INDEXES',
message: `Found ${unusedIndexes.length} unused indexes`,
suggestion: `Consider dropping: ${unusedIndexes.map(idx => idx.name).join(', ')}`
});
}
// Check for low-efficiency indexes
const inefficientIndexes = collectionData.indexes.filter(idx =>
idx.name !== '_id_' && idx.efficiency < 1.0
);
if (inefficientIndexes.length > 0) {
recommendations.push({
collection: collectionName,
type: 'LOW_EFFICIENCY',
message: `Found ${inefficientIndexes.length} low-efficiency indexes`,
suggestion: 'Review usage patterns and consider redesigning or adding partial filters'
});
}
// Check for missing compound indexes (heuristic)
if (collectionData.indexCount < 3 && collectionData.documentCount > 10000) {
recommendations.push({
collection: collectionName,
type: 'MISSING_COMPOUND_INDEXES',
message: 'Large collection with few indexes may benefit from compound indexes',
suggestion: 'Analyze query patterns and create compound indexes for frequently combined filters'
});
}
return recommendations;
}
}
SQL-Style Index Management with QueryLeaf
QueryLeaf provides familiar SQL syntax for MongoDB index operations:
-- QueryLeaf index management with SQL-familiar syntax
-- Create single-field indexes
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_status ON users(status);
CREATE INDEX idx_users_country ON users(country);
CREATE INDEX idx_users_created_at ON users(created_at DESC); -- Descending sort
-- Create compound indexes following ESR (Equality, Sort, Range) principle
CREATE INDEX idx_users_compound_esr ON users(
status, -- Equality: exact match filters
country, -- Equality: exact match filters
total_spent DESC, -- Sort: ordering field
created_at -- Range: range queries
);
-- Create partial indexes with conditions
CREATE INDEX idx_users_active_email ON users(email)
WHERE status = 'active';
CREATE INDEX idx_users_premium_spending ON users(total_spent DESC, loyalty_points DESC)
WHERE account_type = 'premium' AND total_spent > 100;
CREATE INDEX idx_orders_recent_high_value ON orders(total_amount DESC, created_at DESC)
WHERE status = 'completed'
AND created_at >= CURRENT_TIMESTAMP - INTERVAL '90 days'
AND total_amount >= 500;
-- Create text search indexes with weights
CREATE TEXT INDEX idx_users_search ON users(
first_name WEIGHT 10,
last_name WEIGHT 10,
email WEIGHT 5,
company WEIGHT 3,
bio WEIGHT 1
) WITH (
default_language = 'english',
language_override = 'language'
);
CREATE TEXT INDEX idx_products_search ON products(
name WEIGHT 20,
brand WEIGHT 15,
tags WEIGHT 10,
description WEIGHT 5,
features WEIGHT 3
);
-- Create geospatial indexes
CREATE INDEX idx_users_location ON users(location) USING GEO2DSPHERE;
CREATE INDEX idx_stores_address ON stores(address.coordinates) USING GEO2DSPHERE;
-- Create sparse indexes for optional fields
CREATE INDEX idx_users_social_profiles ON users(
social_profiles.twitter,
social_profiles.linkedin
) WITH SPARSE;
CREATE INDEX idx_users_subscription ON users(
subscription.plan_id,
subscription.expires_at
) WITH SPARSE;
-- Create TTL indexes for automatic data expiration
CREATE INDEX idx_sessions_ttl ON user_sessions(last_activity)
WITH TTL = '7 days';
CREATE INDEX idx_analytics_ttl ON analytics_events(created_at)
WITH TTL = '30 days';
CREATE INDEX idx_password_resets_ttl ON password_resets(created_at)
WITH TTL = '24 hours';
-- Create wildcard indexes for flexible schemas
CREATE INDEX idx_users_metadata ON users("metadata.$**");
CREATE INDEX idx_products_attributes ON products("attributes.$**");
CREATE INDEX idx_orders_custom_fields ON orders("custom_fields.$**");
-- Advanced compound index patterns
WITH user_activity_analysis AS (
SELECT
user_id,
status,
country,
DATE_TRUNC('month', created_at) as signup_month,
last_login_at,
total_spent,
loyalty_tier,
-- User categorization
CASE
WHEN total_spent > 1000 THEN 'high_value'
WHEN total_spent > 100 THEN 'medium_value'
ELSE 'low_value'
END as value_segment,
CASE
WHEN last_login_at >= CURRENT_TIMESTAMP - INTERVAL '7 days' THEN 'active'
WHEN last_login_at >= CURRENT_TIMESTAMP - INTERVAL '30 days' THEN 'recent'
WHEN last_login_at >= CURRENT_TIMESTAMP - INTERVAL '90 days' THEN 'inactive'
ELSE 'dormant'
END as activity_segment
FROM users
WHERE created_at >= CURRENT_TIMESTAMP - INTERVAL '2 years'
),
index_optimization_analysis AS (
SELECT
-- Query pattern analysis for index design
COUNT(*) as total_queries,
COUNT(*) FILTER (WHERE status = 'active') as active_user_queries,
COUNT(*) FILTER (WHERE country IN ('US', 'CA', 'UK')) as geographic_queries,
COUNT(*) FILTER (WHERE total_spent > 100) as spending_queries,
COUNT(*) FILTER (WHERE last_login_at >= CURRENT_TIMESTAMP - INTERVAL '30 days') as recent_activity_queries,
-- Compound query patterns
COUNT(*) FILTER (WHERE status = 'active' AND country = 'US') as status_country_queries,
COUNT(*) FILTER (WHERE status = 'active' AND total_spent > 100) as status_spending_queries,
COUNT(*) FILTER (WHERE country = 'US' AND total_spent > 500) as country_spending_queries,
-- Complex filtering patterns
COUNT(*) FILTER (
WHERE status = 'active'
AND country IN ('US', 'CA')
AND total_spent > 100
AND last_login_at >= CURRENT_TIMESTAMP - INTERVAL '30 days'
) as complex_filter_queries,
-- Sorting patterns cannot be derived from row data alone; in practice these
-- would come from a query log, so they are omitted from this row-based proxy
-- Range query patterns
COUNT(*) FILTER (WHERE created_at >= CURRENT_TIMESTAMP - INTERVAL '1 year') as date_range_queries,
COUNT(*) FILTER (WHERE total_spent BETWEEN 100 AND 1000) as spending_range_queries
FROM user_activity_analysis
)
-- Optimal index recommendations based on query patterns
SELECT
'CREATE INDEX idx_users_status_country_spending ON users(status, country, total_spent DESC)' as recommended_index,
'High frequency status + country + spending queries' as justification,
status_country_queries + country_spending_queries as query_frequency
FROM index_optimization_analysis
WHERE status_country_queries > 100 OR country_spending_queries > 100
UNION ALL
SELECT
'CREATE INDEX idx_users_active_recent_spending ON users(status, last_login_at DESC, total_spent DESC) WHERE status = ''active''',
'Active user analysis with recent activity and spending',
active_user_queries + recent_activity_queries
FROM index_optimization_analysis
WHERE active_user_queries > 50
UNION ALL
SELECT
'CREATE INDEX idx_users_geographic_value ON users(country, value_segment, activity_segment)',
'Geographic segmentation with customer value analysis',
geographic_queries
FROM index_optimization_analysis
WHERE geographic_queries > 75;
-- Index performance monitoring and optimization
WITH index_usage_stats AS (
SELECT
collection_name,
index_name,
index_size_mb,
usage_count,
last_used,
-- Calculate index efficiency metrics
usage_count / GREATEST(index_size_mb, 1) as usage_per_mb,
EXTRACT(DAY FROM (CURRENT_TIMESTAMP - last_used)) as days_since_last_use,
-- Index selectivity estimation
CASE
WHEN index_name LIKE '%email%' THEN 'high' -- Unique fields
WHEN index_name LIKE '%status%' THEN 'low' -- Few distinct values
WHEN index_name LIKE '%country%' THEN 'medium' -- Geographic distribution
WHEN index_name LIKE '%created_at%' THEN 'high' -- Timestamp fields
ELSE 'unknown'
END as estimated_selectivity,
-- Index type classification
CASE
WHEN index_name LIKE '%compound%' OR index_name LIKE '%_%_%' THEN 'compound'
WHEN index_name LIKE '%text%' OR index_name LIKE '%search%' THEN 'text'
WHEN index_name LIKE '%geo%' OR index_name LIKE '%location%' THEN 'geospatial'
WHEN index_name LIKE '%ttl%' THEN 'ttl'
ELSE 'single_field'
END as index_type
FROM mongodb_index_stats -- Hypothetical system table
WHERE collection_name IN ('users', 'orders', 'products', 'analytics')
),
index_health_assessment AS (
SELECT
collection_name,
index_name,
index_type,
usage_per_mb,
days_since_last_use,
estimated_selectivity,
-- Health score calculation
CASE
WHEN days_since_last_use > 30 AND usage_count = 0 THEN 'UNUSED'
WHEN usage_per_mb < 0.1 THEN 'LOW_EFFICIENCY'
WHEN usage_per_mb > 10 AND estimated_selectivity = 'high' THEN 'OPTIMAL'
WHEN usage_per_mb > 5 AND estimated_selectivity = 'medium' THEN 'GOOD'
WHEN usage_per_mb > 1 THEN 'ACCEPTABLE'
ELSE 'NEEDS_REVIEW'
END as health_status,
-- Optimization recommendations
CASE
WHEN days_since_last_use > 30 THEN 'Consider dropping unused index'
WHEN usage_per_mb < 0.1 AND estimated_selectivity = 'low' THEN 'Add partial filter to improve selectivity'
WHEN index_type = 'single_field' AND usage_per_mb > 5 THEN 'Consider compound index for better coverage'
WHEN index_size_mb > 100 AND usage_per_mb < 1 THEN 'Large index with low usage - review necessity'
ELSE 'Index performing within acceptable parameters'
END as optimization_recommendation
FROM index_usage_stats
)
SELECT
collection_name,
index_name,
index_type,
health_status,
ROUND(usage_per_mb, 2) as usage_efficiency,
days_since_last_use,
optimization_recommendation,
-- Priority scoring for optimization
CASE health_status
WHEN 'UNUSED' THEN 100
WHEN 'LOW_EFFICIENCY' THEN 80
WHEN 'NEEDS_REVIEW' THEN 60
WHEN 'ACCEPTABLE' THEN 20
ELSE 0
END as optimization_priority
FROM index_health_assessment
WHERE health_status != 'OPTIMAL'
ORDER BY optimization_priority DESC, collection_name, index_name;
-- Real-time query performance analysis with index recommendations
WITH slow_queries AS (
SELECT
collection_name,
query_pattern,
avg_execution_time_ms,
query_count,
index_used,
documents_examined,
documents_returned,
-- Calculate query efficiency metrics
documents_examined / GREATEST(documents_returned, 1) as scan_efficiency,
query_count * avg_execution_time_ms as total_time_impact,
-- Identify optimization opportunities
CASE
WHEN index_used IS NULL OR index_used = 'COLLSCAN' THEN 'MISSING_INDEX'
WHEN documents_examined / GREATEST(documents_returned, 1) > 100 THEN 'POOR_SELECTIVITY' -- repeat the expression; the scan_efficiency alias is not visible here
WHEN avg_execution_time_ms > 100 THEN 'SLOW_QUERY'
ELSE 'ACCEPTABLE'
END as performance_issue
FROM query_performance_log -- Hypothetical query log table
WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
AND avg_execution_time_ms > 50
),
index_recommendations AS (
SELECT
collection_name,
query_pattern,
performance_issue,
total_time_impact,
-- Generate specific index recommendations
CASE performance_issue
WHEN 'MISSING_INDEX' THEN
'CREATE INDEX ON ' || collection_name || ' FOR: ' || query_pattern
WHEN 'POOR_SELECTIVITY' THEN
'CREATE PARTIAL INDEX ON ' || collection_name || ' WITH SELECTIVE FILTER'
WHEN 'SLOW_QUERY' THEN
'OPTIMIZE INDEX ON ' || collection_name || ' FOR QUERY: ' || query_pattern
ELSE 'No immediate action required'
END as recommended_action,
-- Estimate performance improvement
CASE performance_issue
WHEN 'MISSING_INDEX' THEN LEAST(avg_execution_time_ms * 0.8, 50) -- 80% improvement
WHEN 'POOR_SELECTIVITY' THEN LEAST(avg_execution_time_ms * 0.6, 30) -- 60% improvement
WHEN 'SLOW_QUERY' THEN LEAST(avg_execution_time_ms * 0.4, 20) -- 40% improvement
ELSE 0
END as estimated_improvement_ms
FROM slow_queries
WHERE performance_issue != 'ACCEPTABLE'
)
SELECT
collection_name,
recommended_action,
COUNT(*) as affected_query_patterns,
SUM(total_time_impact) as total_performance_impact,
ROUND(AVG(estimated_improvement_ms), 1) as avg_improvement_ms,
-- Calculate ROI for optimization effort
ROUND(SUM(total_time_impact * estimated_improvement_ms / 1000), 2) as optimization_value_score,
-- Priority ranking
ROW_NUMBER() OVER (ORDER BY SUM(total_time_impact) DESC) as optimization_priority
FROM index_recommendations
GROUP BY collection_name, recommended_action
HAVING COUNT(*) >= 3 -- Focus on patterns affecting multiple queries
ORDER BY optimization_priority ASC;
-- QueryLeaf provides comprehensive index management capabilities:
-- 1. SQL-familiar index creation syntax with advanced options
-- 2. Partial indexes with complex conditional expressions
-- 3. Text search indexes with customizable weights and language support
-- 4. Geospatial indexing for location-based queries and analysis
-- 5. TTL indexes with flexible expiration rules and time units
-- 6. Compound index optimization following ESR principles
-- 7. Real-time index performance monitoring and health assessment
-- 8. Automated index recommendations based on query patterns
-- 9. Index usage analytics and optimization priority scoring
-- 10. Integration with MongoDB's native indexing optimizations
Best Practices for MongoDB Index Implementation
Index Design Guidelines
Essential principles for optimal MongoDB index design:
- ESR Rule: Design compound indexes following Equality, Sort, Range field ordering (see the sketch after this list)
- Selectivity Focus: Prioritize high-selectivity fields early in compound indexes
- Query Pattern Analysis: Design indexes based on actual application query patterns
- Partial Index Usage: Use partial indexes to reduce size and improve performance
- Index Intersection: Consider single-field indexes that can be intersected efficiently
- Covered Queries: Design indexes to cover frequently executed queries entirely
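As a minimal sketch of the ESR rule in practice (connection details and field names assumed from the earlier examples), the index below places the equality fields first, then the sort field, then the range field, and the accompanying query is shaped to match:
const { MongoClient } = require('mongodb');

async function esrExample() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const users = client.db('user_analytics_platform').collection('users');

  // Equality (status, country) -> Sort (totalSpent) -> Range (createdAt)
  await users.createIndex(
    { status: 1, country: 1, totalSpent: -1, createdAt: 1 },
    { name: 'idx_users_esr_example' }
  );

  // Equality predicates, a sort on totalSpent, and a range on createdAt can
  // all be satisfied by the index above without an in-memory sort
  const explain = await users
    .find({ status: 'active', country: 'US', createdAt: { $gte: new Date('2024-01-01') } })
    .sort({ totalSpent: -1 })
    .explain('executionStats');
  console.log('Winning plan root stage:', explain.queryPlanner.winningPlan.stage);

  await client.close();
}

esrExample().catch(console.error);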
Performance and Maintenance
Optimize MongoDB indexes for production workloads:
- Regular Monitoring: Implement continuous index usage and performance monitoring (a minimal example follows this list)
- Size Management: Keep total index size reasonable relative to data size
- Index Builds: Plan index builds for low-traffic windows; since MongoDB 4.2 the legacy background option is ignored and all builds use an optimized process that holds exclusive locks only briefly
- Usage Analysis: Regularly review and remove unused or inefficient indexes
- Testing Strategy: Test index changes thoroughly before production deployment
- Documentation: Maintain clear documentation of index purpose and query patterns
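A minimal monitoring sketch, assuming the same connection details used earlier: list per-index usage counters with $indexStats and flag indexes that have recorded no operations as candidates for review.
const { MongoClient } = require('mongodb');

async function reportUnusedIndexes(collectionName) {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const collection = client.db('user_analytics_platform').collection(collectionName);

  const stats = await collection.aggregate([{ $indexStats: {} }]).toArray();
  for (const stat of stats) {
    const ops = Number(stat.accesses?.ops ?? 0);
    if (stat.name !== '_id_' && ops === 0) {
      console.log(`Candidate for review: ${stat.name} (no recorded use since ${stat.accesses?.since})`);
    }
  }

  await client.close();
}

reportUnusedIndexes('users').catch(console.error);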
Conclusion
MongoDB's advanced indexing capabilities provide comprehensive optimization strategies that address many of the limitations of traditional relational database indexing approaches. The flexible indexing system supports complex document structures, dynamic schemas, and specialized data types while delivering strong query performance at scale.
Key MongoDB Indexing benefits include:
- Flexible Index Types: Support for compound, partial, text, geospatial, sparse, TTL, and wildcard indexes
- Advanced Query Optimization: Sophisticated query planner with index intersection and covered query support
- Dynamic Schema Support: Indexing capabilities that adapt to evolving document structures
- Specialized Data Support: Native indexing for arrays, embedded documents, and geospatial data
- Performance Analytics: Comprehensive index usage monitoring and optimization recommendations
- Scalable Architecture: Index strategies that work across replica sets and sharded clusters
Whether you're optimizing query performance, managing large-scale data operations, or building applications with complex data access patterns, MongoDB's indexing system with QueryLeaf's familiar SQL interface provides the foundation for high-performance database operations.
QueryLeaf Integration: QueryLeaf automatically manages MongoDB indexing operations while providing SQL-familiar index creation, optimization, and monitoring syntax. Advanced indexing patterns, performance analysis, and automated recommendations are seamlessly handled through familiar SQL constructs, making sophisticated database optimization both powerful and accessible to SQL-oriented development teams.
The integration of native MongoDB indexing capabilities with SQL-style operations makes MongoDB an ideal platform for applications requiring both flexible data modeling and familiar database optimization patterns, ensuring your applications achieve optimal performance while remaining maintainable as they scale and evolve.