MongoDB Search and Full-Text Indexing: Advanced Query Optimization Strategies for Production Applications
Modern applications need search that goes well beyond equality matches. Users expect fast, relevant results across large datasets, which calls for full-text search that can handle multi-term queries, multiple languages, and many concurrent searches.
MongoDB's text indexes provide full-text search inside the database, which removes the need for an external search engine in many use cases. They support stemming, language-specific stop words, weighted fields, and relevance scoring.
The Search Performance Challenge
Traditional approaches to search in databases often rely on inefficient pattern matching that becomes prohibitively slow as data volumes grow:
-- Inefficient traditional search approaches that don't scale
-- Problem 1: LIKE patterns with leading wildcards (full table scan)
SELECT
product_id,
product_name,
description,
category,
price
FROM products
WHERE product_name ILIKE '%wireless%'
OR description ILIKE '%bluetooth%'
OR category ILIKE '%audio%'
ORDER BY product_name;
-- Problems with this approach:
-- 1. No index utilization - requires full collection scan
-- 2. Case-insensitive LIKE operations are expensive
-- 3. No relevance scoring or ranking
-- 4. Poor performance with large datasets (>100K documents)
-- 5. No language-specific search capabilities
-- 6. Cannot handle synonyms, stemming, or fuzzy matching
-- 7. Complex multi-field searches become increasingly expensive
-- Problem 2: Multiple field searches with OR conditions (inefficient)
SELECT
blog_id,
title,
content,
author,
tags,
published_date,
view_count
FROM blog_posts
WHERE (title ILIKE '%machine learning%' OR title ILIKE '%AI%' OR title ILIKE '%artificial intelligence%')
OR (content ILIKE '%neural network%' OR content ILIKE '%deep learning%')
OR (tags::text ILIKE '%data science%')
OR (author ILIKE '%expert%')
ORDER BY published_date DESC, view_count DESC;
-- Problems:
-- 1. Multiple OR conditions prevent index optimization
-- 2. No relevance scoring - results ordered by publication date, not relevance
-- 3. Searches across different field types (text, arrays) are complex
-- 4. Cannot boost importance of matches in title vs content
-- 5. No support for partial word matches or typo tolerance
-- 6. Query time grows linearly with dataset size because every row is scanned
-- Problem 3: Complex e-commerce search with filters (unoptimized)
SELECT
p.product_id,
p.product_name,
p.description,
p.category,
p.brand,
p.price,
p.rating,
p.review_count,
CASE
WHEN p.product_name ILIKE '%search_term%' THEN 3
WHEN p.description ILIKE '%search_term%' THEN 2
WHEN p.category ILIKE '%search_term%' THEN 1
ELSE 0
END as relevance_score
FROM products p
JOIN product_inventory pi ON p.product_id = pi.product_id
WHERE (p.product_name ILIKE '%wireless headphones%'
OR p.description ILIKE '%wireless headphones%'
OR p.category ILIKE '%headphones%')
AND pi.quantity_available > 0
AND p.price BETWEEN 50 AND 300
AND p.rating >= 4.0
AND p.brand IN ('Sony', 'Bose', 'Apple', 'Samsung')
ORDER BY relevance_score DESC, p.rating DESC, p.review_count DESC;
-- Problems:
-- 1. Manual relevance scoring is simplistic and doesn't handle phrase matching
-- 2. The CASE expression re-runs ILIKE on every candidate row and cannot use an index
-- 3. Multiple ILIKE operations across large text fields are expensive
-- 4. Cannot handle variations in search terms (e.g., "headphone" vs "headphones")
-- 5. No support for fuzzy matching or typo tolerance
-- 6. Filter conditions after search prevent search optimization
-- 7. Results ranking doesn't consider text search relevance properly
-- Problem 4: Multi-language content search (inadequate)
SELECT
document_id,
title_english,
title_spanish,
title_french,
content_english,
content_spanish,
content_french,
language,
created_date
FROM multilingual_documents
WHERE (language = 'en' AND (title_english ILIKE '%innovation%' OR content_english ILIKE '%technology%'))
OR (language = 'es' AND (title_spanish ILIKE '%innovación%' OR content_spanish ILIKE '%tecnología%'))
OR (language = 'fr' AND (title_french ILIKE '%innovation%' OR content_french ILIKE '%technologie%'))
ORDER BY created_date DESC;
-- Problems:
-- 1. Requires maintaining separate fields for each language
-- 2. No language-specific stemming or analysis
-- 3. Search terms must be manually translated
-- 4. Complex query structure for multiple languages
-- 5. Cannot handle mixed-language content
-- 6. No support for language-specific stop words or stemming
-- 7. Difficult to maintain and extend to new languages
-- These traditional approaches face fundamental limitations:
-- 1. Performance Issues: Full table scans, no search-optimized indexes
-- 2. Relevance Problems: No intelligent ranking or scoring
-- 3. Language Barriers: Limited multi-language and stemming support
-- 4. Maintenance Complexity: Complex query structures that are hard to optimize
-- 5. Scalability Limitations: Performance degrades significantly with data growth
-- 6. User Experience: Poor search quality and slow response times
-- 7. Development Overhead: Manual implementation of search features
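MongoDB's text indexes address most of these problems. They index stemmed terms instead of scanning raw strings, remove language-specific stop words, apply per-field weights, and return a relevance score. They do not provide fuzzy matching, typo tolerance, prefix matching, or synonym expansion, so those still need application logic or a dedicated search engine such as Atlas Search. A collection can also hold only one text index, so the index definitions in this article that target the same collection are alternatives, not indexes to create side by side: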
// MongoDB Advanced Full-Text Search Implementation
// Create comprehensive text index with field weighting
db.products.createIndex({
product_name: "text",
description: "text",
category: "text",
tags: "text",
specifications: "text"
}, {
weights: {
product_name: 10, // Highest priority - exact name matches
category: 8, // High priority - category relevance
tags: 6, // Medium-high priority - structured metadata
description: 4, // Medium priority - detailed descriptions
specifications: 2 // Lower priority - technical details
},
name: "product_search_index",
default_language: "english",
language_override: "search_language"
})
// Advanced search with scoring and filtering
db.products.aggregate([
// Stage 1: Text search with advanced matching
{
$match: {
$and: [
{
$text: {
$search: "wireless bluetooth headphones",
$caseSensitive: false,
$diacriticSensitive: false
}
},
{
price: { $gte: 50, $lte: 300 }
},
{
"inventory.quantity_available": { $gt: 0 }
},
{
rating: { $gte: 4.0 }
},
{
brand: { $in: ["Sony", "Bose", "Apple", "Samsung"] }
}
]
}
},
// Stage 2: Add comprehensive search scoring
{
$addFields: {
// MongoDB's built-in text relevance score
text_score: { $meta: "textScore" },
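// Business logic scoring
business_score: {
$add: [
{ $multiply: ["$rating", 2] }, // Rating boost
{ $divide: [{ $ln: { $add: [{ $ifNull: ["$review_count", 0] }, 1] } }, 10] }, // Review count boost (logarithmic; +1 avoids $ln of 0)
{ $cond: [{ $eq: ["$featured", true] }, 5, 0] }, // Featured product boost
{ $cond: [{ $gt: ["$inventory.quantity_available", 10] }, 2, 0] } // High inventory boost
]
},
// Combined relevance score. Expressions in an $addFields stage cannot see fields
// added by that same stage, so the business terms are repeated here instead of
// referencing "$business_score".
combined_score: {
$add: [
{ $multiply: [{ $meta: "textScore" }, 3] }, // Text relevance (3x weight)
{ $multiply: ["$rating", 2] },
{ $divide: [{ $ln: { $add: [{ $ifNull: ["$review_count", 0] }, 1] } }, 10] },
{ $cond: [{ $eq: ["$featured", true] }, 5, 0] },
{ $cond: [{ $gt: ["$inventory.quantity_available", 10] }, 2, 0] }
]
},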
// Add search result metadata
search_metadata: {
matched_fields: {
$switch: {
branches: [
{ case: { $regexMatch: { input: "$product_name", regex: /wireless|bluetooth|headphones/i } }, then: ["product_name"] },
{ case: { $regexMatch: { input: "$description", regex: /wireless|bluetooth|headphones/i } }, then: ["description"] },
{ case: { $regexMatch: { input: "$category", regex: /wireless|bluetooth|headphones/i } }, then: ["category"] }
],
default: ["description"]
}
},
search_terms_found: {
$size: {
$filter: {
input: ["wireless", "bluetooth", "headphones"],
cond: {
$or: [
{ $regexMatch: { input: "$product_name", regex: { $concat: [".*", "$$this", ".*"] }, options: "i" } },
{ $regexMatch: { input: "$description", regex: { $concat: [".*", "$$this", ".*"] }, options: "i" } }
]
}
}
}
}
}
}
},
// Stage 3: Sort by combined relevance and business metrics
{
$sort: {
combined_score: -1,
rating: -1,
review_count: -1
}
},
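// Stage 4: Add search result highlighting (simulated)
// $replaceAll matches literal strings only (no regex, no "$1" backreferences),
// so each term is replaced in turn on a lowercased copy of the field
{
$addFields: {
highlighted_name: {
$reduce: {
input: ["wireless", "bluetooth", "headphones"],
initialValue: { $toLower: "$product_name" },
in: {
$replaceAll: {
input: "$$value",
find: "$$this",
replacement: { $concat: ["<mark>", "$$this", "</mark>"] }
}
}
}
},
highlighted_description: {
$substrCP: [
{
$reduce: {
input: ["wireless", "bluetooth", "headphones"],
initialValue: { $toLower: "$description" },
in: {
$replaceAll: {
input: "$$value",
find: "$$this",
replacement: { $concat: ["<mark>", "$$this", "</mark>"] }
}
}
}
},
0, 200
]
}
}
},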
// Stage 5: Project final search results
{
$project: {
product_id: 1,
product_name: 1,
highlighted_name: 1,
description: 1,
highlighted_description: 1,
category: 1,
brand: 1,
price: 1,
rating: 1,
review_count: 1,
text_score: 1,
business_score: 1,
combined_score: 1,
search_metadata: 1,
"inventory.quantity_available": 1,
featured: 1
}
},
// Stage 6: Limit results for pagination
{ $limit: 20 }
])
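Highlighting can also be done in application code after the query returns, which keeps the original casing and allows regular expressions. The following is a minimal sketch under that assumption; `highlightTerms`, `escapeRegExp`, and `escapeHtml` are hypothetical helper names, not MongoDB or driver APIs.

```javascript
// Hypothetical client-side helpers; none of these names come from MongoDB.

// Escape regex metacharacters so search terms are matched literally.
function escapeRegExp(value) {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Escape the characters that would otherwise be interpreted as HTML.
function escapeHtml(value) {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Truncate `text` to maxLength characters, then wrap every
// case-insensitive occurrence of a term in <mark> tags.
function highlightTerms(text, terms, maxLength = 200) {
  if (!text) return "";
  const snippet = text.slice(0, maxLength);
  const patterns = terms
    .filter(term => term && term.trim() !== "")
    .map(term => escapeRegExp(term.trim()))
    .sort((a, b) => b.length - a.length); // prefer longer terms first
  if (patterns.length === 0) return escapeHtml(snippet);

  // Splitting on a capturing group keeps the matches at odd indexes.
  const matcher = new RegExp(`(${patterns.join("|")})`, "gi");
  return snippet
    .split(matcher)
    .map((part, index) =>
      index % 2 === 1 ? `<mark>${escapeHtml(part)}</mark>` : escapeHtml(part)
    )
    .join("");
}
```

Truncating before inserting the tags means a `<mark>` element is never cut in half, which can happen when a tagged string is shortened afterwards.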
Advanced Text Index Configuration
Multi-Field Text Indexes with Strategic Weighting
Design text indexes that optimize for your specific search requirements:
// E-commerce product search with sophisticated field weighting
// (a collection can hold only one text index, so drop any existing text index on products first)
db.products.createIndex({
// Primary product information (highest weights)
product_name: "text",
brand: "text",
model: "text",
// Product categorization (high weights)
category: "text",
subcategory: "text",
tags: "text",
// Descriptive content (medium weights)
short_description: "text",
long_description: "text",
key_features: "text",
// Technical specifications (lower weights)
specifications: "text",
technical_details: "text",
// User-generated content (contextual weights)
"reviews.title": "text",
"reviews.content": "text"
}, {
weights: {
// Product identity - highest priority
product_name: 15,
brand: 12,
model: 10,
// Categorization - high priority
category: 9,
subcategory: 8,
tags: 7,
// Marketing content - medium-high priority
short_description: 6,
key_features: 5,
long_description: 4,
// Technical content - medium priority
specifications: 3,
technical_details: 2,
// User content - lower priority but valuable for discovery
"reviews.title": 3,
"reviews.content": 1
},
name: "comprehensive_product_search",
default_language: "english",
language_override: "product_language",
textIndexVersion: 3 // Default since MongoDB 3.2; stated explicitly here
})
// Blog/Content search index with content-specific weighting
db.blog_posts.createIndex({
title: "text",
subtitle: "text",
content: "text",
summary: "text",
tags: "text",
category: "text",
"author.name": "text",
"author.bio": "text"
}, {
weights: {
title: 20, // Titles are most important for relevance
subtitle: 15, // Secondary headlines
summary: 10, // Executive summaries
tags: 8, // Structured metadata
category: 6, // Topic categorization
"author.name": 5, // Author attribution
content: 3, // Full content (lower weight due to length)
"author.bio": 1 // Background information
},
name: "blog_content_search",
default_language: "english"
})
// Multi-language document search with language-specific optimization
db.documents.createIndex({
"title.english": "text",
"title.spanish": "text",
"title.french": "text",
"content.english": "text",
"content.spanish": "text",
"content.french": "text",
keywords: "text",
global_tags: "text"
}, {
weights: {
"title.english": 10,
"title.spanish": 10,
"title.french": 10,
"content.english": 5,
"content.spanish": 5,
"content.french": 5,
keywords: 8,
global_tags: 6
},
name: "multilingual_document_search",
language_override: "primary_language"
})
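In each definition above, the key document and the `weights` document list the same fields, so they can drift apart as an index evolves. One way to keep them aligned is to derive both from a single map. This is a sketch only: `buildTextIndexSpec` is a hypothetical helper, and the range check reflects the documented weight limits of 1 to 99,999.

```javascript
// Hypothetical helper: derive the key document and the weights option
// from one { field: weight } map so the two cannot get out of sync.
function buildTextIndexSpec(fieldWeights, options = {}) {
  const keys = {};
  const weights = {};
  for (const [field, weight] of Object.entries(fieldWeights)) {
    if (!Number.isInteger(weight) || weight < 1 || weight > 99999) {
      throw new RangeError(`weight for "${field}" must be an integer from 1 to 99999`);
    }
    keys[field] = "text";
    weights[field] = weight;
  }
  return { keys, options: { ...options, weights } };
}

const blogSpec = buildTextIndexSpec(
  { title: 20, subtitle: 15, summary: 10, content: 3 },
  { name: "blog_content_search", default_language: "english" }
);

// With a connected shell or driver, the result would be passed as:
// db.blog_posts.createIndex(blogSpec.keys, blogSpec.options)
```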
Language-Specific Search Optimization
Implement language-aware search with proper stemming and stop word handling:
// Language-specific search implementation
async function performLanguageAwareSearch(searchTerms, targetLanguage = 'english', options = {}) {
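// MongoDB applies stemming and stop-word removal itself, based on $language.
// The stemming and stopWords entries below are reference data only; this function uses just the synonyms.
const languageConfigs = {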
'english': {
stemming: true,
stopWords: ['the', 'a', 'an', 'and', 'or', 'but', 'in', 'on', 'at', 'to', 'for', 'of', 'with', 'by'],
synonyms: {
'car': ['automobile', 'vehicle', 'auto'],
'phone': ['mobile', 'smartphone', 'cell'],
'computer': ['pc', 'laptop', 'desktop', 'workstation']
}
},
'spanish': {
stemming: true,
stopWords: ['el', 'la', 'de', 'que', 'y', 'a', 'en', 'un', 'es', 'se', 'no', 'te', 'lo', 'le'],
synonyms: {
'coche': ['automóvil', 'vehículo', 'auto'],
'teléfono': ['móvil', 'smartphone', 'celular'],
'computadora': ['ordenador', 'pc', 'portátil']
}
},
'french': {
stemming: true,
stopWords: ['le', 'de', 'et', 'à', 'un', 'il', 'être', 'en', 'avoir', 'que', 'pour'],
synonyms: {
'voiture': ['automobile', 'véhicule', 'auto'],
'téléphone': ['mobile', 'smartphone', 'portable'],
'ordinateur': ['pc', 'portable', 'laptop']
}
}
};
const config = languageConfigs[targetLanguage] || languageConfigs['english'];
// Expand search terms with synonyms (whole-word match, so "car" does not match "carpet")
const searchWords = new Set(searchTerms.toLowerCase().split(/\s+/));
let expandedTerms = searchTerms;
for (const [original, synonyms] of Object.entries(config.synonyms)) {
if (searchWords.has(original)) {
expandedTerms += ' ' + synonyms.join(' ');
}
}
// Build language-aware search query
const searchQuery = {
$text: {
$search: expandedTerms,
$language: targetLanguage,
$caseSensitive: false,
$diacriticSensitive: false
}
};
// Add additional filters if provided
if (options.filters) {
Object.assign(searchQuery, options.filters);
}
return await db.multilingual_content.aggregate([
{ $match: searchQuery },
{
$addFields: {
relevance_score: { $meta: "textScore" },
language_match_boost: {
$cond: [
{ $eq: ["$primary_language", targetLanguage] },
2.0, // Boost documents in target language
1.0
]
},
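// Expressions in an $addFields stage cannot see fields added by that same stage,
// so the language boost is recomputed inline instead of using "$language_match_boost"
final_score: {
$multiply: [
{ $meta: "textScore" },
{ $cond: [{ $eq: ["$primary_language", targetLanguage] }, 2.0, 1.0] }
]
}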
}
},
{ $sort: { final_score: -1, created_at: -1 } },
{ $limit: options.limit || 20 }
]).toArray();
}
// Usage examples for different languages
const englishResults = await performLanguageAwareSearch(
"machine learning artificial intelligence",
"english",
{ filters: { category: "technology" }, limit: 15 }
);
const spanishResults = await performLanguageAwareSearch(
"aprendizaje automático inteligencia artificial",
"spanish",
{ filters: { category: "tecnología" }, limit: 15 }
);
const frenchResults = await performLanguageAwareSearch(
"apprentissage automatique intelligence artificielle",
"french",
{ filters: { category: "technologie" }, limit: 15 }
);
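Appending synonyms to the search string works because `$text` treats space-separated terms as a logical OR. The same `$search` string also accepts quoted phrases and terms negated with a leading hyphen. The sketch below assembles such a string; `buildSearchString` is a hypothetical helper, and stripping embedded double quotes is a simplification chosen for this example.

```javascript
// Hypothetical helper: build a $text $search string from plain terms,
// exact phrases, and excluded terms.
function buildSearchString({ terms = [], phrases = [], exclude = [] } = {}) {
  // Remove double quotes so user input cannot break out of a phrase.
  const clean = value => String(value).replace(/"/g, "").trim();

  const parts = [
    ...terms.map(clean).filter(Boolean),
    ...phrases.map(clean).filter(Boolean).map(phrase => `"${phrase}"`),
    // Drop any leading hyphens before adding the single negation prefix.
    ...exclude
      .map(term => clean(term).replace(/^-+/, ""))
      .filter(Boolean)
      .map(term => `-${term}`)
  ];
  return parts.join(" ");
}

const searchString = buildSearchString({
  terms: ["machine", "learning"],
  phrases: ["neural network"],
  exclude: ["beginner"]
});

// The string would then be used as: { $text: { $search: searchString } }
```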
Search Result Ranking and Scoring
Advanced Relevance Scoring Strategies
Implement sophisticated scoring that combines text relevance with business metrics:
// Advanced search result ranking with multiple scoring factors
db.products.aggregate([
// Stage 1: Initial text search
{
$match: {
$text: {
$search: "premium wireless noise cancelling headphones",
$caseSensitive: false
}
}
},
// Stage 2: Comprehensive scoring algorithm
{
$addFields: {
// Base text relevance score
text_relevance: { $meta: "textScore" },
// Popularity scoring (normalized)
popularity_score: {
$add: [
// Review count influence (logarithmic to prevent dominance)
{ $multiply: [{ $ln: { $add: ["$review_count", 1] } }, 0.1] },
// Rating influence (4.0+ ratings get boost)
{ $cond: [{ $gte: ["$average_rating", 4.0] }, { $multiply: ["$average_rating", 0.5] }, 0] },
// Sales velocity (recent sales boost)
{ $multiply: [{ $ifNull: ["$sales_last_30_days", 0] }, 0.001] }
]
},
// Business priority scoring
business_priority_score: {
$add: [
// Featured product boost
{ $cond: [{ $eq: ["$featured_product", true] }, 3.0, 0] },
// New product launch boost (within 90 days)
{
$cond: [
{ $gte: ["$launch_date", { $subtract: [new Date(), 90 * 24 * 60 * 60 * 1000] }] },
2.0,
0
]
},
// High margin product boost
{ $cond: [{ $gte: ["$profit_margin", 0.4] }, 1.5, 0] },
// Brand partnership boost
{ $cond: [{ $in: ["$brand", ["Apple", "Sony", "Bose"]] }, 1.0, 0] }
]
},
// Availability and inventory scoring
inventory_score: {
$switch: {
branches: [
{ case: { $gt: ["$inventory_quantity", 100] }, then: 2.0 }, // High stock
{ case: { $gt: ["$inventory_quantity", 50] }, then: 1.5 }, // Medium stock
{ case: { $gt: ["$inventory_quantity", 10] }, then: 1.0 }, // Low stock
{ case: { $gt: ["$inventory_quantity", 0] }, then: 0.5 } // Very low stock
],
default: 0 // Out of stock
}
},
// Price competitiveness scoring
price_competitiveness_score: {
$cond: [
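{ $and: [{ $gt: ["$category_price_max", "$category_price_min"] }, { $gte: ["$price", "$category_price_min"] }, { $lte: ["$price", "$category_price_max"] }] }, // first condition avoids division by zero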
{
$subtract: [
2.0,
{ $divide: [{ $subtract: ["$price", "$category_price_min"] }, { $subtract: ["$category_price_max", "$category_price_min"] }] }
]
},
0
]
},
// Search term match quality scoring
search_quality_score: {
$add: [
// Exact phrase match bonus
{ $cond: [{ $regexMatch: { input: { $toLower: "$product_name" }, regex: "premium wireless noise cancelling headphones" } }, 5.0, 0] },
// Individual term matches in title
{ $cond: [{ $regexMatch: { input: { $toLower: "$product_name" }, regex: "premium" } }, 1.0, 0] },
{ $cond: [{ $regexMatch: { input: { $toLower: "$product_name" }, regex: "wireless" } }, 1.0, 0] },
{ $cond: [{ $regexMatch: { input: { $toLower: "$product_name" }, regex: "noise cancelling" } }, 2.0, 0] },
{ $cond: [{ $regexMatch: { input: { $toLower: "$product_name" }, regex: "headphones" } }, 1.0, 0] }
]
},
// User behavior scoring (if available)
user_behavior_score: {
$add: [
// Click-through rate boost
{ $multiply: [{ $ifNull: ["$search_ctr", 0] }, 3.0] },
// Conversion rate boost
{ $multiply: [{ $ifNull: ["$conversion_rate", 0] }, 5.0] },
// View-to-purchase rate
{ $multiply: [{ $ifNull: ["$view_to_purchase_rate", 0] }, 4.0] }
]
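}
}
},
// Stage 2b: Calculate the final composite score. This runs in its own stage because
// expressions in an $addFields stage cannot see fields added by that same stage.
// The percentages are each multiplier's share of the multiplier total (10);
// the component scores themselves are not normalized to a common range.
{
$addFields: {
final_search_score: {
$add: [
{ $multiply: ["$text_relevance", 4.0] }, // Text relevance (40% weight)
{ $multiply: ["$popularity_score", 2.0] }, // Popularity (20% weight)
{ $multiply: ["$business_priority_score", 1.5] }, // Business priority (15% weight)
{ $multiply: ["$inventory_score", 1.0] }, // Inventory (10% weight)
{ $multiply: ["$price_competitiveness_score", 0.75] }, // Price (7.5% weight)
{ $multiply: ["$search_quality_score", 0.5] }, // Search quality (5% weight)
{ $multiply: ["$user_behavior_score", 0.25] } // User behavior (2.5% weight)
]
}
}
},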
// Stage 3: Sort by final score and apply business rules
{
$sort: {
final_search_score: -1,
inventory_quantity: -1, // Secondary sort for tied scores
average_rating: -1 // Tertiary sort
}
},
// Stage 4: Add search result metadata for analytics
{
$addFields: {
search_result_metadata: {
query_timestamp: new Date(),
scoring_breakdown: {
text_relevance: "$text_relevance",
popularity_score: "$popularity_score",
business_priority_score: "$business_priority_score",
inventory_score: "$inventory_score",
price_competitiveness_score: "$price_competitiveness_score",
search_quality_score: "$search_quality_score",
user_behavior_score: "$user_behavior_score",
final_score: "$final_search_score"
},
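// result_position is not computed here: "$$ROOT" is the current document, not the
// result array, so $map over it fails. Compute the position in application code
// or with a $setWindowFields stage using $documentNumber (MongoDB 5.0+).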
}
}
},
// Stage 5: Project final search results
{
$project: {
product_id: 1,
product_name: 1,
brand: 1,
model: 1,
price: 1,
average_rating: 1,
review_count: 1,
inventory_quantity: 1,
product_images: { $slice: ["$images", 3] }, // Limit images for performance
key_features: { $slice: ["$features", 5] }, // Top 5 features
final_search_score: 1,
search_result_metadata: 1,
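// Add highlighted content for search results display
// ($replaceAll matches literal strings only, so each term is replaced in turn on a lowercased copy)
highlighted_content: {
title_highlight: {
$reduce: {
input: ["premium", "wireless", "noise cancelling", "headphones"],
initialValue: { $toLower: "$product_name" },
in: {
$replaceAll: {
input: "$$value",
find: "$$this",
replacement: { $concat: ["<mark>", "$$this", "</mark>"] }
}
}
}
},
description_snippet: {
$substrCP: [{ $ifNull: ["$short_description", ""] }, 0, 150]
}
}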
}
},
{ $limit: 20 }
])
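Two details of pipelines like this one are easy to get wrong: a composite score can only read component fields that were added in an earlier stage, and a result position needs a window function, not `$$ROOT`. The sketch below builds the relevant stages as plain objects. `buildScoringStages` is a hypothetical helper, the field names follow the pipeline above, and `$setWindowFields` with `$documentNumber` assumes MongoDB 5.0 or later.

```javascript
// Hypothetical helper: returns stages that (1) expose the text score,
// (2) combine already-computed component scores in a LATER stage, and
// (3) number the results in score order.
function buildScoringStages(weights) {
  const weightedTerms = Object.entries(weights).map(([field, weight]) => ({
    // $ifNull keeps one missing component from turning the sum into null.
    $multiply: [{ $ifNull: [`$${field}`, 0] }, weight]
  }));

  return [
    { $addFields: { text_relevance: { $meta: "textScore" } } },
    { $addFields: { final_search_score: { $add: weightedTerms } } },
    {
      $setWindowFields: {
        sortBy: { final_search_score: -1 },
        output: { result_position: { $documentNumber: {} } }
      }
    }
  ];
}

const scoringStages = buildScoringStages({
  text_relevance: 4.0,
  popularity_score: 2.0,
  inventory_score: 1.0
});

// In a real pipeline these stages would follow the $match with $text and
// the stage that computes popularity_score and inventory_score.
```

Numbering with a window function sorts the full result set, so placing a `$limit` on the candidate set before it is worth considering for large result sets.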
Personalized Search Ranking
Implement user-specific search ranking based on behavior and preferences:
// Personalized search implementation
async function performPersonalizedSearch(userId, searchQuery, options = {}) {
// Get user profile and search history
const userProfile = await db.user_profiles.findOne({ user_id: userId });
const searchHistory = await db.search_history.find({
user_id: userId
}).sort({ timestamp: -1 }).limit(100).toArray();
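// Extract user preferences from history
// (extractPreferredCategories, extractFeaturePreferences, and analyzeSearchPatterns
// are application-defined helpers that are not shown in this article)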
const userPreferences = {
preferred_brands: extractPreferredBrands(searchHistory),
preferred_categories: extractPreferredCategories(searchHistory),
price_range_preference: calculatePriceRangePreference(searchHistory),
feature_preferences: extractFeaturePreferences(searchHistory),
search_patterns: analyzeSearchPatterns(searchHistory)
};
return await db.products.aggregate([
// Stage 1: Text search
{
$match: {
$text: {
$search: searchQuery,
$caseSensitive: false
}
}
},
// Stage 2: Add personalization scoring
{
$addFields: {
base_text_score: { $meta: "textScore" },
// Personalization factors
personalization_score: {
$add: [
// Brand preference boost
{
$cond: [
{ $in: ["$brand", userPreferences.preferred_brands] },
2.0,
0
]
},
// Category preference boost
{
$cond: [
{ $in: ["$category", userPreferences.preferred_categories] },
1.5,
0
]
},
// Price range compatibility
{
$cond: [
{
$and: [
{ $gte: ["$price", userPreferences.price_range_preference.min] },
{ $lte: ["$price", userPreferences.price_range_preference.max] }
]
},
1.0,
-0.5
]
},
// Feature preference alignment (a missing key_features array counts as no overlap)
{
$size: {
$setIntersection: [
{ $ifNull: ["$key_features", []] },
userPreferences.feature_preferences
]
}
}
]
},
// Demographic targeting (if applicable)
demographic_score: {
$add: [
// Age group targeting
{
$cond: [
{ $in: [userProfile?.age_group ?? null, { $ifNull: ["$target_demographics.age_groups", []] }] }, // tolerates a missing profile or field
0.5,
0
]
},
// Interest targeting
{
$multiply: [
{
$size: {
$setIntersection: [
userProfile?.interests ?? [],
{ $ifNull: ["$target_demographics.interests", []] }
]
}
},
0.1
]
}
]
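}
}
},
// Stage 2b: Calculate the personalized final score in a separate stage, because
// expressions in an $addFields stage cannot see fields added by that same stage
{
$addFields: {
personalized_final_score: {
$add: [
{ $multiply: ["$base_text_score", 3.0] },
{ $multiply: ["$personalization_score", 2.0] },
"$demographic_score"
]
}
}
},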
// Stage 3: Sort by personalized score
{
$sort: {
personalized_final_score: -1,
average_rating: -1,
review_count: -1
}
},
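// Stage 4: Attach search event metadata to each result. This stage writes nothing;
// persist the event separately (for example, with an insert into search_history)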
{
$addFields: {
search_event: {
user_id: userId,
search_query: searchQuery,
timestamp: new Date(),
personalization_applied: true
}
}
},
{ $limit: options.limit || 20 }
]).toArray();
}
// Helper functions for personalization
function extractPreferredBrands(searchHistory) {
const brandCounts = {};
searchHistory.forEach(search => {
if (search.clicked_products) {
search.clicked_products.forEach(product => {
brandCounts[product.brand] = (brandCounts[product.brand] || 0) + 1;
});
}
});
return Object.keys(brandCounts)
.sort((a, b) => brandCounts[b] - brandCounts[a])
.slice(0, 5);
}
function calculatePriceRangePreference(searchHistory) {
const prices = [];
searchHistory.forEach(search => {
if (search.purchased_products) {
search.purchased_products.forEach(product => {
prices.push(product.price);
});
}
});
if (prices.length === 0) return { min: 0, max: 1000 };
prices.sort((a, b) => a - b);
return {
min: Math.max(0, prices[Math.floor(prices.length * 0.25)] * 0.8),
max: prices[Math.floor(prices.length * 0.75)] * 1.2
};
}
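`performPersonalizedSearch` also calls `extractPreferredCategories`, which is not defined above. One possible shape for it is sketched here, assuming each entry in `clicked_products` carries a `category` field in the same way it carries `brand`. `topClickedValues` is a hypothetical generalization of the brand-counting logic.

```javascript
// Hypothetical helper: count how often each value of `field` appears among
// clicked products and return the most frequent values first.
function topClickedValues(searchHistory, field, limit = 5) {
  const counts = new Map();
  for (const search of searchHistory) {
    for (const product of search.clicked_products || []) {
      const value = product[field];
      if (value === undefined || value === null) continue;
      counts.set(value, (counts.get(value) || 0) + 1);
    }
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit)
    .map(([value]) => value);
}

// Assumption: clicked products store their category under "category".
function extractPreferredCategories(searchHistory) {
  return topClickedValues(searchHistory, "category");
}

const sampleHistory = [
  { clicked_products: [{ brand: "Sony", category: "audio" }, { brand: "Bose", category: "audio" }] },
  { clicked_products: [{ brand: "Sony", category: "cameras" }] },
  { query_only: true } // entries without clicks are skipped
];
const preferredCategories = extractPreferredCategories(sampleHistory);
```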
Performance Optimization Strategies
Search Index Optimization
Optimize text indexes for different search patterns and data access requirements:
// Performance-optimized index strategies for different search scenarios
// Strategy 1: High-frequency, simple searches (e-commerce product search)
// Optimized for speed over comprehensive coverage
db.products_fast_search.createIndex({
product_name: "text",
brand: "text",
category: "text"
}, {
weights: {
product_name: 10,
brand: 8,
category: 5
},
name: "fast_product_search",
// Text indexes are always sparse, and the background option is ignored since MongoDB 4.2, so neither is set here
textIndexVersion: 3
});
// Strategy 2: Comprehensive content search (documentation, blogs)
// Optimized for relevance over speed
db.content_comprehensive_search.createIndex({
title: "text",
content: "text",
tags: "text",
category: "text",
author: "text",
"metadata.keywords": "text"
}, {
weights: {
title: 15,
"metadata.keywords": 10,
tags: 8,
category: 6,
author: 4,
content: 3
},
name: "comprehensive_content_search",
default_language: "english",
language_override: "content_language",
textIndexVersion: 3
});
// Strategy 3: Multi-language optimized search
// A collection can hold only one text index (partialFilterExpression does not lift
// that limit), so per-language indexes need one collection per language.
// The multilingual_content_<language> collection names are an assumption for illustration.
const languages = ['english', 'spanish', 'french', 'german', 'italian'];
languages.forEach(lang => {
db.getCollection(`multilingual_content_${lang}`).createIndex({
[`title.${lang}`]: "text",
[`content.${lang}`]: "text",
[`summary.${lang}`]: "text",
global_tags: "text"
}, {
weights: {
[`title.${lang}`]: 12,
[`summary.${lang}`]: 8,
[`content.${lang}`]: 5,
global_tags: 6
},
name: `search_${lang}`,
default_language: lang
});
});
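// Strategy 4: Compound indexes for filtered searches
// Keys that precede the text keys must appear as equality conditions in every
// $text query that uses this index, so only equality-filtered fields belong there.
// Range filters such as price are applied after the text match.
db.products_filtered_search.createIndex({
category: 1,
availability_status: 1,
product_name: "text",
description: "text"
}, {
weights: {
product_name: 10,
description: 5
},
name: "filtered_product_search"
});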
// Performance monitoring for search indexes
async function analyzeSearchPerformance() {
// Get index statistics
const indexStats = await db.products.aggregate([
{ $indexStats: {} },
{
$match: {
name: { $regex: /.*search.*/ } // Focus on search indexes
}
},
{
$project: {
name: 1,
accesses: "$accesses.ops",
since: "$accesses.since"
}
}
]).toArray();
console.log("Search Index Performance:", indexStats);
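// Analyze slow search queries (requires the database profiler to be enabled,
// for example with db.setProfilingLevel(1, { slowms: 100 }))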
const slowQueries = await db.system.profile.find({
"command.aggregate": { $exists: true },
"command.pipeline.0.$match.$text": { $exists: true },
millis: { $gt: 100 } // Queries taking longer than 100ms
}).sort({ ts: -1 }).limit(10).toArray();
console.log("Slow Search Queries:", slowQueries);
return { indexStats, slowQueries };
}
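Index statistics show how often an index is used, but not whether a particular query used it. An explain plan can answer that. The sketch below walks a winning plan and reports whether any stage looks like a text-index stage. The helper names are hypothetical, and because stage names differ between server versions, it treats any stage whose name starts with `TEXT` as a match, which is an assumption to verify against your own explain output.

```javascript
// Hypothetical helpers for inspecting an explain plan.

// Collect stage names from a plan tree that nests children under
// `inputStage` (single child) or `inputStages` (several children).
function collectStageNames(plan, names = []) {
  if (!plan || typeof plan !== "object") return names;
  if (typeof plan.stage === "string") names.push(plan.stage);
  if (plan.inputStage) collectStageNames(plan.inputStage, names);
  for (const child of plan.inputStages || []) {
    collectStageNames(child, names);
  }
  return names;
}

// Assumption: text-index stages have names beginning with "TEXT".
function usesTextIndex(winningPlan) {
  return collectStageNames(winningPlan).some(name => name.startsWith("TEXT"));
}

// Hand-written sample shaped like a winning plan, for illustration only.
const samplePlan = {
  stage: "LIMIT",
  inputStage: {
    stage: "TEXT_MATCH",
    inputStage: { stage: "FETCH", inputStage: { stage: "IXSCAN" } }
  }
};
const sampleUsesText = usesTextIndex(samplePlan);

// Against a live server, the plan would typically come from something like:
// db.products.find({ $text: { $search: "wireless" } }).explain("queryPlanner").queryPlanner.winningPlan
```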
Search Result Caching Strategies
Implement intelligent caching for frequently accessed search results:
// Advanced search result caching implementation
class SearchResultCache {
constructor(cacheConfig = {}) {
this.cacheConfig = {
defaultTTL: cacheConfig.defaultTTL || 300, // 5 minutes
maxCacheSize: cacheConfig.maxCacheSize || 10000,
popularQueryTTL: cacheConfig.popularQueryTTL || 900, // 15 minutes for popular queries
personalizedTTL: cacheConfig.personalizedTTL || 60, // 1 minute for personalized results
...cacheConfig
};
this.cache = new Map();
this.queryFrequency = new Map();
this.cacheStats = {
hits: 0,
misses: 0,
evictions: 0
};
}
// Generate cache key considering all search factors
generateCacheKey(searchParams) {
const {
query,
filters,
sort,
limit,
userId,
language = 'english',
personalized = false
} = searchParams;
const keyComponents = [
`q:${query}`,
`f:${JSON.stringify(filters || {})}`,
`s:${JSON.stringify(sort || {})}`,
`l:${limit || 20}`,
`lang:${language}`
];
if (personalized && userId) {
keyComponents.push(`u:${userId}`);
}
return keyComponents.join('|');
}
// Determine appropriate TTL based on query characteristics
calculateTTL(searchParams, queryFrequency = 0) {
const { personalized, filters } = searchParams;
// Personalized searches have shorter TTL
if (personalized) {
return this.cacheConfig.personalizedTTL;
}
// Popular queries get longer TTL
if (queryFrequency > 10) {
return this.cacheConfig.popularQueryTTL;
}
// Filtered searches (more specific) get longer TTL
if (filters && Object.keys(filters).length > 0) {
return this.cacheConfig.defaultTTL * 1.5;
}
return this.cacheConfig.defaultTTL;
}
// Get cached search results
async get(searchParams) {
const cacheKey = this.generateCacheKey(searchParams);
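const cached = this.cache.get(cacheKey);
if (cached && cached.expiresAt > Date.now()) {
this.cacheStats.hits++;
// Record the access so that eviction sees this entry as recently used
cached.lastAccessed = Date.now();
cached.accessCount++;
// Update query frequency for cache optimization
const currentFreq = this.queryFrequency.get(cacheKey) || 0;
this.queryFrequency.set(cacheKey, currentFreq + 1);
return {
results: cached.data,
cached: true,
cacheAge: Date.now() - cached.createdAt
};
}
// Drop the expired entry so it no longer counts toward maxCacheSize
if (cached) {
this.cache.delete(cacheKey);
}
this.cacheStats.misses++;
return null;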
}
// Store search results in cache
async set(searchParams, results) {
const cacheKey = this.generateCacheKey(searchParams);
const queryFreq = this.queryFrequency.get(cacheKey) || 0;
const ttl = this.calculateTTL(searchParams, queryFreq);
// Cache size management
if (this.cache.size >= this.cacheConfig.maxCacheSize) {
this.evictLeastUsed();
}
const cacheEntry = {
data: results,
createdAt: Date.now(),
expiresAt: Date.now() + (ttl * 1000),
accessCount: 1,
lastAccessed: Date.now(),
queryFrequency: queryFreq
};
this.cache.set(cacheKey, cacheEntry);
this.queryFrequency.set(cacheKey, queryFreq + 1);
}
// Evict least recently used entries when cache is full
evictLeastUsed() {
let oldestKey = null;
let oldestAccess = Infinity; // any entry qualifies, even one accessed in the current millisecond
for (const [key, entry] of this.cache.entries()) {
if (entry.lastAccessed < oldestAccess) {
oldestAccess = entry.lastAccessed;
oldestKey = key;
}
}
if (oldestKey) {
this.cache.delete(oldestKey);
this.cacheStats.evictions++;
}
}
// Invalidate cache entries when data changes
async invalidatePattern(pattern) {
const keysToDelete = [];
for (const key of this.cache.keys()) {
if (key.includes(pattern)) {
keysToDelete.push(key);
}
}
keysToDelete.forEach(key => this.cache.delete(key));
return keysToDelete.length;
}
// Get cache performance statistics
getStats() {
const hitRate = this.cacheStats.hits / (this.cacheStats.hits + this.cacheStats.misses);
return {
...this.cacheStats,
hitRate: hitRate || 0,
cacheSize: this.cache.size,
popularQueries: Array.from(this.queryFrequency.entries())
.sort((a, b) => b[1] - a[1])
.slice(0, 10)
};
}
}
// Cached search implementation
const searchCache = new SearchResultCache({
defaultTTL: 300,
maxCacheSize: 5000,
popularQueryTTL: 900,
personalizedTTL: 60
});
async function performCachedSearch(searchParams) {
const startTime = Date.now();
// Try to get from cache first
const cachedResult = await searchCache.get(searchParams);
if (cachedResult) {
return {
...cachedResult,
searchTime: Date.now() - startTime,
fromCache: true
};
}
// Perform actual search
const searchResults = await executeMongoSearch(searchParams);
// Cache the results for future use
await searchCache.set(searchParams, searchResults);
return {
results: searchResults,
cached: false,
searchTime: Date.now() - startTime,
fromCache: false
};
}
async function executeMongoSearch(searchParams) {
const { query, filters = {}, sort = {}, limit = 20, userId, personalized = false } = searchParams;
let pipeline = [
{
$match: {
$and: [
{
$text: {
$search: query,
$caseSensitive: false,
$diacriticSensitive: false
}
},
filters
]
}
},
{
$addFields: {
relevance_score: { $meta: "textScore" }
}
}
];
// Add personalization if requested (addPersonalizationStages is an
// application-defined helper that is not shown in this article)
if (personalized && userId) {
pipeline = await addPersonalizationStages(pipeline, userId);
}
// Add sorting and limiting
pipeline.push(
{ $sort: { relevance_score: -1, ...sort } },
{ $limit: limit }
);
return await db.searchable_content.aggregate(pipeline).toArray();
}
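One caveat with `generateCacheKey`: `JSON.stringify` output depends on property insertion order, and the query text is used verbatim, so logically identical searches can end up under different keys. A possible normalization step is sketched below. `stableStringify` and `normalizeQuery` are hypothetical helpers, and lowercasing the query assumes searches run with `$caseSensitive: false`.

```javascript
// Hypothetical helpers for building order-independent cache keys.

// Serialize a value with object keys sorted, so { a, b } and { b, a }
// produce the same string.
function stableStringify(value) {
  if (value === undefined) return "null";
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(",")}]`;
  }
  if (value && typeof value === "object" && !(value instanceof Date)) {
    const entries = Object.keys(value)
      .sort()
      .map(key => `${JSON.stringify(key)}:${stableStringify(value[key])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Collapse whitespace and case differences that do not change the results
// of a case-insensitive text search.
function normalizeQuery(query) {
  return String(query).trim().toLowerCase().replace(/\s+/g, " ");
}

const keyA = `q:${normalizeQuery("  Wireless   Headphones ")}|f:${stableStringify({
  price: { $lte: 300, $gte: 50 },
  brand: "Sony"
})}`;
const keyB = `q:${normalizeQuery("wireless headphones")}|f:${stableStringify({
  brand: "Sony",
  price: { $gte: 50, $lte: 300 }
})}`;
// keyA and keyB are identical strings.
```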
Real-Time Search and Autocomplete
Autocomplete Implementation with Search Suggestions
Build responsive autocomplete functionality with search term suggestions:
// Advanced autocomplete and search suggestion system
class SearchAutocompleteManager {
constructor(config = {}) {
this.config = {
minQueryLength: config.minQueryLength || 2,
maxSuggestions: config.maxSuggestions || 10,
suggestionTypes: config.suggestionTypes || ['products', 'categories', 'brands'],
includePopularSearches: config.includePopularSearches !== false,
includeTrendingSearches: config.includeTrendingSearches !== false,
...config
};
}
// Create autocomplete indexes for fast prefix matching
async setupAutocompleteIndexes() {
// No text index is created for autocomplete: text indexes match whole stemmed words,
// not prefixes, and a collection can hold only one text index.
// Prefix lookups use the regular index on search_tokens created below.
// Add autocomplete fields to products
await db.products.updateMany({}, [
{
$set: {
product_name_autocomplete: {
$concat: [
"$product_name", " ",
"$brand", " ",
"$category", " ",
{ $reduce: {
input: { $ifNull: ["$tags", []] },
initialValue: "",
in: { $concat: ["$$value", " ", "$$this"] }
}}
]
},
// Create searchable tokens
search_tokens: {
$split: [
{
$toLower: {
$concat: [
{ $ifNull: ["$product_name", ""] }, " ",
{ $ifNull: ["$brand", ""] }, " ",
{ $ifNull: ["$category", ""] }
]
}
},
" "
]
}
}
}
]);
// Create prefix index for fast autocomplete
await db.products.createIndex({
"search_tokens": 1
}, {
name: "search_tokens_prefix"
});
// Popular searches collection for trending suggestions
await db.popular_searches.createIndex({
search_term: 1,
frequency: -1,
last_searched: -1
});
}
// Generate autocomplete suggestions
async generateAutocompleteSuggestions(partialQuery, options = {}) {
const suggestions = {
products: [],
categories: [],
brands: [],
popular_searches: [],
trending_searches: []
};
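// Escape regex metacharacters in user input. The pattern is unanchored and
// case-insensitive, so MongoDB cannot use an index efficiently for it.
const queryRegex = new RegExp(partialQuery.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'), 'i');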
// Product suggestions
if (this.config.suggestionTypes.includes('products')) {
suggestions.products = await db.products.aggregate([
{
$match: {
$or: [
{ product_name: queryRegex },
{ search_tokens: { $elemMatch: { $regex: queryRegex } } }
],
availability_status: 'in_stock'
}
},
{
$addFields: {
relevance_score: {
$add: [
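// Name starts with the query (prefix match) gets highest score.
// queryRegex.source is the escaped pattern, so user input cannot inject regex syntax.
{ $cond: [{ $regexMatch: { input: "$product_name", regex: `^${queryRegex.source}`, options: "i" } }, 10, 0] },
// Name contains query
{ $cond: [{ $regexMatch: { input: "$product_name", regex: queryRegex } }, 5, 0] },
// Popularity boost (a missing search_frequency counts as 0)
{ $multiply: [{ $ln: { $add: [{ $ifNull: ["$search_frequency", 0] }, 1] } }, 0.1] },
// Rating boost (a missing average_rating counts as 0)
{ $multiply: [{ $ifNull: ["$average_rating", 0] }, 0.5] }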
]
}
}
},
{ $sort: { relevance_score: -1, search_frequency: -1 } },
{ $limit: 5 },
{
$project: {
suggestion: "$product_name",
type: "product",
category: 1,
brand: 1,
price: 1,
image_url: { $arrayElemAt: ["$images", 0] },
relevance_score: 1
}
}
]).toArray();
}
// Category suggestions
if (this.config.suggestionTypes.includes('categories')) {
suggestions.categories = await db.categories.aggregate([
{
$match: {
$or: [
{ name: queryRegex },
{ aliases: { $elemMatch: { $regex: queryRegex } } }
],
active: true
}
},
{
$addFields: {
relevance_score: {
$add: [
// Use the escaped pattern for the prefix check
{ $cond: [{ $regexMatch: { input: "$name", regex: `^${queryRegex.source}`, options: "i" } }, 8, 0] },
{ $cond: [{ $regexMatch: { input: "$name", regex: queryRegex } }, 4, 0] },
{ $multiply: [{ $ln: { $add: [{ $ifNull: ["$product_count", 0] }, 1] } }, 0.1] }
]
}
}
},
{ $sort: { relevance_score: -1, product_count: -1 } },
{ $limit: 3 },
{
$project: {
suggestion: "$name",
type: "category",
product_count: 1,
icon: 1,
relevance_score: 1
}
}
]).toArray();
}
// Brand suggestions
if (this.config.suggestionTypes.includes('brands')) {
suggestions.brands = await db.brands.aggregate([
{
$match: {
name: queryRegex,
active: true
}
},
{
$addFields: {
relevance_score: {
$add: [
// Use the escaped pattern for the prefix check
{ $cond: [{ $regexMatch: { input: "$name", regex: `^${queryRegex.source}`, options: "i" } }, 8, 0] },
{ $multiply: [{ $ln: { $add: [{ $ifNull: ["$product_count", 0] }, 1] } }, 0.1] },
{ $cond: [{ $eq: ["$featured", true] }, 2, 0] }
]
}
}
},
{ $sort: { relevance_score: -1, product_count: -1 } },
{ $limit: 3 },
{
$project: {
suggestion: "$name",
type: "brand",
product_count: 1,
logo_url: 1,
relevance_score: 1
}
}
]).toArray();
}
// Popular searches
if (this.config.includePopularSearches) {
suggestions.popular_searches = await db.popular_searches.find({
search_term: queryRegex,
frequency: { $gte: 5 }
})
.sort({ frequency: -1 })
.limit(3)
.project({
suggestion: "$search_term",
type: "popular_search",
frequency: 1
})
.toArray();
}
// Trending searches (last 24 hours)
if (this.config.includeTrendingSearches) {
const last24Hours = new Date(Date.now() - 24 * 60 * 60 * 1000);
suggestions.trending_searches = await db.search_analytics.aggregate([
{
$match: {
search_term: queryRegex,
timestamp: { $gte: last24Hours }
}
},
{
$group: {
_id: "$search_term",
recent_frequency: { $sum: 1 },
avg_result_clicks: { $avg: "$result_clicks" }
}
},
{ $sort: { recent_frequency: -1, avg_result_clicks: -1 } },
{ $limit: 2 },
{
$project: {
suggestion: "$_id",
type: "trending_search",
recent_frequency: 1,
avg_result_clicks: 1
}
}
]).toArray();
}
// Combine and rank all suggestions
const allSuggestions = [
...suggestions.products,
...suggestions.categories,
...suggestions.brands,
...suggestions.popular_searches,
...suggestions.trending_searches
];
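// Sort by relevance and limit. Popular and trending suggestions carry no
// relevance_score, so they sort after all scored suggestions.
const maxResults = Number(options.limit) || this.config.maxSuggestions;
return allSuggestions
.sort((a, b) => (b.relevance_score || 0) - (a.relevance_score || 0))
.slice(0, maxResults);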
}
// Track search queries for improving autocomplete
async trackSearchQuery(query, userId = null, results = []) {
const searchRecord = {
search_term: query.toLowerCase().trim(),
user_id: userId,
timestamp: new Date(),
result_count: results.length,
result_clicks: 0, // Will be updated when user clicks results
session_id: generateSessionId() // Assumes generateSessionId() is defined elsewhere in the application
};
// Insert search record and keep its _id so trackResultClick can reference it
const { insertedId } = await db.search_analytics.insertOne(searchRecord);
// Update popular searches frequency
await db.popular_searches.updateOne(
{ search_term: searchRecord.search_term },
{
$inc: { frequency: 1 },
$set: { last_searched: new Date() },
$setOnInsert: { first_searched: new Date() }
},
{ upsert: true }
);
return insertedId;
}
// Update search result click tracking
async trackResultClick(searchId, productId, position) {
await db.search_analytics.updateOne(
{ _id: searchId },
{
$inc: { result_clicks: 1 },
$push: {
clicked_results: {
product_id: productId,
position: position,
timestamp: new Date()
}
}
}
);
}
}
// Usage example
const autocompleteManager = new SearchAutocompleteManager({
minQueryLength: 2,
maxSuggestions: 8,
suggestionTypes: ['products', 'categories', 'brands'],
includePopularSearches: true,
includeTrendingSearches: true
});
// API endpoint for autocomplete
async function autocompleteEndpoint(req, res) {
const { q: query, limit = 8 } = req.query;
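if (!query || query.length < autocompleteManager.config.minQueryLength) {
// Keep the response shape consistent with the success case
return res.json({ query: query || '', suggestions: [], timestamp: new Date().toISOString() });
}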
try {
const suggestions = await autocompleteManager.generateAutocompleteSuggestions(query, { limit });
res.json({
query,
suggestions,
timestamp: new Date().toISOString()
});
} catch (error) {
console.error('Autocomplete error:', error);
res.status(500).json({ error: 'Autocomplete service unavailable' });
}
}
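Because search_tokens stores lowercased tokens, an anchored, case-sensitive prefix pattern can take advantage of the search_tokens index, which the unanchored case-insensitive regex used above cannot do efficiently. The sketch below shows one way to build such a filter. Both helper names are hypothetical, and matching only the last word typed is a design assumption, not behavior from the class above.

```javascript
// Escape regex metacharacters so user input is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: builds a filter on the lowercased search_tokens array.
// Only the last word being typed is used as the prefix (an assumption).
function buildTokenPrefixFilter(partialQuery) {
  const lastToken = partialQuery.toLowerCase().trim().split(/\s+/).pop();
  if (!lastToken) return null; // nothing to match on
  // Anchored with ^ and without the "i" option, so the index can bound the scan.
  return { search_tokens: { $regex: `^${escapeRegex(lastToken)}` } };
}
```

The returned object can be used as a find() filter or inside a $match stage. A null result signals that the caller should skip the query.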
SQL Integration with QueryLeaf
QueryLeaf provides familiar SQL syntax for MongoDB's powerful text search capabilities:
-- QueryLeaf SQL syntax for MongoDB full-text search operations
-- Basic full-text search with SQL-familiar syntax
SELECT
product_id,
product_name,
brand,
category,
price,
average_rating,
review_count,
-- QueryLeaf provides MongoDB's text score as a function
MONGODB_TEXT_SCORE() as relevance_score
FROM products
WHERE FULL_TEXT_SEARCH('wireless bluetooth headphones')
AND price BETWEEN 50 AND 300
AND average_rating >= 4.0
AND inventory_quantity > 0
ORDER BY MONGODB_TEXT_SCORE() DESC, average_rating DESC
LIMIT 20;
-- Advanced search with multiple text fields and weighting
WITH weighted_product_search AS (
SELECT
product_id,
product_name,
brand,
category,
description,
price,
average_rating,
review_count,
inventory_quantity,
-- MongoDB text search with field-specific weights
-- (in MongoDB itself, field weights are defined on the text index, not per query)
FULL_TEXT_SEARCH('premium noise cancelling headphones',
JSON_BUILD_OBJECT(
'product_name', 10,
'brand', 8,
'category', 6,
'description', 4,
'tags', 5
)) as text_match,
MONGODB_TEXT_SCORE() as text_relevance_score,
-- Business scoring calculations
(
(average_rating * 2) +
(LN(review_count + 1) / 10) +
CASE
WHEN featured_product = true THEN 5
ELSE 0
END +
CASE
WHEN inventory_quantity > 100 THEN 2
WHEN inventory_quantity > 10 THEN 1
ELSE 0
END
) as business_score,
-- Combined scoring
(MONGODB_TEXT_SCORE() * 3) +
(
(average_rating * 2) +
(LN(review_count + 1) / 10) +
CASE WHEN featured_product = true THEN 5 ELSE 0 END
) as combined_score
FROM products
WHERE FULL_TEXT_SEARCH('premium noise cancelling headphones')
AND price BETWEEN 100 AND 500
AND average_rating >= 4.0
AND inventory_quantity > 0
),
ranked_results AS (
SELECT
*,
ROW_NUMBER() OVER (ORDER BY combined_score DESC, average_rating DESC) as search_rank,
-- Add search result highlighting
REGEXP_REPLACE(product_name, '(premium|noise|cancelling|headphones)', '<mark>$1</mark>', 'gi') as highlighted_name,
SUBSTRING(description, 1, 200) as description_snippet
FROM weighted_product_search
)
SELECT
product_id,
product_name,
highlighted_name,
brand,
category,
price,
average_rating,
review_count,
description_snippet,
text_relevance_score,
business_score,
combined_score,
search_rank,
-- Search result metadata
CURRENT_TIMESTAMP as search_timestamp,
'premium noise cancelling headphones' as search_query
FROM ranked_results
ORDER BY combined_score DESC
LIMIT 20;
-- Multi-language search with language-specific optimization
SELECT
document_id,
title,
content_summary,
language,
author,
published_date,
view_count,
-- Language-aware full-text search
CASE
WHEN language = 'english' THEN
FULL_TEXT_SEARCH('machine learning artificial intelligence',
JSON_BUILD_OBJECT('language', 'english'))
WHEN language = 'spanish' THEN
FULL_TEXT_SEARCH('aprendizaje automático inteligencia artificial',
JSON_BUILD_OBJECT('language', 'spanish'))
WHEN language = 'french' THEN
FULL_TEXT_SEARCH('apprentissage automatique intelligence artificielle',
JSON_BUILD_OBJECT('language', 'french'))
ELSE
FULL_TEXT_SEARCH('machine learning artificial intelligence',
JSON_BUILD_OBJECT('language', 'english'))
END as language_search_match,
MONGODB_TEXT_SCORE() as relevance_score,
-- Language match boost
CASE
WHEN language = 'english' THEN MONGODB_TEXT_SCORE() * 1.2
ELSE MONGODB_TEXT_SCORE()
END as language_boosted_score
FROM multilingual_documents
WHERE
CASE
WHEN language = 'english' THEN
FULL_TEXT_SEARCH('machine learning artificial intelligence',
JSON_BUILD_OBJECT('language', 'english'))
WHEN language = 'spanish' THEN
FULL_TEXT_SEARCH('aprendizaje automático inteligencia artificial',
JSON_BUILD_OBJECT('language', 'spanish'))
WHEN language = 'french' THEN
FULL_TEXT_SEARCH('apprentissage automatique intelligence artificielle',
JSON_BUILD_OBJECT('language', 'french'))
ELSE false
END
AND published_date >= CURRENT_DATE - INTERVAL '1 year'
AND document_status = 'published'
ORDER BY language_boosted_score DESC, view_count DESC
LIMIT 15;
-- Advanced e-commerce search with filters and faceting
WITH product_search_base AS (
SELECT
p.product_id,
p.product_name,
p.brand,
p.category,
p.subcategory,
p.price,
p.average_rating,
p.review_count,
p.tags,
pi.quantity_available,
-- Full-text search with multiple terms
FULL_TEXT_SEARCH('"wireless headphones" bluetooth premium',
JSON_BUILD_OBJECT(
'product_name', 12,
'brand', 8,
'category', 6,
'tags', 5,
'description', 3
)) as search_match,
MONGODB_TEXT_SCORE() as text_score,
-- Calculate comprehensive relevance score (additive weights, not percentages)
(
MONGODB_TEXT_SCORE() * 4 + -- Text relevance (weight 4)
(p.average_rating * 2) + -- Rating influence (weight 2)
(LN(p.review_count + 1) * 0.5) + -- Review count (log-scaled, weight 0.5)
CASE WHEN p.featured = true THEN 3 ELSE 0 END + -- Featured boost (+3)
CASE
WHEN pi.quantity_available > 50 THEN 2
WHEN pi.quantity_available > 10 THEN 1
ELSE 0
END + -- Inventory boost (up to +2)
CASE
WHEN p.brand IN ('Apple', 'Sony', 'Bose') THEN 1.5
ELSE 0
END -- Premium brand boost (+1.5)
) as comprehensive_score
FROM products p
JOIN product_inventory pi ON p.product_id = pi.product_id
WHERE FULL_TEXT_SEARCH('"wireless headphones" bluetooth premium')
AND p.price BETWEEN 50 AND 400
AND p.average_rating >= 3.5
AND pi.quantity_available > 0
AND p.status = 'active'
),
search_results_with_facets AS (
SELECT
*,
-- Generate search facets for filtering UI
brand as brand_facet,
category as category_facet,
-- Price range facets
CASE
WHEN price < 50 THEN 'Under $50'
WHEN price < 100 THEN '$50-$99'
WHEN price < 200 THEN '$100-$199'
WHEN price < 300 THEN '$200-$299'
ELSE '$300+'
END as price_range_facet,
-- Rating facets
CASE
WHEN average_rating >= 4.5 THEN '4.5+ stars'
WHEN average_rating >= 4.0 THEN '4.0+ stars'
WHEN average_rating >= 3.5 THEN '3.5+ stars'
ELSE '3.0+ stars'
END as rating_facet,
ROW_NUMBER() OVER (ORDER BY comprehensive_score DESC) as search_position
FROM product_search_base
)
-- Main search results
SELECT
product_id,
product_name,
brand,
category,
price,
average_rating,
review_count,
quantity_available,
text_score,
comprehensive_score,
search_position,
-- Add highlighted search terms
REGEXP_REPLACE(product_name, '(wireless|headphones|bluetooth|premium)', '<strong>$1</strong>', 'gi') as highlighted_name,
-- Search metadata
JSON_BUILD_OBJECT(
'search_query', '"wireless headphones" bluetooth premium',
'search_timestamp', CURRENT_TIMESTAMP,
'total_results', COUNT(*) OVER(),
'search_facets', JSON_BUILD_OBJECT(
'brand', brand_facet,
'category', category_facet,
'price_range', price_range_facet,
'rating', rating_facet
)
) as search_metadata
FROM search_results_with_facets
ORDER BY comprehensive_score DESC, average_rating DESC
LIMIT 20;
-- Search analytics and performance monitoring
WITH search_performance_analysis AS (
SELECT
DATE_TRUNC('hour', search_timestamp) as search_hour,
search_query,
COUNT(*) as search_frequency,
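-- A text score exists only while a text search runs, so this assumes the
-- score was stored with each analytics record in a relevance_score field
AVG(relevance_score) as avg_relevance_score,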
AVG(search_response_time_ms) as avg_response_time,
COUNT(DISTINCT user_id) as unique_searchers,
-- Click-through analysis
SUM(CASE WHEN result_clicked = true THEN 1 ELSE 0 END) as total_clicks,
(SUM(CASE WHEN result_clicked = true THEN 1 ELSE 0 END)::DECIMAL / COUNT(*)) * 100 as ctr_percentage,
-- Popular result positions
AVG(CASE WHEN result_clicked = true THEN result_position ELSE NULL END) as avg_clicked_position
FROM search_analytics
WHERE search_timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
GROUP BY DATE_TRUNC('hour', search_timestamp), search_query
HAVING COUNT(*) >= 5 -- Only analyze queries with sufficient volume
),
search_optimization_insights AS (
SELECT
search_query,
SUM(search_frequency) as total_searches,
AVG(avg_relevance_score) as overall_avg_relevance,
AVG(avg_response_time) as overall_avg_response_time,
AVG(ctr_percentage) as overall_ctr,
AVG(avg_clicked_position) as overall_avg_clicked_position,
-- Performance classification
CASE
WHEN AVG(avg_response_time) > 500 THEN 'slow'
WHEN AVG(avg_response_time) > 200 THEN 'moderate'
ELSE 'fast'
END as performance_classification,
-- Relevance quality assessment
CASE
WHEN AVG(ctr_percentage) > 15 THEN 'high_relevance'
WHEN AVG(ctr_percentage) > 8 THEN 'medium_relevance'
ELSE 'low_relevance'
END as relevance_quality,
-- Optimization recommendations
CASE
WHEN AVG(avg_response_time) > 500 THEN 'index_optimization_needed'
WHEN AVG(ctr_percentage) < 5 THEN 'search_algorithm_tuning_needed'
WHEN AVG(avg_clicked_position) > 10 THEN 'ranking_improvement_needed'
ELSE 'performing_well'
END as optimization_recommendation
FROM search_performance_analysis
GROUP BY search_query
)
SELECT
search_query,
total_searches,
overall_avg_relevance,
overall_avg_response_time,
overall_ctr,
overall_avg_clicked_position,
performance_classification,
relevance_quality,
optimization_recommendation,
-- Priority score for optimization efforts
(
CASE performance_classification
WHEN 'slow' THEN 40
WHEN 'moderate' THEN 20
ELSE 10
END +
CASE relevance_quality
WHEN 'low_relevance' THEN 30
WHEN 'medium_relevance' THEN 15
ELSE 5
END +
(total_searches / 100) -- Volume-based priority
) as optimization_priority_score
FROM search_optimization_insights
ORDER BY optimization_priority_score DESC, total_searches DESC;
-- QueryLeaf provides seamless MongoDB text search integration:
-- 1. FULL_TEXT_SEARCH() function with field weighting support
-- 2. MONGODB_TEXT_SCORE() for accessing MongoDB's text relevance scores
-- 3. Language-specific search configuration through JSON parameters
-- 4. Integration with standard SQL filtering, sorting, and aggregation
-- 5. Advanced search analytics and performance monitoring
-- 6. Familiar SQL syntax for complex multi-field text search operations
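The comprehensive_score expression in the e-commerce query above is plain arithmetic, so it can be mirrored in application code to unit-test the weights or to re-rank results that have already been fetched. The following sketch assumes documents have been mapped to camelCase fields. The function name and field names are illustrative, and only the weights come from the SQL.

```javascript
// Brands that receive the premium boost in the SQL example.
const PREMIUM_BRANDS = new Set(['Apple', 'Sony', 'Bose']);

// Mirrors the SQL comprehensive_score: an additive blend of text relevance
// and business signals. Field names are assumptions for illustration.
function comprehensiveScore({ textScore, averageRating, reviewCount, featured, quantityAvailable, brand }) {
  const inventoryBoost = quantityAvailable > 50 ? 2 : quantityAvailable > 10 ? 1 : 0;
  return (
    textScore * 4 +                          // text relevance
    averageRating * 2 +                      // rating influence
    Math.log(reviewCount + 1) * 0.5 +        // log-scaled review count
    (featured ? 3 : 0) +                     // featured boost
    inventoryBoost +                         // inventory boost
    (PREMIUM_BRANDS.has(brand) ? 1.5 : 0)    // premium brand boost
  );
}
```

Because the terms are simply added, a high text score can be outweighed by rating and boosts. Normalizing the inputs first would be a natural refinement if text relevance should dominate.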
Best Practices for Production Search Implementation
Search Index Management and Optimization
Essential strategies for maintaining high-performance search in production:
- Index Strategy Planning: Design text indexes based on actual query patterns and field importance
- Performance Monitoring: Continuously monitor search performance and optimize slow queries
- Language Optimization: Configure appropriate language analyzers and stemming for your content
- Relevance Tuning: Regularly analyze search quality metrics and adjust scoring algorithms
- Caching Strategy: Implement intelligent caching for frequently accessed search results
- Resource Management: Monitor index size and query resource usage for capacity planning
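Field importance is expressed in MongoDB through the weights option of the text index. As a minimal sketch of the index strategy point above, the hypothetical helper below turns a field-to-weight map into the arguments for createIndex. The default index name and default_language value are assumptions for illustration.

```javascript
// Hypothetical helper: builds the key spec and options for a weighted text index.
function buildWeightedTextIndex(weights, name = 'product_text_search') {
  const keys = {};
  for (const [field, weight] of Object.entries(weights)) {
    // MongoDB accepts integer weights from 1 to 99,999.
    if (!Number.isInteger(weight) || weight < 1 || weight > 99999) {
      throw new RangeError(`weight for ${field} must be an integer from 1 to 99999`);
    }
    keys[field] = 'text';
  }
  return { keys, options: { name, weights, default_language: 'english' } };
}

// Usage (requires a connected collection):
//   const { keys, options } = buildWeightedTextIndex({ product_name: 10, brand: 8, description: 4 });
//   await db.collection('products').createIndex(keys, options);
```

Since a collection can hold only one text index, changing weights means dropping and rebuilding that index, which is worth scheduling outside peak traffic.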
Search Quality and User Experience
Optimize search functionality for maximum user satisfaction:
- Relevance Quality: Implement comprehensive relevance scoring that combines text matching with business metrics
- Search Analytics: Track user search behavior to continuously improve search quality
- Autocomplete Performance: Provide fast, relevant search suggestions with minimal latency
- Result Presentation: Design search results with proper highlighting and metadata
- Faceted Search: Enable users to refine searches with category, price, and attribute filters
- Search Personalization: Customize search results based on user preferences and behavior
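For the faceted search point above, facet counts can be computed in the same aggregation as the search itself by appending a $facet stage after the $text match. The sketch below is one possible shape. The function name and the limit of 10 values per facet are assumptions, while the price boundaries follow the ranges used in the SQL example.

```javascript
// Hypothetical helper: returns a $facet stage that counts matches per brand,
// per category, and per price range.
function buildFacetStage(maxValuesPerFacet = 10) {
  return {
    $facet: {
      brands: [{ $sortByCount: '$brand' }, { $limit: maxValuesPerFacet }],
      categories: [{ $sortByCount: '$category' }, { $limit: maxValuesPerFacet }],
      price_ranges: [{
        $bucket: {
          groupBy: '$price',
          // Buckets are [0,50), [50,100), [100,200), [200,300)
          boundaries: [0, 50, 100, 200, 300],
          // Everything outside the boundaries (including missing prices) lands here
          default: '300+',
          output: { count: { $sum: 1 } }
        }
      }]
    }
  };
}
```

Appending this stage to a search pipeline yields a single document whose brands, categories, and price_ranges arrays can drive the filter UI.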
Conclusion
MongoDB's text indexes and text queries let you implement production-grade search directly in the database. Combined with weighted fields, relevance scoring that blends text scores with business metrics, and careful index design, they can deliver fast, relevant search for many use cases without the operational overhead of a separate search engine.
Key benefits of MongoDB full-text search include:
- Performance Optimization: Advanced indexing strategies and caching for high-volume search workloads
- Relevance Intelligence: Sophisticated scoring algorithms that combine text matching with business metrics
- Multi-Language Support: Built-in language analysis, stemming, and localization capabilities
- Scalable Architecture: Distributed search across sharded collections with automatic query routing
- SQL Accessibility: Familiar SQL-style search operations through QueryLeaf
- Production Readiness: Comprehensive monitoring, analytics, and optimization tools for enterprise deployments
Whether you're building e-commerce product search, content discovery systems, or enterprise search applications, MongoDB's text search with QueryLeaf's familiar SQL interface provides the foundation for delivering exceptional search experiences that scale with your application's growth.
QueryLeaf Integration: QueryLeaf seamlessly translates SQL full-text search operations into optimized MongoDB text queries. Advanced search features like field weighting, language-specific analysis, and relevance scoring are accessible through familiar SQL syntax, making sophisticated search functionality approachable for SQL-oriented development teams while leveraging MongoDB's powerful text search capabilities.
Pairing MongoDB's text search with SQL-style query patterns suits applications that need capable search alongside familiar database access, and it leaves room for search functionality to grow with the application's complexity and scale.