MongoDB Full-Text Search and Advanced Indexing: SQL-Style Text Queries and Search Optimization

Modern applications require sophisticated search capabilities that go beyond simple pattern matching. Whether you're building e-commerce product catalogs, content management systems, or document repositories, users expect fast, relevant, and intelligent search functionality that can handle typos, synonyms, and complex queries across multiple languages.

Traditional database text search often relies on basic LIKE patterns or regular expressions, which are limited in both functionality and performance. MongoDB's full-text search capabilities, combined with advanced indexing strategies, deliver stemming, relevance scoring, and multi-language support while maintaining the simplicity of database queries - covering many workloads that would otherwise require a dedicated search engine.

The Text Search Challenge

Basic text search approaches have significant limitations:

-- SQL basic text search limitations

-- Simple pattern matching - case sensitive, no relevance
SELECT product_name, description, price
FROM products
WHERE product_name LIKE '%laptop%'
   OR description LIKE '%laptop%';
-- Problems: Case sensitivity, no stemming, no relevance scoring

-- Regular expressions - expensive and limited
SELECT title, content, author
FROM articles  
WHERE content ~* '(machine|artificial|deep).*(learning|intelligence)';
-- Problems: No ranking, poor performance on large datasets

-- Multiple keyword search - complex and inefficient
SELECT *
FROM products
WHERE (LOWER(product_name) LIKE '%gaming%' OR LOWER(description) LIKE '%gaming%')
  AND (LOWER(product_name) LIKE '%laptop%' OR LOWER(description) LIKE '%laptop%')
  AND (LOWER(product_name) LIKE '%performance%' OR LOWER(description) LIKE '%performance%');
-- Problems: Complex syntax, no semantic understanding, poor performance

MongoDB's text search addresses these limitations:

// MongoDB advanced text search capabilities
db.products.find({
  $text: {
    $search: "gaming laptop performance",
    $language: "english",
    $caseSensitive: false,
    $diacriticSensitive: false
  }
}, {
  score: { $meta: "textScore" }
}).sort({
  score: { $meta: "textScore" }
});

// Results include:
// - Stemming: "games" matches "gaming"  
// - Language-specific tokenization
// - Relevance scoring based on term frequency and position
// - Multi-field search across indexed text fields
// - Performance optimized with specialized text indexes

Text Indexing Fundamentals

Creating Text Indexes

Build comprehensive text search functionality with MongoDB text indexes:

// Basic text index creation
db.products.createIndex({
  product_name: "text",
  description: "text",
  category: "text"
});

// Weighted text index for relevance tuning
db.products.createIndex({
  product_name: "text",
  description: "text", 
  tags: "text",
  category: "text"
}, {
  weights: {
    product_name: 10,    // Product name is most important
    description: 5,      // Description has medium importance  
    tags: 8,            // Tags are highly relevant
    category: 3         // Category provides context
  },
  name: "product_text_search",
  default_language: "english",
  language_override: "language"
});

// Compound index combining text search with a filter prefix
// Note: queries using this index must include an equality match on the prefix key
db.products.createIndex({
  category: 1,           // Prefix key for filtering (equality match required with $text)
  product_name: "text",  // Text search
  description: "text"
}, {
  weights: {
    product_name: 15,
    description: 8
  }
});

// Multi-language text index
db.articles.createIndex({
  title: "text",
  content: "text"
}, {
  default_language: "english",
  language_override: "lang",  // Document field that specifies language
  weights: {
    title: 20,
    content: 10
  }
});

SQL-style text indexing concepts:

-- SQL full-text search equivalent patterns

-- Create full-text index on multiple columns
CREATE FULLTEXT INDEX ft_products_search 
ON products (product_name, description, tags);

-- Weighted full-text search with relevance ranking
SELECT 
  product_id,
  product_name,
  description,
  MATCH(product_name, description, tags) 
    AGAINST('gaming laptop performance' IN NATURAL LANGUAGE MODE) AS relevance_score
FROM products
WHERE MATCH(product_name, description, tags) 
  AGAINST('gaming laptop performance' IN NATURAL LANGUAGE MODE)
ORDER BY relevance_score DESC;

-- Boolean full-text search with operators
SELECT *
FROM products
WHERE MATCH(product_name, description) 
  AGAINST('+gaming +laptop -refurbished' IN BOOLEAN MODE);

-- Full-text search with additional filtering
SELECT 
  product_name,
  price,
  category,
  MATCH(product_name, description) 
    AGAINST('high performance gaming' IN NATURAL LANGUAGE MODE) AS score
FROM products
WHERE price BETWEEN 1000 AND 3000
  AND category = 'computers'
  AND MATCH(product_name, description) 
    AGAINST('high performance gaming' IN NATURAL LANGUAGE MODE)
ORDER BY score DESC
LIMIT 20;

Advanced Text Search Queries

Implement sophisticated search patterns:

// Advanced text search implementation
class TextSearchService {
  constructor(db) {
    this.db = db;
    this.productsCollection = db.collection('products');
  }

  async basicTextSearch(searchTerm, options = {}) {
    const query = {
      $text: {
        $search: searchTerm,
        $language: options.language || "english",
        $caseSensitive: options.caseSensitive || false,
        $diacriticSensitive: options.diacriticSensitive || false
      }
    };

    // Add additional filters
    if (options.category) {
      query.category = options.category;
    }

    if (options.priceRange) {
      query.price = {
        $gte: options.priceRange.min,
        $lte: options.priceRange.max
      };
    }

    const results = await this.productsCollection.find(query, {
      projection: {
        product_name: 1,
        description: 1,
        price: 1,
        category: 1,
        score: { $meta: "textScore" }
      }
    })
    .sort({ score: { $meta: "textScore" } })
    .limit(options.limit || 20)
    .toArray();

    return results;
  }

  async phraseSearch(phrase, options = {}) {
    // Exact phrase search using quoted strings
    const query = {
      $text: {
        $search: `"${phrase}"`,
        $language: options.language || "english"
      }
    };

    return await this.productsCollection.find(query, {
      projection: {
        product_name: 1,
        description: 1,
        score: { $meta: "textScore" }
      }
    })
    .sort({ score: { $meta: "textScore" } })
    .limit(options.limit || 10)
    .toArray();
  }

  async booleanTextSearch(searchExpression, options = {}) {
    // Boolean search with inclusion/exclusion operators
    const query = {
      $text: {
        $search: searchExpression,  // e.g., "laptop gaming -refurbished"
        $language: options.language || "english"
      }
    };

    return await this.productsCollection.find(query, {
      projection: {
        product_name: 1,
        description: 1,
        price: 1,
        score: { $meta: "textScore" }
      }
    })
    .sort({ score: { $meta: "textScore" } })
    .limit(options.limit || 20)
    .toArray();
  }

  async fuzzySearch(searchTerm, options = {}) {
    // Combine text search with regex for fuzzy matching
    const textResults = await this.basicTextSearch(searchTerm, options);

    // Fuzzy fallback using regex for typos/variations
    if (textResults.length < 5) {
      const fuzzyPattern = this.buildFuzzyPattern(searchTerm);
      const regexQuery = {
        $or: [
          { product_name: { $regex: fuzzyPattern, $options: 'i' } },
          { description: { $regex: fuzzyPattern, $options: 'i' } }
        ]
      };

      const fuzzyResults = await this.productsCollection.find(regexQuery)
        .limit(10 - textResults.length)
        .toArray();

      return [...textResults, ...fuzzyResults];
    }

    return textResults;
  }

  buildFuzzyPattern(term) {
    // Create a regex pattern that tolerates extra characters between letters;
    // escape regex metacharacters first so user input cannot break the pattern
    const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

    return escaped
      .split('')
      .map(char => `${char}.*?`)
      .join('');
  }

  async searchWithFacets(searchTerm, facetFields = ['category', 'brand', 'price_range']) {
    const pipeline = [
      {
        $match: {
          $text: { $search: searchTerm }
        }
      },
      {
        $addFields: {
          score: { $meta: "textScore" },
          price_range: {
            $switch: {
              branches: [
                { case: { $lte: ["$price", 500] }, then: "Under $500" },
                { case: { $lte: ["$price", 1000] }, then: "$500 - $1000" },
                { case: { $lte: ["$price", 2000] }, then: "$1000 - $2000" },
                { case: { $gt: ["$price", 2000] }, then: "Over $2000" }
              ],
              default: "Unknown"
            }
          }
        }
      },
      {
        $facet: {
          results: [
            { $sort: { score: -1 } },
            { $limit: 20 },
            {
              $project: {
                product_name: 1,
                description: 1,
                price: 1,
                category: 1,
                brand: 1,
                score: 1
              }
            }
          ],
          category_facets: [
            { $group: { _id: "$category", count: { $sum: 1 } } },
            { $sort: { count: -1 } }
          ],
          brand_facets: [
            { $group: { _id: "$brand", count: { $sum: 1 } } },
            { $sort: { count: -1 } }
          ],
          price_facets: [
            { $group: { _id: "$price_range", count: { $sum: 1 } } },
            { $sort: { _id: 1 } }
          ]
        }
      }
    ];

    const facetResults = await this.productsCollection.aggregate(pipeline).toArray();
    return facetResults[0];
  }

  async autoComplete(prefix, field = 'product_name', limit = 10) {
    // Auto-completion using regex and text search
    const pipeline = [
      {
        $match: {
          [field]: { $regex: `^${prefix}`, $options: 'i' }
        }
      },
      {
        $group: {
          _id: `$${field}`,
          count: { $sum: 1 }
        }
      },
      {
        $sort: { count: -1 }
      },
      {
        $limit: limit
      },
      {
        $project: {
          suggestion: "$_id",
          frequency: "$count",
          _id: 0
        }
      }
    ];

    return await this.productsCollection.aggregate(pipeline).toArray();
  }
}
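
A minimal usage sketch of the service above, assuming a locally running MongoDB instance and the weighted text index created earlier; the connection string, database, and field values are illustrative:

const { MongoClient } = require('mongodb');

async function runSearchExamples() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();

  try {
    const search = new TextSearchService(client.db('catalog'));

    // Weighted relevance search with a price filter
    const laptops = await search.basicTextSearch('gaming laptop', {
      priceRange: { min: 1000, max: 3000 },
      limit: 10
    });

    // Exact phrase matching
    const phrases = await search.phraseSearch('mechanical keyboard');

    // Faceted results for a search results page
    const facets = await search.searchWithFacets('smartphone');
    console.log(laptops.length, phrases.length, facets.category_facets);
  } finally {
    await client.close();
  }
}

runSearchExamples().catch(console.error);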

Support international search requirements:

// Multi-language text search implementation
class MultiLanguageSearchService {
  constructor(db) {
    this.db = db;
    this.documentsCollection = db.collection('documents');

    // Language-specific stemming and stop words
    this.languageConfig = {
      english: { 
        stopwords: ['the', 'and', 'or', 'but', 'in', 'on', 'at', 'to', 'for', 'of', 'with', 'by'],
        stemming: true
      },
      spanish: {
        stopwords: ['el', 'la', 'y', 'o', 'pero', 'en', 'con', 'por', 'para', 'de'],
        stemming: true
      },
      french: {
        stopwords: ['le', 'la', 'et', 'ou', 'mais', 'dans', 'sur', 'avec', 'par', 'pour', 'de'],
        stemming: true
      }
    };
  }

  async setupMultiLanguageIndexes() {
    // MongoDB allows only one text index per collection, so create a single
    // index and rely on language_override: each document's "lang" field
    // selects the stemming rules and stop words applied to that document
    await this.documentsCollection.createIndex({
      title: "text",
      content: "text",
      tags: "text"
    }, {
      name: "multilang_text_search",
      default_language: "english",
      language_override: "lang",
      weights: {
        title: 15,
        content: 10,
        tags: 8
      }
    });

    // Standard index on the language field supports efficient per-language filtering
    await this.documentsCollection.createIndex({ language: 1 });
  }

  async searchMultiLanguage(searchTerm, targetLanguage = null, options = {}) {
    const query = {
      $text: {
        $search: searchTerm,
        $language: targetLanguage || "english",
        $caseSensitive: false,
        $diacriticSensitive: false
      }
    };

    // Filter by specific language if provided
    if (targetLanguage) {
      query.language = targetLanguage;
    }

    const pipeline = [
      { $match: query },
      {
        $addFields: {
          score: { $meta: "textScore" },
          // Boost score for exact language match
          language_bonus: {
            $cond: {
              if: { $eq: ["$language", targetLanguage || "english"] },
              then: 1.5,
              else: 1.0
            }
          }
        }
      },
      {
        $addFields: {
          adjusted_score: { $multiply: ["$score", "$language_bonus"] }
        }
      },
      {
        $sort: { adjusted_score: -1 }
      },
      {
        $limit: options.limit || 20
      },
      {
        $project: {
          title: 1,
          content: { $substr: ["$content", 0, 200] }, // Excerpt
          language: 1,
          author: 1,
          created_at: 1,
          score: "$adjusted_score"
        }
      }
    ];

    return await this.documentsCollection.aggregate(pipeline).toArray();
  }

  async detectLanguage(text) {
    // Simple language detection based on common words
    const words = text.toLowerCase().split(/\s+/);
    const languageScores = {};

    for (const [language, config] of Object.entries(this.languageConfig)) {
      const stopwordMatches = words.filter(word => 
        config.stopwords.includes(word)
      ).length;

      languageScores[language] = stopwordMatches / words.length;
    }

    // Return language with highest score
    return Object.entries(languageScores)
      .sort(([,a], [,b]) => b - a)[0][0];
  }

  async searchWithLanguageDetection(searchTerm, options = {}) {
    // Auto-detect search term language
    const detectedLanguage = await this.detectLanguage(searchTerm);

    return await this.searchMultiLanguage(searchTerm, detectedLanguage, options);
  }

  async translateAndSearch(searchTerm, sourceLanguage, targetLanguages = ['english']) {
    // This would integrate with translation services
    const searchResults = new Map();

    for (const targetLanguage of targetLanguages) {
      // Placeholder for translation service integration
      const translatedTerm = await this.translateTerm(searchTerm, sourceLanguage, targetLanguage);

      const results = await this.searchMultiLanguage(translatedTerm, targetLanguage);
      searchResults.set(targetLanguage, results);
    }

    return searchResults;
  }

  async translateTerm(term, from, to) {
    // Placeholder for translation service
    // In practice, integrate with Google Translate, AWS Translate, etc.
    return term; // Return original term for now
  }
}
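
A short usage sketch, assuming documents carry a "language" field and a "lang" override field as configured above; the query strings are illustrative:

async function multiLanguageExample(db) {
  const mlSearch = new MultiLanguageSearchService(db);
  await mlSearch.setupMultiLanguageIndexes();

  // Detection picks the language whose stop words dominate the query
  const detected = await mlSearch.detectLanguage('el mejor restaurante de la ciudad');
  console.log(detected); // "spanish" for this phrase

  // The detected language drives the $text $language option and result boosting
  const results = await mlSearch.searchWithLanguageDetection(
    'la inteligencia artificial en la medicina',
    { limit: 10 }
  );
  return results;
}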

Advanced Search Features

Search Analytics and Optimization

Track and optimize search performance:

// Search analytics and performance optimization
class SearchAnalytics {
  constructor(db) {
    this.db = db;
    this.searchLogsCollection = db.collection('search_logs');
    this.productsCollection = db.collection('products');
  }

  async logSearchQuery(searchData) {
    const logEntry = {
      search_term: searchData.query,
      user_id: searchData.userId,
      session_id: searchData.sessionId,
      timestamp: new Date(),
      results_count: searchData.resultsCount,
      clicked_results: [],
      execution_time_ms: searchData.executionTime,
      search_type: searchData.searchType, // basic, fuzzy, phrase, etc.
      filters_applied: searchData.filters || {},
      user_agent: searchData.userAgent,
      ip_address: searchData.ipAddress
    };

    await this.searchLogsCollection.insertOne(logEntry);
    return logEntry._id;
  }

  async trackSearchClick(searchLogId, clickedResult) {
    await this.searchLogsCollection.updateOne(
      { _id: searchLogId },
      {
        $push: {
          clicked_results: {
            result_id: clickedResult.id,
            result_position: clickedResult.position,
            clicked_at: new Date()
          }
        }
      }
    );
  }

  async getSearchAnalytics(timeframe = 7) {
    const since = new Date(Date.now() - timeframe * 24 * 60 * 60 * 1000);

    const pipeline = [
      {
        $match: {
          timestamp: { $gte: since }
        }
      },
      {
        $group: {
          _id: {
            search_term: { $toLower: "$search_term" },
            date: { $dateToString: { format: "%Y-%m-%d", date: "$timestamp" } }
          },
          search_count: { $sum: 1 },
          avg_results: { $avg: "$results_count" },
          avg_execution_time: { $avg: "$execution_time_ms" },
          unique_users: { $addToSet: "$user_id" },
          click_through_rate: {
            $avg: {
              $cond: {
                if: { $gt: [{ $size: "$clicked_results" }, 0] },
                then: 1,
                else: 0
              }
            }
          }
        }
      },
      {
        $addFields: {
          unique_user_count: { $size: "$unique_users" }
        }
      },
      {
        $sort: { search_count: -1 }
      },
      {
        $limit: 100
      }
    ];

    return await this.searchLogsCollection.aggregate(pipeline).toArray();
  }

  async getPopularSearchTerms(limit = 20) {
    const pipeline = [
      {
        $match: {
          timestamp: { $gte: new Date(Date.now() - 30 * 24 * 60 * 60 * 1000) }
        }
      },
      {
        $group: {
          _id: { $toLower: "$search_term" },
          frequency: { $sum: 1 },
          avg_results: { $avg: "$results_count" },
          click_rate: {
            $avg: {
              $cond: {
                if: { $gt: [{ $size: "$clicked_results" }, 0] },
                then: 1,
                else: 0
              }
            }
          }
        }
      },
      {
        $match: {
          frequency: { $gte: 2 }  // Only terms searched more than once
        }
      },
      {
        $sort: { frequency: -1 }
      },
      {
        $limit: limit
      },
      {
        $project: {
          search_term: "$_id",
          frequency: 1,
          avg_results: { $round: ["$avg_results", 1] },
          click_rate: { $round: ["$click_rate", 3] },
          _id: 0
        }
      }
    ];

    return await this.searchLogsCollection.aggregate(pipeline).toArray();
  }

  async identifyZeroResultQueries(limit = 50) {
    const pipeline = [
      {
        $match: {
          results_count: 0,
          timestamp: { $gte: new Date(Date.now() - 7 * 24 * 60 * 60 * 1000) }
        }
      },
      {
        $group: {
          _id: { $toLower: "$search_term" },
          occurrence_count: { $sum: 1 },
          last_searched: { $max: "$timestamp" }
        }
      },
      {
        $sort: { occurrence_count: -1 }
      },
      {
        $limit: limit
      },
      {
        $project: {
          search_term: "$_id",
          occurrence_count: 1,
          last_searched: 1,
          _id: 0
        }
      }
    ];

    return await this.searchLogsCollection.aggregate(pipeline).toArray();
  }

  async optimizeSearchIndexes() {
    // Analyze query patterns to optimize indexes
    const searchPatterns = await this.getSearchAnalytics(30);

    // Fetch index statistics once; they do not vary per search term
    const indexStats = await this.productsCollection.aggregate([
      { $indexStats: {} }
    ]).toArray();

    // Identify text indexes by name or by the internal _fts key
    const textIndexUsage = indexStats.filter(stat =>
      stat.name.includes('text') || stat.key.hasOwnProperty('_fts')
    );

    const optimizationRecommendations = [];

    if (textIndexUsage.length === 0) {
      optimizationRecommendations.push({
        issue: 'missing_text_index',
        recommendation: 'Create a text index covering the most frequently searched fields'
      });
    }

    for (const pattern of searchPatterns) {
      const searchTerm = pattern._id.search_term;

      if (pattern.avg_execution_time > 100) { // Slow queries > 100ms
        optimizationRecommendations.push({
          issue: 'slow_search',
          search_term: searchTerm,
          avg_time: pattern.avg_execution_time,
          recommendation: 'Consider adding compound index for frequent filters'
        });
      }

      if (pattern.avg_results < 1) { // Very few results
        optimizationRecommendations.push({
          issue: 'low_recall',
          search_term: searchTerm,
          avg_results: pattern.avg_results,
          recommendation: 'Consider fuzzy matching or synonym expansion'
        });
      }
    }

    return optimizationRecommendations;
  }

  async generateSearchSuggestions() {
    // Generate search suggestions based on popular terms
    const popularTerms = await this.getPopularSearchTerms(100);

    const suggestions = [];

    for (const term of popularTerms) {
      // Extract keywords from successful searches
      const keywords = term.search_term.split(' ').filter(word => word.length > 2);

      for (const keyword of keywords) {
        // Find related products to suggest similar searches
        const relatedProducts = await this.productsCollection.find({
          $text: { $search: keyword }
        }, {
          projection: { product_name: 1, category: 1, tags: 1 }
        }).limit(5).toArray();

        const relatedTerms = new Set();

        relatedProducts.forEach(product => {
          // Extract terms from product names and categories
          const productWords = product.product_name.toLowerCase().split(/\s+/);
          const categoryWords = product.category ? product.category.toLowerCase().split(/\s+/) : [];
          const tagWords = product.tags ? product.tags.flatMap(tag => tag.toLowerCase().split(/\s+/)) : [];

          [...productWords, ...categoryWords, ...tagWords].forEach(word => {
            if (word.length > 2 && word !== keyword) {
              relatedTerms.add(word);
            }
          });
        });

        if (relatedTerms.size > 0) {
          suggestions.push({
            base_term: keyword,
            suggested_terms: Array.from(relatedTerms).slice(0, 5),
            popularity: term.frequency
          });
        }
      }
    }

    return suggestions.slice(0, 50); // Top 50 suggestions
  }
}
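
A brief sketch of the analytics workflow, assuming the same driver setup as earlier examples; the identifiers and counts are illustrative:

async function recordAndReviewSearches(db) {
  const analytics = new SearchAnalytics(db);

  // Log a search as it happens
  const logId = await analytics.logSearchQuery({
    query: 'gaming laptop',
    userId: 'user-123',
    sessionId: 'sess-456',
    resultsCount: 14,
    executionTime: 42,
    searchType: 'basic'
  });

  // Record which result the user clicked
  await analytics.trackSearchClick(logId, { id: 'prod-789', position: 2 });

  // Review recent behaviour to guide tuning
  const popular = await analytics.getPopularSearchTerms(10);
  const zeroResults = await analytics.identifyZeroResultQueries(10);
  return { popular, zeroResults };
}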

Real-Time Search Suggestions

Implement dynamic search suggestions and autocomplete:

// Real-time search suggestions system
class SearchSuggestionEngine {
  constructor(db) {
    this.db = db;
    this.suggestionsCollection = db.collection('search_suggestions');
    this.productsCollection = db.collection('products');
  }

  async buildSuggestionIndex() {
    // Create suggestions from product data
    const products = await this.productsCollection.find({}, {
      projection: {
        product_name: 1,
        category: 1,
        brand: 1,
        tags: 1,
        description: 1
      }
    }).toArray();

    const suggestionSet = new Set();

    for (const product of products) {
      // Extract searchable terms
      const terms = this.extractTerms(product);
      terms.forEach(term => suggestionSet.add(term));
    }

    // Convert to suggestion documents
    const suggestionDocs = Array.from(suggestionSet).map(term => ({
      text: term,
      length: term.length,
      frequency: 1, // Initial frequency
      created_at: new Date()
    }));

    // Clear existing suggestions and insert new ones
    await this.suggestionsCollection.deleteMany({});

    if (suggestionDocs.length > 0) {
      await this.suggestionsCollection.insertMany(suggestionDocs);
    }

    // Create indexes for fast prefix matching
    await this.suggestionsCollection.createIndex({ text: 1 });
    await this.suggestionsCollection.createIndex({ length: 1, frequency: -1 });
  }

  extractTerms(product) {
    const terms = new Set();

    // Product name - split and add individual words and phrases
    if (product.product_name) {
      const words = product.product_name.toLowerCase()
        .replace(/[^\w\s]/g, ' ')
        .split(/\s+/)
        .filter(word => word.length >= 2);

      words.forEach(word => terms.add(word));

      // Add 2-word and 3-word phrases
      for (let i = 0; i < words.length - 1; i++) {
        terms.add(`${words[i]} ${words[i + 1]}`);
        if (i < words.length - 2) {
          terms.add(`${words[i]} ${words[i + 1]} ${words[i + 2]}`);
        }
      }
    }

    // Category and brand
    if (product.category) {
      terms.add(product.category.toLowerCase());
    }

    if (product.brand) {
      terms.add(product.brand.toLowerCase());
    }

    // Tags
    if (product.tags && Array.isArray(product.tags)) {
      product.tags.forEach(tag => {
        if (typeof tag === 'string') {
          terms.add(tag.toLowerCase());
        }
      });
    }

    return Array.from(terms);
  }

  async getSuggestions(prefix, limit = 10) {
    // Get suggestions starting with prefix
    const suggestions = await this.suggestionsCollection.find({
      text: { $regex: `^${prefix.toLowerCase()}`, $options: 'i' },
      length: { $lte: 50 } // Reasonable length limit
    })
    .sort({ frequency: -1, length: 1 })
    .limit(limit)
    .project({ text: 1, frequency: 1, _id: 0 })
    .toArray();

    return suggestions.map(s => s.text);
  }

  async updateSuggestionFrequency(searchTerm) {
    // Update frequency when user searches
    await this.suggestionsCollection.updateOne(
      { text: searchTerm.toLowerCase() },
      { 
        $inc: { frequency: 1 },
        $set: { last_used: new Date() }
      },
      { upsert: true }
    );
  }

  async getFuzzySuggestions(term, maxDistance = 2, limit = 5) {
    // Get fuzzy suggestions for typos by computing edit distance with $function
    // (requires MongoDB 4.4+ with server-side JavaScript enabled)
    const pipeline = [
      {
        $project: {
          text: 1,
          frequency: 1,
          distance: {
            $function: {
              body: function(text1, text2) {
                // Levenshtein distance calculation
                const a = text1.toLowerCase();
                const b = text2.toLowerCase();
                const matrix = [];

                for (let i = 0; i <= b.length; i++) {
                  matrix[i] = [i];
                }

                for (let j = 0; j <= a.length; j++) {
                  matrix[0][j] = j;
                }

                for (let i = 1; i <= b.length; i++) {
                  for (let j = 1; j <= a.length; j++) {
                    if (b.charAt(i - 1) === a.charAt(j - 1)) {
                      matrix[i][j] = matrix[i - 1][j - 1];
                    } else {
                      matrix[i][j] = Math.min(
                        matrix[i - 1][j - 1] + 1,
                        matrix[i][j - 1] + 1,
                        matrix[i - 1][j] + 1
                      );
                    }
                  }
                }

                return matrix[b.length][a.length];
              },
              args: ["$text", term],
              lang: "js"
            }
          }
        }
      },
      {
        $match: {
          distance: { $lte: maxDistance }
        }
      },
      {
        $sort: {
          distance: 1,
          frequency: -1
        }
      },
      {
        $limit: limit
      },
      {
        $project: {
          text: 1,
          distance: 1,
          _id: 0
        }
      }
    ];

    return await this.suggestionsCollection.aggregate(pipeline).toArray();
  }

  async contextualSuggestions(partialQuery, userContext = {}) {
    // Provide contextual suggestions based on user behavior: terms from the
    // user's previous searches are boosted above raw popularity in the
    // aggregation pipeline below

    const pipeline = [
      {
        $match: {
          text: { $regex: `^${partialQuery.toLowerCase()}`, $options: 'i' }
        }
      },
      {
        $addFields: {
          context_score: {
            $add: [
              "$frequency",
              // Boost for historical relevance
              {
                $cond: {
                  if: { $in: ["$text", userContext.previousSearches || []] },
                  then: 10,
                  else: 0
                }
              }
            ]
          }
        }
      },
      {
        $sort: { context_score: -1, length: 1 }
      },
      {
        $limit: 8
      },
      {
        $project: { text: 1, _id: 0 }
      }
    ];

    const suggestions = await this.suggestionsCollection.aggregate(pipeline).toArray();
    return suggestions.map(s => s.text);
  }
}
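
Typical wiring for the suggestion engine, assuming the suggestion collection is rebuilt on a schedule and prefix lookups run per keystroke; the inputs here are illustrative:

async function wireSuggestions(db) {
  const engine = new SearchSuggestionEngine(db);

  // Rebuild the suggestion collection periodically (e.g., a nightly batch job)
  await engine.buildSuggestionIndex();

  // Prefix lookup on each keystroke from the search box
  const typed = await engine.getSuggestions('gam', 8);

  // Fall back to fuzzy matching when the prefix lookup finds nothing
  const fuzzy = typed.length > 0 ? [] : await engine.getFuzzySuggestions('gamming', 2, 5);

  // Record the term the user ultimately searched for
  await engine.updateSuggestionFrequency('gaming laptop');
  return { typed, fuzzy };
}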

Combine text search with geographic queries:

// Geospatial text search implementation
class GeoTextSearchService {
  constructor(db) {
    this.db = db;
    this.businessesCollection = db.collection('businesses');
  }

  async setupGeoTextIndexes() {
    // MongoDB does not allow geospatial keys inside a compound text index,
    // so maintain separate indexes: a 2dsphere index for location filtering
    // and a weighted text index for relevance scoring
    await this.businessesCollection.createIndex({ location: "2dsphere" });

    await this.businessesCollection.createIndex({
      name: "text",
      description: "text",
      tags: "text"
    }, {
      weights: {
        name: 15,
        description: 8,
        tags: 10
      }
    });
  }

  async searchNearby(searchTerm, location, radius = 5000, options = {}) {
    // Search for businesses near a location matching text criteria.
    // Note: $text and $geoNear cannot share a pipeline (each must be the first
    // stage), so this filters with $geoWithin and ranks by text relevance;
    // compute exact distances client-side or with a separate $geoNear query.
    const pipeline = [
      {
        $match: {
          $text: { $search: searchTerm },
          location: {
            $geoWithin: {
              $centerSphere: [
                [location.longitude, location.latitude],
                radius / 6378100  // Radius in radians (Earth radius in meters)
              ]
            }
          }
        }
      },
      {
        $addFields: {
          text_score: { $meta: "textScore" }
        }
      },
      {
        $sort: { text_score: -1 }
      },
      {
        $limit: options.limit || 20
      },
      {
        $project: {
          name: 1,
          description: 1,
          address: 1,
          location: 1,
          text_score: { $round: ["$text_score", 2] }
        }
      }
    ];

    return await this.businessesCollection.aggregate(pipeline).toArray();
  }

  async searchInArea(searchTerm, polygon, options = {}) {
    // Search within a defined geographic area
    const query = {
      $and: [
        {
          location: {
            $geoWithin: {
              $geometry: polygon
            }
          }
        },
        {
          $text: { $search: searchTerm }
        }
      ]
    };

    return await this.businessesCollection.find(query, {
      projection: {
        name: 1,
        description: 1,
        address: 1,
        location: 1,
        score: { $meta: "textScore" }
      }
    })
    .sort({ score: { $meta: "textScore" } })
    .limit(options.limit || 20)
    .toArray();
  }

  async clusterSearchResults(searchTerm, center, radius = 10000) {
    // Group search results by geographic clusters
    const pipeline = [
      {
        $match: {
          $and: [
            {
              location: {
                $geoWithin: {
                  $centerSphere: [
                    [center.longitude, center.latitude],
                    radius / 6378100 // Convert to radians (Earth radius in meters)
                  ]
                }
              }
            },
            {
              $text: { $search: searchTerm }
            }
          ]
        }
      },
      {
        $addFields: {
          text_score: { $meta: "textScore" },
          // Create grid coordinates for clustering
          grid_x: {
            $floor: {
              $multiply: [
                { $arrayElemAt: ["$location.coordinates", 0] },
                1000  // Grid resolution
              ]
            }
          },
          grid_y: {
            $floor: {
              $multiply: [
                { $arrayElemAt: ["$location.coordinates", 1] },
                1000
              ]
            }
          }
        }
      },
      {
        $group: {
          _id: {
            grid_x: "$grid_x",
            grid_y: "$grid_y"
          },
          businesses: {
            $push: {
              name: "$name",
              location: "$location",
              text_score: "$text_score",
              address: "$address"
            }
          },
          count: { $sum: 1 },
          avg_score: { $avg: "$text_score" },
          center_lng: { $avg: { $arrayElemAt: ["$location.coordinates", 0] } },
          center_lat: { $avg: { $arrayElemAt: ["$location.coordinates", 1] } }
        }
      },
      {
        $match: {
          count: { $gte: 2 }  // Only clusters with multiple businesses
        }
      },
      {
        $sort: { avg_score: -1 }
      }
    ];

    return await this.businessesCollection.aggregate(pipeline).toArray();
  }

  async spatialAutoComplete(prefix, location, radius = 10000, limit = 10) {
    // Autocomplete suggestions based on nearby businesses
    const pipeline = [
      {
        $match: {
          location: {
            $geoWithin: {
              $centerSphere: [
                [location.longitude, location.latitude],
                radius / 6378100
              ]
            }
          }
        }
      },
      {
        $project: {
          name: 1,
          name_words: {
            $split: [{ $toLower: "$name" }, " "]
          }
        }
      },
      {
        $unwind: "$name_words"
      },
      {
        $match: {
          name_words: { $regex: `^${prefix.toLowerCase()}` }
        }
      },
      {
        $group: {
          _id: "$name_words",
          frequency: { $sum: 1 }
        }
      },
      {
        $sort: { frequency: -1 }
      },
      {
        $limit: limit
      },
      {
        $project: {
          suggestion: "$_id",
          frequency: 1,
          _id: 0
        }
      }
    ];

    return await this.businessesCollection.aggregate(pipeline).toArray();
  }
}
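
A short usage sketch with example San Francisco coordinates, assuming business documents store GeoJSON Point locations; the polygon ring below is illustrative:

async function findCoffeeNearby(db) {
  const geoSearch = new GeoTextSearchService(db);
  await geoSearch.setupGeoTextIndexes();

  // Text search constrained to a 5km radius around downtown San Francisco
  const nearby = await geoSearch.searchNearby('coffee shop', {
    longitude: -122.4194,
    latitude: 37.7749
  }, 5000, { limit: 10 });

  // Text search constrained to an explicit polygon (GeoJSON ring)
  const inArea = await geoSearch.searchInArea('coffee shop', {
    type: 'Polygon',
    coordinates: [[
      [-122.45, 37.76], [-122.39, 37.76],
      [-122.39, 37.80], [-122.45, 37.80],
      [-122.45, 37.76]
    ]]
  });

  return { nearby, inArea };
}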

SQL-style geospatial text search concepts:

-- SQL geospatial text search equivalent patterns

-- PostGIS extension for spatial queries with text search
CREATE EXTENSION IF NOT EXISTS postgis;

-- Spatial and text indexes
CREATE INDEX idx_businesses_location ON businesses USING GIST (location);
CREATE INDEX idx_businesses_text ON businesses USING GIN (
  to_tsvector('english', name || ' ' || description || ' ' || array_to_string(tags, ' '))
);

-- Search nearby businesses with text matching
WITH nearby_businesses AS (
  SELECT 
    business_id,
    name,
    description,
    ST_Distance(location::geography, ST_MakePoint(-122.4194, 37.7749)::geography) AS distance_meters,
    ts_rank(
      to_tsvector('english', name || ' ' || description),
      plainto_tsquery('english', 'coffee shop')
    ) AS text_relevance
  FROM businesses
  WHERE ST_DWithin(
    location::geography, 
    ST_MakePoint(-122.4194, 37.7749)::geography, 
    5000  -- 5km radius
  )
  AND to_tsvector('english', name || ' ' || description) 
      @@ plainto_tsquery('english', 'coffee shop')
)
SELECT 
  name,
  description,
  distance_meters,
  text_relevance,
  -- Combined scoring: text relevance + distance factor
  (text_relevance + (1 - distance_meters / 5000.0)) AS combined_score
FROM nearby_businesses
ORDER BY combined_score DESC
LIMIT 20;

-- Spatial clustering with text search
-- (window functions cannot be grouped directly, so assign clusters in a CTE first)
WITH clustered AS (
  SELECT 
    ST_ClusterKMeans(location, 5) OVER () AS cluster_id,
    location,
    ts_rank(
      to_tsvector('english', name || ' ' || description),
      plainto_tsquery('english', 'restaurant')
    ) AS relevance
  FROM businesses
  WHERE to_tsvector('english', name || ' ' || description) 
        @@ plainto_tsquery('english', 'restaurant')
    AND ST_DWithin(
      location::geography,
      ST_MakePoint(-122.4194, 37.7749)::geography,
      10000
    )
)
SELECT 
  cluster_id,
  COUNT(*) AS businesses_in_cluster,
  AVG(relevance) AS avg_relevance,
  ST_Centroid(ST_Collect(location)) AS cluster_center
FROM clustered
GROUP BY cluster_id
HAVING COUNT(*) >= 3
ORDER BY avg_relevance DESC;

Performance Optimization

Text Index Optimization

Optimize text search performance for large datasets:

// Text search performance optimization
class TextSearchOptimizer {
  constructor(db) {
    this.db = db;
  }

  async analyzeTextIndexPerformance(collection) {
    // Get index statistics
    const indexStats = await this.db.collection(collection).aggregate([
      { $indexStats: {} }
    ]).toArray();

    const textIndexes = indexStats.filter(stat => 
      stat.name.includes('text') || stat.key.hasOwnProperty('_fts')
    );

    const analysis = {
      collection: collection,
      text_indexes: textIndexes.length,
      index_details: []
    };

    for (const index of textIndexes) {
      const indexDetail = {
        name: index.name,
        size_bytes: index.size || 0,
        accesses: index.accesses || {},
        key_pattern: index.key,
        // Calculate index efficiency
        efficiency: this.calculateIndexEfficiency(index.accesses)
      };

      analysis.index_details.push(indexDetail);
    }

    return analysis;
  }

  calculateIndexEfficiency(accesses) {
    if (!accesses || !accesses.ops || !accesses.since) {
      return 0;
    }

    const ageHours = (Date.now() - accesses.since.getTime()) / (1000 * 60 * 60);
    const operationsPerHour = accesses.ops / Math.max(ageHours, 1);

    return {
      ops_per_hour: Math.round(operationsPerHour),
      total_operations: accesses.ops,
      age_hours: Math.round(ageHours)
    };
  }

  async optimizeTextIndexWeights(collection, sampleQueries = []) {
    // Analyze query performance with different weight configurations
    const fieldWeightTests = [
      { title: 20, content: 10, tags: 15 },  // Title-heavy
      { title: 10, content: 20, tags: 8 },   // Content-heavy  
      { title: 15, content: 15, tags: 20 },  // Tag-heavy
      { title: 12, content: 12, tags: 12 }   // Balanced
    ];

    const testResults = [];

    for (const weights of fieldWeightTests) {
      // Create a test index; MongoDB permits only one text index per collection,
      // so run this benchmark on a collection without an existing text index
      // (or drop it first)
      const indexName = `text_test_${Date.now()}`;

      try {
        await this.db.collection(collection).createIndex({
          title: "text",
          content: "text", 
          tags: "text"
        }, {
          weights: weights,
          name: indexName
        });

        // Test queries with this index configuration
        const queryResults = [];

        for (const query of sampleQueries) {
          const startTime = Date.now();

          const results = await this.db.collection(collection).find({
            $text: { $search: query }
          }, {
            projection: { score: { $meta: "textScore" } }
          })
          .sort({ score: { $meta: "textScore" } })
          .limit(10)
          .toArray();

          const executionTime = Date.now() - startTime;

          queryResults.push({
            query: query,
            results_count: results.length,
            execution_time: executionTime,
            avg_score: results.reduce((sum, r) => sum + r.score, 0) / results.length || 0
          });
        }

        testResults.push({
          weights: weights,
          query_performance: queryResults,
          avg_execution_time: queryResults.reduce((sum, q) => sum + q.execution_time, 0) / queryResults.length,
          avg_relevance: queryResults.reduce((sum, q) => sum + q.avg_score, 0) / queryResults.length
        });

        // Drop test index
        await this.db.collection(collection).dropIndex(indexName);

      } catch (error) {
        console.error(`Failed to test weights ${JSON.stringify(weights)}:`, error);
      }
    }

    // Find optimal weights
    const bestConfig = testResults.reduce((best, current) => {
      const bestScore = (best.avg_relevance || 0) - (best.avg_execution_time || 1000) / 1000;
      const currentScore = (current.avg_relevance || 0) - (current.avg_execution_time || 1000) / 1000;

      return currentScore > bestScore ? current : best;
    });

    return {
      recommended_weights: bestConfig.weights,
      test_results: testResults,
      optimization_summary: {
        performance_gain: bestConfig.avg_execution_time < 100 ? 'excellent' : 'good',
        relevance_quality: bestConfig.avg_relevance > 1.0 ? 'high' : 'moderate'
      }
    };
  }

  async createOptimalTextIndex(collection, fields, sampleData = []) {
    // Analyze field content to determine optimal index configuration
    const fieldAnalysis = await this.analyzeFields(collection, fields);

    // Calculate optimal weights based on content analysis
    const weights = this.calculateOptimalWeights(fieldAnalysis);

    // Determine language settings
    const languageDistribution = await this.analyzeLanguageDistribution(collection);

    const indexConfig = {
      weights: weights,
      default_language: languageDistribution.primary_language,
      language_override: 'language',
      name: `optimized_text_${Date.now()}`
    };

    // Create the optimized index
    const indexSpec = {};
    fields.forEach(field => {
      indexSpec[field] = "text";
    });

    await this.db.collection(collection).createIndex(indexSpec, indexConfig);

    return {
      index_name: indexConfig.name,
      configuration: indexConfig,
      field_analysis: fieldAnalysis,
      language_distribution: languageDistribution
    };
  }

  async analyzeFields(collection, fields) {
    const pipeline = [
      { $sample: { size: 1000 } },  // Sample for analysis
      {
        $project: fields.reduce((proj, field) => {
          proj[field] = 1;
          proj[`${field}_word_count`] = {
            $size: {
              $split: [
                { $ifNull: [`$${field}`, ""] },
                " "
              ]
            }
          };
          proj[`${field}_char_count`] = {
            $strLenCP: { $ifNull: [`$${field}`, ""] }
          };
          return proj;
        }, {})
      }
    ];

    const sampleDocs = await this.db.collection(collection).aggregate(pipeline).toArray();

    const analysis = {};

    for (const field of fields) {
      const wordCounts = sampleDocs.map(doc => doc[`${field}_word_count`] || 0);
      const charCounts = sampleDocs.map(doc => doc[`${field}_char_count`] || 0);

      analysis[field] = {
        avg_words: wordCounts.reduce((sum, count) => sum + count, 0) / wordCounts.length,
        avg_chars: charCounts.reduce((sum, count) => sum + count, 0) / charCounts.length,
        max_words: Math.max(...wordCounts),
        non_empty_ratio: wordCounts.filter(count => count > 0).length / wordCounts.length
      };
    }

    return analysis;
  }

  calculateOptimalWeights(fieldAnalysis) {
    const weights = {};

    // Calculate field importance scores
    for (const [field, stats] of Object.entries(fieldAnalysis)) {
      // Higher weight for fields with moderate word counts and high fill rates
      const wordScore = Math.min(stats.avg_words / 10, 3); // Cap at reasonable level
      const fillScore = stats.non_empty_ratio * 5;

      weights[field] = Math.max(Math.round(wordScore + fillScore), 1);
    }

    // Normalize weights to reasonable range (1-20)
    const maxWeight = Math.max(...Object.values(weights));
    if (maxWeight > 20) {
      for (const field in weights) {
        weights[field] = Math.round((weights[field] / maxWeight) * 20);
      }
    }

    return weights;
  }

  async analyzeLanguageDistribution(collection) {
    // Simple language detection based on common words
    const pipeline = [
      { $sample: { size: 500 } },
      {
        $project: {
          text_content: {
            $concat: [
              { $ifNull: ["$title", ""] },
              " ",
              { $ifNull: ["$content", ""] },
              " ",
              { $ifNull: [{ $reduce: { input: "$tags", initialValue: "", in: { $concat: ["$$value", " ", "$$this"] } } }, ""] }
            ]
          }
        }
      }
    ];

    const samples = await this.db.collection(collection).aggregate(pipeline).toArray();

    const languageScores = { english: 0, spanish: 0, french: 0, german: 0 };

    // Language-specific common words
    const languageMarkers = {
      english: ['the', 'and', 'or', 'but', 'in', 'on', 'at', 'to', 'for', 'of', 'with', 'by'],
      spanish: ['el', 'la', 'y', 'o', 'pero', 'en', 'con', 'por', 'para', 'de', 'que', 'es'],
      french: ['le', 'la', 'et', 'ou', 'mais', 'dans', 'sur', 'avec', 'par', 'pour', 'de', 'que'],
      german: ['der', 'die', 'das', 'und', 'oder', 'aber', 'in', 'auf', 'mit', 'für', 'von', 'zu']
    };

    for (const sample of samples) {
      const words = sample.text_content.toLowerCase().split(/\s+/);

      for (const [language, markers] of Object.entries(languageMarkers)) {
        const matches = words.filter(word => markers.includes(word)).length;
        languageScores[language] += matches / words.length;
      }
    }

    const totalSamples = samples.length;
    for (const language in languageScores) {
      languageScores[language] = languageScores[language] / totalSamples;
    }

    const primaryLanguage = Object.entries(languageScores)
      .sort(([,a], [,b]) => b - a)[0][0];

    return {
      primary_language: primaryLanguage,
      distribution: languageScores,
      confidence: languageScores[primaryLanguage]
    };
  }
}
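
How the optimizer might be driven in practice, assuming a set of representative queries pulled from the search logs and an articles collection with title, content, and tags fields; collection and query names are illustrative:

async function tuneArticleSearch(db) {
  const optimizer = new TextSearchOptimizer(db);

  // Inspect current text index usage
  const report = await optimizer.analyzeTextIndexPerformance('articles');

  // Benchmark candidate weight configurations against real queries
  // (the collection should not already have a text index, since MongoDB
  // allows only one per collection and the benchmark creates temporary ones)
  const sampleQueries = ['machine learning', 'data pipelines', 'cloud security'];
  const tuning = await optimizer.optimizeTextIndexWeights('articles', sampleQueries);

  console.log('Existing indexes:', report.index_details.map(i => i.name));
  console.log('Recommended weights:', tuning.recommended_weights);
  return tuning;
}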

QueryLeaf Text Search Integration

QueryLeaf provides familiar SQL-style text search syntax with MongoDB's powerful full-text capabilities:

-- QueryLeaf text search with SQL-familiar syntax

-- Basic full-text search using SQL MATCH syntax
SELECT 
  product_id,
  product_name,
  description,
  price,
  MATCH(product_name, description) AGAINST('gaming laptop') AS relevance_score
FROM products
WHERE MATCH(product_name, description) AGAINST('gaming laptop')
ORDER BY relevance_score DESC
LIMIT 20;

-- Boolean text search with operators
SELECT 
  product_name,
  category,
  price,
  MATCH_SCORE(product_name, description, tags) AS score
FROM products  
WHERE FULL_TEXT_SEARCH(product_name, description, tags, '+gaming +laptop -refurbished')
ORDER BY score DESC;

-- Phrase search for exact matches
SELECT 
  article_id,
  title,
  author,
  created_date,
  TEXT_SCORE(title, content) AS relevance
FROM articles
WHERE PHRASE_SEARCH(title, content, '"machine learning algorithms"')
ORDER BY relevance DESC;

-- Multi-language text search
SELECT 
  document_id,
  title,
  content,
  language,
  MATCH_MULTILANG(title, content, 'artificial intelligence', language) AS score
FROM documents
WHERE MATCH_MULTILANG(title, content, 'artificial intelligence', language) > 0.5
ORDER BY score DESC;

-- Text search with geographic filtering  
SELECT 
  b.business_name,
  b.address,
  ST_Distance(b.location, ST_MakePoint(-122.4194, 37.7749)) AS distance_meters,
  MATCH(b.business_name, b.description) AGAINST('coffee shop') AS text_score
FROM businesses b
WHERE ST_DWithin(
    b.location,
    ST_MakePoint(-122.4194, 37.7749),
    5000  -- 5km radius
  )
  AND MATCH(b.business_name, b.description) AGAINST('coffee shop')
ORDER BY (text_score * 0.7 + (1 - distance_meters/5000) * 0.3) DESC;

-- QueryLeaf automatically handles:
-- 1. MongoDB text index creation and optimization
-- 2. Language detection and stemming
-- 3. Relevance scoring and ranking
-- 4. Multi-field search coordination
-- 5. Performance optimization through proper indexing
-- 6. Integration with other query types (geospatial, range, etc.)

-- Advanced text analytics with SQL aggregations
WITH search_analytics AS (
  SELECT 
    search_term,
    COUNT(*) as search_frequency,
    AVG(MATCH(product_name, description) AGAINST(search_term)) as avg_relevance,
    COUNT(CASE WHEN clicked = true THEN 1 END) as click_count
  FROM search_logs
  WHERE search_date >= CURRENT_DATE - INTERVAL '30 days'
  GROUP BY search_term
)
SELECT 
  search_term,
  search_frequency,
  ROUND(avg_relevance, 3) as avg_relevance,
  ROUND(100.0 * click_count / search_frequency, 1) as click_through_rate,
  CASE 
    WHEN avg_relevance < 0.5 THEN 'LOW_QUALITY'
    WHEN click_through_rate < 5.0 THEN 'LOW_ENGAGEMENT' 
    ELSE 'PERFORMING_WELL'
  END as search_quality
FROM search_analytics
WHERE search_frequency >= 10
ORDER BY search_frequency DESC;

-- Auto-complete and suggestions using SQL
SELECT DISTINCT
  SUBSTRING(product_name, 1, POSITION(' ' IN product_name || ' ') - 1) as suggestion,
  COUNT(*) as frequency
FROM products
WHERE product_name ILIKE 'gam%'
  AND LENGTH(product_name) >= 4
GROUP BY suggestion
HAVING COUNT(*) >= 2
ORDER BY frequency DESC, suggestion ASC
LIMIT 10;

-- Search result clustering and categorization
SELECT 
  category,
  COUNT(*) as result_count,
  AVG(MATCH(product_name, description) AGAINST('smartphone')) as avg_relevance,
  MIN(price) as min_price,
  MAX(price) as max_price,
  ARRAY_AGG(DISTINCT brand ORDER BY brand) as available_brands
FROM products
WHERE MATCH(product_name, description) AGAINST('smartphone')
  AND MATCH(product_name, description) AGAINST('smartphone') > 0.3
GROUP BY category
HAVING COUNT(*) >= 5
ORDER BY avg_relevance DESC;

Search Implementation Guidelines

Essential practices for implementing MongoDB text search:

  1. Index Strategy: Create focused text indexes on relevant fields with appropriate weights
  2. Language Support: Configure proper language settings for stemming and tokenization
  3. Performance Monitoring: Track search query performance and optimize accordingly
  4. Relevance Tuning: Adjust field weights based on user behavior and search analytics
  5. Fallback Mechanisms: Implement fuzzy search for handling typos and variations
  6. Caching: Cache frequent search results and suggestions for improved performance (see the sketch after this list)
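
As a sketch of the caching guideline, a small in-memory TTL cache wrapped around a text query; a production setup would more likely use Redis or another shared cache, and the key scheme here is an assumption:

class CachedSearch {
  constructor(collection, ttlMs = 60000) {
    this.collection = collection;
    this.ttlMs = ttlMs;
    this.cache = new Map(); // key -> { expires, results }
  }

  async search(term, limit = 20) {
    const key = `${term.toLowerCase()}::${limit}`;
    const cached = this.cache.get(key);

    // Serve unexpired cached results to avoid repeated index scans
    if (cached && cached.expires > Date.now()) {
      return cached.results;
    }

    const results = await this.collection.find(
      { $text: { $search: term } },
      { projection: { product_name: 1, price: 1, score: { $meta: "textScore" } } }
    )
      .sort({ score: { $meta: "textScore" } })
      .limit(limit)
      .toArray();

    this.cache.set(key, { expires: Date.now() + this.ttlMs, results });
    return results;
  }
}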

Search Quality Optimization

Improve search result quality and user experience:

  1. Analytics-Driven Optimization: Use search analytics to identify and fix poor-performing queries
  2. User Feedback Integration: Incorporate click-through rates and user interactions for relevance tuning
  3. Synonym Management: Implement synonym expansion for better search recall (see the sketch after this list)
  4. Personalization: Provide contextual suggestions based on user history and preferences
  5. Multi-Modal Search: Combine text search with filters, geospatial queries, and faceted search
  6. Real-Time Adaptation: Continuously update indexes and suggestions based on new content
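
A minimal sketch of the synonym guideline: expanding the user's terms with a hand-maintained synonym map before running $text, relying on the operator's OR semantics to widen recall; the map contents are illustrative:

// Hypothetical synonym dictionary; real systems often store this in its own collection
const SYNONYMS = {
  laptop: ['notebook', 'ultrabook'],
  phone: ['smartphone', 'mobile'],
  tv: ['television']
};

function expandQuery(searchTerm) {
  // Append known synonyms so the $text OR semantics match more documents
  const words = searchTerm.toLowerCase().split(/\s+/);
  const expanded = new Set(words);
  for (const word of words) {
    (SYNONYMS[word] || []).forEach(syn => expanded.add(syn));
  }
  return Array.from(expanded).join(' ');
}

async function searchWithSynonyms(collection, searchTerm) {
  return collection.find(
    { $text: { $search: expandQuery(searchTerm) } },
    { projection: { product_name: 1, score: { $meta: "textScore" } } }
  )
    .sort({ score: { $meta: "textScore" } })
    .limit(20)
    .toArray();
}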

Conclusion

MongoDB's full-text search capabilities provide production-grade search functionality - stemming, weighted relevance scoring, and multi-language handling - without the operational overhead of running a separate search engine. Combined with SQL-style query patterns, MongoDB text search enables familiar search implementation approaches while delivering the scalability and performance required for modern applications.

Key text search benefits include:

  • Advanced Linguistics: Stemming, tokenization, and language-specific processing for accurate results
  • Relevance Scoring: Built-in scoring algorithms with customizable field weights for optimal ranking
  • Performance Optimization: Specialized text indexes and query optimization for fast search response
  • Multi-Language Support: Native support for multiple languages with proper linguistic handling
  • Integration Flexibility: Seamless integration with other MongoDB query types and aggregation pipelines

Whether you're building product catalogs, content management systems, or document search applications, MongoDB text search with QueryLeaf's familiar SQL interface provides the foundation for sophisticated search experiences. This combination enables you to implement powerful search functionality while preserving the development patterns and query approaches your team already knows.

The integration of advanced text search capabilities with SQL-style query management makes MongoDB an ideal platform for applications requiring both powerful search functionality and familiar database interaction patterns, ensuring your search features remain both comprehensive and maintainable as they scale and evolve.