MongoDB Full-Text Search and Advanced Text Indexing: Query Optimization and Natural Language Processing

Modern applications require sophisticated text search capabilities that go beyond simple pattern matching to deliver intelligent, contextual search experiences. MongoDB's full-text search features provide comprehensive text indexing, natural language processing, relevance scoring, and advanced query capabilities that rival dedicated search engines while maintaining the simplicity and integration benefits of database-native search functionality.

MongoDB text indexes automatically handle stemming, stop word filtering, language-specific text processing, and relevance scoring, enabling developers to build powerful search features without the complexity of maintaining separate search infrastructure. Combined with aggregation pipelines and flexible document structures, MongoDB delivers enterprise-grade text search capabilities with familiar SQL-style query patterns.

The Text Search Challenge

Traditional database text queries using LIKE patterns are inefficient and limited:

-- Traditional SQL text search - limited and slow
SELECT product_id, name, description, category
FROM products
WHERE name LIKE '%wireless%'
   OR description LIKE '%bluetooth%'
   OR category LIKE '%electronics%';

-- Problems with pattern matching:
-- 1. Case sensitivity issues
-- 2. No support for word variations (wireless vs wirelessly)
-- 3. No relevance scoring or ranking
-- 4. Poor performance on large text fields
-- 5. No natural language processing
-- 6. Limited multi-field search capabilities
-- 7. No fuzzy matching or typo tolerance

MongoDB text search provides sophisticated alternatives:

// Create comprehensive text index for advanced search
db.products.createIndex({
  "name": "text",
  "description": "text", 
  "category": "text",
  "tags": "text",
  "brand": "text",
  "specifications.features": "text"
}, {
  "default_language": "english",
  "language_override": "language",
  "name": "comprehensive_product_search",
  "weights": {
    "name": 10,           // Highest relevance for product names
    "category": 8,        // High relevance for categories
    "brand": 6,           // Medium-high relevance for brands
    "description": 4,     // Medium relevance for descriptions
    "tags": 3,            // Lower relevance for tags
    "specifications.features": 2  // Lowest relevance for detailed specs
  },
  "textIndexVersion": 3
})

// Sample product document structure optimized for text search
{
  "_id": ObjectId("..."),
  "product_id": "ELEC_HEADPHONE_001",
  "name": "Premium Wireless Noise-Canceling Headphones",
  "description": "Experience crystal-clear audio with advanced noise cancellation technology. Perfect for travel, office work, or immersive music listening sessions.",
  "category": "Electronics",
  "subcategory": "Audio Equipment",
  "brand": "TechAudio Pro",
  "price": 299.99,
  "currency": "USD",

  // Optimized for search
  "tags": ["wireless", "bluetooth", "noise-canceling", "premium", "travel", "office"],
  "search_keywords": ["headphones", "audio", "music", "wireless", "bluetooth", "noise-canceling"],

  // Product specifications with searchable features
  "specifications": {
    "features": [
      "Active noise cancellation",
      "Bluetooth 5.0 connectivity", 
      "40-hour battery life",
      "Fast charging capability",
      "Multi-device pairing",
      "Voice assistant integration"
    ],
    "technical": {
      "driver_size": "40mm",
      "frequency_response": "20Hz-20kHz",
      "impedance": "32 ohms",
      "weight": "250g"
    }
  },

  // Multi-language support
  "language": "english",
  "translations": {
    "spanish": {
      "name": "Auriculares Inalámbricos Premium con Cancelación de Ruido",
      "description": "Experimenta audio cristalino con tecnología avanzada de cancelación de ruido."
    }
  },

  // Search analytics and optimization
  "search_metadata": {
    "search_count": 0,
    "popular_queries": [],
    "last_updated": ISODate("2025-12-29T00:00:00Z"),
    "seo_optimized": true
  },

  // Business metadata
  "inventory": {
    "stock_count": 150,
    "availability": "in_stock",
    "warehouse_locations": ["US-WEST", "US-EAST", "EU-CENTRAL"]
  },
  "ratings": {
    "average_rating": 4.7,
    "total_reviews": 342,
    "rating_distribution": {
      "5_star": 198,
      "4_star": 89,
      "3_star": 35,
      "2_star": 12,
      "1_star": 8
    }
  },
  "created_at": ISODate("2025-12-01T00:00:00Z"),
  "updated_at": ISODate("2025-12-29T00:00:00Z")
}

Advanced Text Search Queries

Basic Full-Text Search with Relevance Scoring

// Comprehensive text search with relevance scoring
db.products.aggregate([
  // Stage 1: Text search with scoring
  {
    $match: {
      $text: {
        $search: "wireless bluetooth headphones",
        $caseSensitive: false,
        $diacriticSensitive: false
      }
    }
  },

  // Stage 2: Add relevance score and metadata
  {
    $addFields: {
      relevance_score: { $meta: "textScore" },
      search_query: "wireless bluetooth headphones",
      search_timestamp: "$$NOW"
    }
  },

  // Stage 3: Enhanced relevance calculation
  {
    $addFields: {
      // Boost scores based on business factors
      boosted_score: {
        $multiply: [
          "$relevance_score",
          {
            $add: [
              1, // Base multiplier

              // Availability boost
              { $cond: [{ $eq: ["$inventory.availability", "in_stock"] }, 0.3, 0] },

              // Rating boost (high-rated products get higher relevance)
              { $multiply: [{ $subtract: ["$ratings.average_rating", 3] }, 0.1] },

              // Popular product boost
              { $cond: [{ $gt: ["$ratings.total_reviews", 100] }, 0.2, 0] },

              // Price range boost (mid-range products favored)
              {
                $cond: [
                  { $and: [{ $gte: ["$price", 50] }, { $lte: ["$price", 500] }] },
                  0.15,
                  0
                ]
              }
            ]
          }
        ]
      }
    }
  },

  // Stage 4: Category and brand analysis
  {
    $addFields: {
      category_match: {
        $cond: [
          { $regexMatch: { input: "$category", regex: /electronics|audio/i } },
          true,
          false
        ]
      },
      brand_popularity_score: {
        $switch: {
          branches: [
            { case: { $in: ["$brand", ["TechAudio Pro", "SoundMaster", "AudioElite"]] }, then: 1.2 },
            { case: { $in: ["$brand", ["BasicSound", "EcoAudio", "ValueTech"]] }, then: 0.9 }
          ],
          default: 1.0
        }
      }
    }
  },

  // Stage 5: Final score calculation
  {
    $addFields: {
      final_relevance_score: {
        $multiply: [
          "$boosted_score",
          "$brand_popularity_score",
          { $cond: ["$category_match", 1.1, 1.0] }
        ]
      }
    }
  },

  // Stage 6: Filter and sort results
  {
    $match: {
      "relevance_score": { $gte: 0.5 }, // Minimum relevance threshold
      "inventory.availability": { $ne: "discontinued" }
    }
  },

  // Stage 7: Sort by relevance and business factors
  {
    $sort: {
      "final_relevance_score": -1,
      "ratings.average_rating": -1,
      "ratings.total_reviews": -1
    }
  },

  // Stage 8: Project search results
  {
    $project: {
      product_id: 1,
      name: 1,
      description: 1,
      category: 1,
      brand: 1,
      price: 1,
      currency: 1,

      // Search relevance information
      search_metadata: {
        relevance_score: { $round: ["$relevance_score", 3] },
        boosted_score: { $round: ["$boosted_score", 3] },
        final_score: { $round: ["$final_relevance_score", 3] },
        search_query: "$search_query",
        search_timestamp: "$search_timestamp"
      },

      // Business information
      availability: "$inventory.availability",
      stock_count: "$inventory.stock_count",
      rating_info: {
        average_rating: "$ratings.average_rating",
        total_reviews: "$ratings.total_reviews"
      },

      // Highlighted text fields for search result display
      search_highlights: {
        name_highlight: "$name",
        description_highlight: { $substr: ["$description", 0, 150] },
        category_match: "$category_match"
      }
    }
  },

  { $limit: 20 }
])

Advanced Multi-Language Text Search

// Multi-language text search with language detection
db.products.aggregate([
  // Stage 1: Detect search language and perform text search
  {
    $match: {
      $or: [
        // English search
        { 
          $and: [
            { $text: { $search: "auriculares inalámbricos" } },
            { "language": "spanish" }
          ]
        },
        // Spanish search
        {
          $and: [
            { $text: { $search: "wireless headphones" } },
            { "language": "english" }
          ]
        },
        // Language-agnostic search (fallback)
        { $text: { $search: "auriculares inalámbricos wireless headphones" } }
      ]
    }
  },

  // Stage 2: Language-aware scoring
  {
    $addFields: {
      base_score: { $meta: "textScore" },
      detected_language: {
        $cond: [
          { $regexMatch: { input: "auriculares inalámbricos", regex: /[áéíóúñü]/i } },
          "spanish",
          "english"
        ]
      }
    }
  },

  // Stage 3: Apply language-specific boosts
  {
    $addFields: {
      language_adjusted_score: {
        $multiply: [
          "$base_score",
          {
            $cond: [
              { $eq: ["$detected_language", "$language"] },
              1.5, // Boost for exact language match
              1.0  // Standard score for cross-language matches
            ]
          }
        ]
      }
    }
  },

  // Stage 4: Add localized content
  {
    $addFields: {
      localized_name: {
        $cond: [
          { $eq: ["$detected_language", "spanish"] },
          "$translations.spanish.name",
          "$name"
        ]
      },
      localized_description: {
        $cond: [
          { $eq: ["$detected_language", "spanish"] },
          "$translations.spanish.description", 
          "$description"
        ]
      }
    }
  },

  { $sort: { "language_adjusted_score": -1 } },
  { $limit: 15 }
])

Faceted Search with Text Queries

// Advanced faceted search combining text search with categorical filtering
db.products.aggregate([
  // Stage 1: Text search foundation
  {
    $match: {
      $text: { $search: "gaming laptop high performance" }
    }
  },

  // Stage 2: Add text relevance score
  {
    $addFields: {
      text_score: { $meta: "textScore" }
    }
  },

  // Stage 3: Create facet aggregations
  {
    $facet: {
      // Main search results
      "search_results": [
        {
          $match: {
            "text_score": { $gte: 0.5 },
            "inventory.availability": { $in: ["in_stock", "limited_stock"] }
          }
        },
        {
          $sort: { "text_score": -1, "ratings.average_rating": -1 }
        },
        {
          $project: {
            product_id: 1,
            name: 1,
            brand: 1,
            price: 1,
            category: 1,
            subcategory: 1,
            text_score: { $round: ["$text_score", 2] },
            rating: "$ratings.average_rating",
            availability: "$inventory.availability"
          }
        },
        { $limit: 50 }
      ],

      // Price range facets
      "price_facets": [
        {
          $bucket: {
            groupBy: "$price",
            boundaries: [0, 100, 300, 500, 1000, 2000, 5000],
            default: "5000+",
            output: {
              count: { $sum: 1 },
              avg_price: { $avg: "$price" },
              avg_rating: { $avg: "$ratings.average_rating" }
            }
          }
        }
      ],

      // Category facets
      "category_facets": [
        {
          $group: {
            _id: "$category",
            count: { $sum: 1 },
            avg_score: { $avg: "$text_score" },
            avg_price: { $avg: "$price" },
            avg_rating: { $avg: "$ratings.average_rating" }
          }
        },
        { $sort: { "avg_score": -1 } }
      ],

      // Brand facets
      "brand_facets": [
        {
          $group: {
            _id: "$brand",
            count: { $sum: 1 },
            avg_score: { $avg: "$text_score" },
            price_range: {
              min_price: { $min: "$price" },
              max_price: { $max: "$price" }
            },
            avg_rating: { $avg: "$ratings.average_rating" }
          }
        },
        { $sort: { "count": -1, "avg_score": -1 } },
        { $limit: 15 }
      ],

      // Rating facets
      "rating_facets": [
        {
          $bucket: {
            groupBy: "$ratings.average_rating",
            boundaries: [0, 2, 3, 4, 4.5, 5],
            default: "unrated",
            output: {
              count: { $sum: 1 },
              avg_score: { $avg: "$text_score" },
              price_range: {
                min_price: { $min: "$price" },
                max_price: { $max: "$price" }
              }
            }
          }
        }
      ],

      // Availability facets
      "availability_facets": [
        {
          $group: {
            _id: "$inventory.availability",
            count: { $sum: 1 },
            avg_score: { $avg: "$text_score" }
          }
        }
      ],

      // Search analytics
      "search_analytics": [
        {
          $group: {
            _id: null,
            total_results: { $sum: 1 },
            avg_relevance_score: { $avg: "$text_score" },
            score_distribution: {
              high_relevance: { $sum: { $cond: [{ $gte: ["$text_score", 1.5] }, 1, 0] } },
              medium_relevance: { $sum: { $cond: [{ $and: [{ $gte: ["$text_score", 1.0] }, { $lt: ["$text_score", 1.5] }] }, 1, 0] } },
              low_relevance: { $sum: { $cond: [{ $lt: ["$text_score", 1.0] }, 1, 0] } }
            },
            price_stats: {
              avg_price: { $avg: "$price" },
              min_price: { $min: "$price" },
              max_price: { $max: "$price" }
            }
          }
        }
      ]
    }
  },

  // Stage 4: Format faceted results
  {
    $project: {
      search_results: 1,
      facets: {
        price_ranges: "$price_facets",
        categories: "$category_facets", 
        brands: "$brand_facets",
        ratings: "$rating_facets",
        availability: "$availability_facets"
      },
      search_summary: {
        $arrayElemAt: ["$search_analytics", 0]
      }
    }
  }
])

Auto-Complete and Suggestion Engine

// Intelligent auto-complete system with typo tolerance
class MongoDBAutoCompleteEngine {
  constructor(db, collection) {
    this.db = db;
    this.collection = collection;
    this.suggestionCache = new Map();
  }

  async createAutoCompleteIndexes() {
    // Create text index for auto-complete
    await this.db[this.collection].createIndex({
      "name": "text",
      "tags": "text",
      "search_keywords": "text",
      "category": "text",
      "brand": "text"
    }, {
      name: "autocomplete_text_index",
      weights: {
        "name": 10,
        "brand": 8,
        "category": 6,
        "tags": 4,
        "search_keywords": 3
      }
    });

    // Create prefix-based indexes for fast auto-complete
    await this.db[this.collection].createIndex({
      "name": 1,
      "category": 1,
      "brand": 1
    }, {
      name: "prefix_autocomplete_index"
    });
  }

  async getAutoCompleteSuggestions(query, options = {}) {
    const maxSuggestions = options.maxSuggestions || 10;
    const includeCategories = options.includeCategories !== false;
    const includeBrands = options.includeBrands !== false;
    const minScore = options.minScore || 0.3;

    try {
      const suggestions = await this.db[this.collection].aggregate([
        // Stage 1: Multi-approach matching
        {
          $facet: {
            // Exact prefix matching
            "prefix_matches": [
              {
                $match: {
                  $or: [
                    { "name": { $regex: `^${this.escapeRegex(query)}`, $options: "i" } },
                    { "brand": { $regex: `^${this.escapeRegex(query)}`, $options: "i" } },
                    { "category": { $regex: `^${this.escapeRegex(query)}`, $options: "i" } }
                  ]
                }
              },
              {
                $addFields: {
                  suggestion_type: "prefix_match",
                  suggestion_text: {
                    $cond: [
                      { $regexMatch: { input: "$name", regex: new RegExp(`^${this.escapeRegex(query)}`, "i") } },
                      "$name",
                      {
                        $cond: [
                          { $regexMatch: { input: "$brand", regex: new RegExp(`^${this.escapeRegex(query)}`, "i") } },
                          "$brand",
                          "$category"
                        ]
                      }
                    ]
                  },
                  relevance_score: 2.0
                }
              },
              { $limit: 5 }
            ],

            // Full-text search suggestions
            "text_matches": [
              {
                $match: {
                  $text: { $search: query }
                }
              },
              {
                $addFields: {
                  suggestion_type: "text_match",
                  suggestion_text: "$name",
                  relevance_score: { $meta: "textScore" }
                }
              },
              {
                $match: { "relevance_score": { $gte: minScore } }
              },
              { $limit: 5 }
            ],

            // Fuzzy matching for typo tolerance
            "fuzzy_matches": [
              {
                $match: {
                  $or: [
                    { "name": { $regex: this.generateFuzzyRegex(query), $options: "i" } },
                    { "tags": { $elemMatch: { $regex: this.generateFuzzyRegex(query), $options: "i" } } }
                  ]
                }
              },
              {
                $addFields: {
                  suggestion_type: "fuzzy_match",
                  suggestion_text: "$name",
                  relevance_score: 1.0
                }
              },
              { $limit: 3 }
            ]
          }
        },

        // Stage 2: Combine and deduplicate suggestions
        {
          $project: {
            all_suggestions: {
              $concatArrays: ["$prefix_matches", "$text_matches", "$fuzzy_matches"]
            }
          }
        },

        // Stage 3: Unwind and process suggestions
        { $unwind: "$all_suggestions" },
        { $replaceRoot: { newRoot: "$all_suggestions" } },

        // Stage 4: Group to remove duplicates
        {
          $group: {
            _id: "$suggestion_text",
            suggestion_type: { $first: "$suggestion_type" },
            relevance_score: { $max: "$relevance_score" },
            product_count: { $sum: 1 },
            sample_product: { $first: "$$ROOT" }
          }
        },

        // Stage 5: Enhanced relevance scoring
        {
          $addFields: {
            final_score: {
              $multiply: [
                "$relevance_score",
                {
                  $switch: {
                    branches: [
                      { case: { $eq: ["$suggestion_type", "prefix_match"] }, then: 1.5 },
                      { case: { $eq: ["$suggestion_type", "text_match"] }, then: 1.2 },
                      { case: { $eq: ["$suggestion_type", "fuzzy_match"] }, then: 0.8 }
                    ],
                    default: 1.0
                  }
                },
                // Boost popular suggestions
                { $add: [1.0, { $multiply: [{ $ln: "$product_count" }, 0.1] }] }
              ]
            }
          }
        },

        // Stage 6: Sort and limit results
        { $sort: { "final_score": -1, "_id": 1 } },
        { $limit: maxSuggestions },

        // Stage 7: Format final suggestions
        {
          $project: {
            suggestion: "$_id",
            type: "$suggestion_type",
            score: { $round: ["$final_score", 2] },
            product_count: 1,
            category: "$sample_product.category",
            brand: "$sample_product.brand"
          }
        }
      ]).toArray();

      return suggestions;

    } catch (error) {
      console.error('Auto-complete error:', error);
      return [];
    }
  }

  escapeRegex(string) {
    return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  }

  generateFuzzyRegex(query) {
    // Simple fuzzy matching - allows one character difference per 4 characters
    const chars = query.split('');
    const pattern = chars.map((char, index) => {
      if (index % 4 === 0 && index > 0) {
        return `.?${this.escapeRegex(char)}`;
      }
      return this.escapeRegex(char);
    }).join('');

    return pattern;
  }

  async getSearchHistory(userId, limit = 20) {
    return await this.db.search_history.aggregate([
      { $match: { user_id: userId } },
      {
        $group: {
          _id: "$query",
          search_count: { $sum: 1 },
          last_searched: { $max: "$timestamp" },
          avg_results: { $avg: "$result_count" }
        }
      },
      { $sort: { "search_count": -1, "last_searched": -1 } },
      { $limit: limit },
      {
        $project: {
          query: "$_id",
          search_count: 1,
          last_searched: 1,
          suggestion_score: {
            $add: [
              { $multiply: [{ $ln: "$search_count" }, 0.3] },
              { $divide: [{ $subtract: ["$$NOW", "$last_searched"] }, 86400000] } // Days since last search
            ]
          }
        }
      }
    ]).toArray();
  }
}

SQL-Style Text Search with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB text search operations:

-- Basic full-text search with SQL syntax
SELECT 
    product_id,
    name,
    description,
    brand,
    price,
    category,
    TEXTRANK() as relevance_score
FROM products
WHERE TEXTSEARCH('wireless bluetooth headphones')
  AND inventory.availability = 'in_stock'
ORDER BY relevance_score DESC
LIMIT 20;

-- Advanced text search with boosting and filtering
SELECT 
    name,
    brand,
    price,
    category,
    ratings.average_rating,
    TEXTRANK() * 
    CASE 
        WHEN ratings.average_rating >= 4.5 THEN 1.3
        WHEN ratings.average_rating >= 4.0 THEN 1.1
        ELSE 1.0
    END as boosted_score
FROM products
WHERE TEXTSEARCH('gaming laptop RTX', language='english')
  AND price BETWEEN 800 AND 3000
  AND ratings.average_rating >= 4.0
ORDER BY boosted_score DESC, price ASC;

-- Multi-field text search with field-specific weighting
SELECT 
    product_id,
    name,
    brand,
    description,
    category,
    price,
    TEXTRANK() as base_score,

    -- Calculate weighted scores for different fields
    CASE 
        WHEN name LIKE '%wireless%' THEN TEXTRANK() * 2.0
        WHEN brand LIKE '%tech%' THEN TEXTRANK() * 1.5
        WHEN category LIKE '%electronics%' THEN TEXTRANK() * 1.2
        ELSE TEXTRANK()
    END as weighted_score

FROM products
WHERE TEXTSEARCH('wireless technology premium')
ORDER BY weighted_score DESC;

-- Faceted search with text queries
WITH text_search_base AS (
    SELECT *,
           TEXTRANK() as relevance_score
    FROM products
    WHERE TEXTSEARCH('smartphone android camera')
      AND relevance_score >= 0.5
),

price_facets AS (
    SELECT 
        CASE 
            WHEN price < 200 THEN 'Under $200'
            WHEN price < 500 THEN '$200-$499'
            WHEN price < 800 THEN '$500-$799'
            WHEN price < 1200 THEN '$800-$1199'
            ELSE '$1200+'
        END as price_range,
        COUNT(*) as product_count,
        AVG(relevance_score) as avg_relevance
    FROM text_search_base
    GROUP BY 
        CASE 
            WHEN price < 200 THEN 'Under $200'
            WHEN price < 500 THEN '$200-$499'
            WHEN price < 500 THEN '$500-$799'
            WHEN price < 1200 THEN '$800-$1199'
            ELSE '$1200+'
        END
),

brand_facets AS (
    SELECT 
        brand,
        COUNT(*) as product_count,
        AVG(relevance_score) as avg_relevance,
        AVG(ratings.average_rating) as avg_rating
    FROM text_search_base
    GROUP BY brand
    ORDER BY avg_relevance DESC, product_count DESC
    LIMIT 10
)

SELECT 
    -- Main search results
    (SELECT JSON_AGG(JSON_BUILD_OBJECT(
        'product_id', product_id,
        'name', name,
        'brand', brand,
        'price', price,
        'relevance_score', ROUND(relevance_score, 2)
    )) 
    FROM (
        SELECT product_id, name, brand, price, relevance_score
        FROM text_search_base
        ORDER BY relevance_score DESC
        LIMIT 50
    ) results) as search_results,

    -- Price facets
    (SELECT JSON_AGG(JSON_BUILD_OBJECT(
        'range', price_range,
        'count', product_count,
        'avg_relevance', ROUND(avg_relevance, 2)
    ))
    FROM price_facets
    ORDER BY avg_relevance DESC) as price_facets,

    -- Brand facets
    (SELECT JSON_AGG(JSON_BUILD_OBJECT(
        'brand', brand,
        'count', product_count,
        'avg_relevance', ROUND(avg_relevance, 2),
        'avg_rating', ROUND(avg_rating, 1)
    ))
    FROM brand_facets) as brand_facets;

-- Auto-complete suggestions with SQL
SELECT DISTINCT
    name as suggestion,
    'product_name' as suggestion_type,
    TEXTRANK() as relevance_score
FROM products
WHERE TEXTSEARCH('wirele')  -- Partial query
   OR name ILIKE 'wirele%'   -- Prefix matching
ORDER BY relevance_score DESC
LIMIT 10

UNION ALL

SELECT DISTINCT
    brand as suggestion,
    'brand' as suggestion_type,
    2.0 as relevance_score  -- Fixed high score for brand matches
FROM products 
WHERE brand ILIKE 'wirele%'
LIMIT 5

UNION ALL

SELECT DISTINCT
    category as suggestion,
    'category' as suggestion_type,
    1.5 as relevance_score
FROM products
WHERE category ILIKE 'wirele%'
LIMIT 5

ORDER BY relevance_score DESC, suggestion ASC;

-- Search analytics and performance monitoring
WITH search_performance AS (
    SELECT 
        TEXTSEARCH('wireless headphones audio') as has_text_match,
        name,
        brand, 
        category,
        price,
        ratings.average_rating,
        TEXTRANK() as relevance_score,
        inventory.availability
    FROM products
    WHERE has_text_match
),

search_metrics AS (
    SELECT 
        COUNT(*) as total_results,
        AVG(relevance_score) as avg_relevance_score,
        COUNT(CASE WHEN relevance_score >= 1.5 THEN 1 END) as high_relevance_results,
        COUNT(CASE WHEN relevance_score >= 1.0 THEN 1 END) as medium_relevance_results,
        COUNT(CASE WHEN relevance_score < 1.0 THEN 1 END) as low_relevance_results,

        -- Price distribution in results
        AVG(price) as avg_price,
        MIN(price) as min_price,
        MAX(price) as max_price,

        -- Rating distribution
        AVG(ratings.average_rating) as avg_rating,
        COUNT(CASE WHEN ratings.average_rating >= 4.0 THEN 1 END) as high_rated_results,

        -- Availability distribution  
        COUNT(CASE WHEN inventory.availability = 'in_stock' THEN 1 END) as available_results
    FROM search_performance
)

SELECT 
    -- Search quality metrics
    'Search Performance Report' as report_type,
    CURRENT_TIMESTAMP as generated_at,
    total_results,
    ROUND(avg_relevance_score, 3) as avg_relevance,

    -- Relevance distribution
    JSON_BUILD_OBJECT(
        'high_relevance', high_relevance_results,
        'medium_relevance', medium_relevance_results, 
        'low_relevance', low_relevance_results,
        'high_relevance_percent', ROUND((high_relevance_results::FLOAT / total_results * 100), 1)
    ) as relevance_distribution,

    -- Price insights
    JSON_BUILD_OBJECT(
        'avg_price', ROUND(avg_price, 2),
        'price_range', CONCAT('$', min_price, ' - $', max_price)
    ) as price_insights,

    -- Quality indicators
    JSON_BUILD_OBJECT(
        'avg_rating', ROUND(avg_rating, 2),
        'high_rated_percent', ROUND((high_rated_results::FLOAT / total_results * 100), 1),
        'availability_percent', ROUND((available_results::FLOAT / total_results * 100), 1)
    ) as quality_metrics,

    -- Search recommendations
    CASE 
        WHEN avg_relevance_score < 1.0 THEN 'Consider expanding search terms or adjusting index weights'
        WHEN high_relevance_results < 5 THEN 'Results may benefit from relevance boost tuning'
        WHEN available_results::FLOAT / total_results < 0.7 THEN 'High proportion of unavailable items in results'
        ELSE 'Search performance within acceptable ranges'
    END as recommendations

FROM search_metrics;

-- QueryLeaf provides comprehensive MongoDB text search capabilities:
-- 1. TEXTSEARCH() function for full-text search with natural language processing
-- 2. TEXTRANK() function for relevance scoring and ranking
-- 3. Multi-language support with language parameter specification
-- 4. Advanced text search operators integrated with SQL WHERE clauses
-- 5. Fuzzy matching and auto-complete functionality through SQL pattern matching
-- 6. Faceted search capabilities using standard SQL aggregation functions
-- 7. Search analytics and performance monitoring with familiar SQL reporting
-- 8. Integration with business logic through SQL CASE statements and functions

Production Text Search Optimization

Index Strategy and Performance Tuning

// Comprehensive text search optimization strategy
class ProductionTextSearchOptimizer {
  constructor(db) {
    this.db = db;
    this.indexStats = new Map();
    this.searchMetrics = new Map();
  }

  async optimizeTextIndexes() {
    console.log('Optimizing text search indexes for production workload...');

    // Analysis current text search performance
    const currentIndexes = await this.analyzeCurrentIndexes();
    const queryPatterns = await this.analyzeSearchQueries();
    const fieldUsage = await this.analyzeFieldUsage();

    // Design optimal text index configuration
    const optimizedConfig = this.designOptimalIndexes(currentIndexes, queryPatterns, fieldUsage);

    // Implement new indexes with minimal downtime
    await this.implementOptimizedIndexes(optimizedConfig);

    return {
      optimization_summary: optimizedConfig,
      performance_improvements: await this.measurePerformanceImprovements(),
      recommendations: this.generateOptimizationRecommendations()
    };
  }

  async analyzeCurrentIndexes() {
    const indexStats = await this.db.products.aggregate([
      { $indexStats: {} },
      { 
        $match: { 
          "name": { $regex: /text/i }
        }
      },
      {
        $project: {
          index_name: "$name",
          usage_stats: {
            ops: "$accesses.ops",
            since: "$accesses.since"
          },
          index_size: "$host"  
        }
      }
    ]).toArray();

    return indexStats;
  }

  async analyzeSearchQueries() {
    // Analyze recent search patterns from application logs
    const searchPatterns = await this.db.search_logs.aggregate([
      {
        $match: {
          timestamp: { $gte: new Date(Date.now() - 7 * 24 * 3600 * 1000) } // Last 7 days
        }
      },
      {
        $group: {
          _id: "$query_type",
          query_count: { $sum: 1 },
          avg_response_time: { $avg: "$response_time_ms" },
          unique_queries: { $addToSet: "$search_terms" },
          popular_terms: { 
            $push: {
              terms: "$search_terms",
              result_count: "$result_count",
              click_through_rate: "$ctr"
            }
          }
        }
      },
      {
        $project: {
          query_type: "$_id",
          query_count: 1,
          avg_response_time: { $round: ["$avg_response_time", 2] },
          unique_query_count: { $size: "$unique_queries" },
          performance_score: {
            $multiply: [
              { $divide: [1000, "$avg_response_time"] }, // Speed factor
              { $ln: "$query_count" } // Volume factor
            ]
          }
        }
      }
    ]).toArray();

    return searchPatterns;
  }

  async createOptimizedTextIndex() {
    // Drop existing text indexes if they exist
    try {
      await this.db.products.dropIndex("comprehensive_product_search");
    } catch (error) {
      // Index may not exist, continue
    }

    // Create highly optimized text index
    const indexResult = await this.db.products.createIndex({
      // Primary searchable fields with optimized weights
      "name": "text",
      "brand": "text", 
      "category": "text",
      "subcategory": "text",
      "description": "text",
      "tags": "text",
      "search_keywords": "text",

      // Secondary searchable fields
      "specifications.features": "text",
      "specifications.technical.type": "text"
    }, {
      name: "optimized_product_search_v2",
      default_language: "english",
      language_override: "language",

      // Carefully tuned weights based on analysis
      weights: {
        "name": 15,                          // Highest priority - exact product names
        "brand": 12,                         // High priority - brand recognition
        "category": 10,                      // High priority - categorical searches  
        "subcategory": 8,                    // Medium-high priority
        "tags": 6,                           // Medium priority - user-generated tags
        "search_keywords": 6,                // Medium priority - SEO terms
        "description": 4,                    // Lower priority - detailed descriptions
        "specifications.features": 3,         // Low priority - technical features
        "specifications.technical.type": 2   // Lowest priority - technical specs
      },

      // Performance optimization
      textIndexVersion: 3,
      partialFilterExpression: {
        "inventory.availability": { $ne: "discontinued" },
        "active": true
      }
    });

    console.log('Optimized text index created:', indexResult);
    return indexResult;
  }

  async implementSearchPerformanceMonitoring() {
    // Create collection for search performance metrics
    await this.db.createCollection("search_performance_metrics");

    // Create TTL index for automatic cleanup of old metrics
    await this.db.search_performance_metrics.createIndex(
      { "timestamp": 1 },
      { expireAfterSeconds: 30 * 24 * 3600 } // 30 days
    );

    return {
      monitoring_enabled: true,
      retention_period: "30 days",
      metrics_collection: "search_performance_metrics"
    };
  }

  async measureSearchPerformance(query, options = {}) {
    const startTime = Date.now();

    try {
      // Execute the search with explain plan
      const searchResults = await this.db.products.find(
        { $text: { $search: query } },
        { score: { $meta: "textScore" } }
      )
      .sort({ score: { $meta: "textScore" } })
      .limit(options.limit || 20)
      .explain("executionStats");

      const endTime = Date.now();
      const executionTime = endTime - startTime;

      // Record performance metrics
      const performanceMetrics = {
        query: query,
        execution_time_ms: executionTime,
        documents_examined: searchResults.executionStats.docsExamined,
        documents_returned: searchResults.executionStats.docsReturned,
        index_hits: searchResults.executionStats.indexesUsed?.length || 0,
        winning_plan: searchResults.queryPlanner.winningPlan.stage,
        timestamp: new Date(),

        // Performance classification
        performance_rating: this.classifyPerformance(executionTime, searchResults.executionStats),

        // Optimization recommendations
        optimization_suggestions: this.generatePerformanceRecommendations(searchResults)
      };

      // Store metrics for analysis
      await this.db.search_performance_metrics.insertOne(performanceMetrics);

      return performanceMetrics;

    } catch (error) {
      console.error('Search performance measurement failed:', error);
      return {
        query: query,
        error: error.message,
        timestamp: new Date()
      };
    }
  }

  classifyPerformance(executionTime, executionStats) {
    // Performance rating based on response time and efficiency
    if (executionTime < 50 && executionStats.docsExamined < executionStats.docsReturned * 2) {
      return "excellent";
    } else if (executionTime < 150 && executionStats.docsExamined < executionStats.docsReturned * 5) {
      return "good";
    } else if (executionTime < 500) {
      return "acceptable";
    } else {
      return "poor";
    }
  }

  generatePerformanceRecommendations(explainResult) {
    const recommendations = [];

    if (explainResult.executionStats.docsExamined > explainResult.executionStats.docsReturned * 10) {
      recommendations.push("High document examination ratio - consider more selective index or query optimization");
    }

    if (explainResult.executionStats.totalKeysExamined > explainResult.executionStats.docsReturned * 5) {
      recommendations.push("High index key examination - consider compound index optimization");
    }

    if (!explainResult.queryPlanner.indexFilterSet) {
      recommendations.push("Query not using optimal index filtering - consider index hint or query restructuring");
    }

    return recommendations;
  }
}

Best Practices for Production Text Search

Scaling and Performance Optimization

Index Design: Create targeted text indexes with appropriate field weights and language settings
Query Optimization: Use compound indexes combining text search with frequently filtered fields
Performance Monitoring: Implement comprehensive metrics collection for search query analysis
Caching Strategy: Cache frequently searched terms and results to reduce database load
Load Balancing: Distribute text search queries across multiple database nodes

Search Quality and User Experience

Relevance Tuning: Continuously adjust text index weights based on user interaction data
Auto-complete: Implement intelligent suggestion systems with typo tolerance
Faceted Search: Provide multiple filtering dimensions to help users refine search results
Search Analytics: Track search patterns, click-through rates, and conversion metrics
Multi-language Support: Handle international search requirements with appropriate language processing

Conclusion

MongoDB's full-text search capabilities provide enterprise-grade text search functionality that integrates seamlessly with document-based data models. The combination of sophisticated text indexing, natural language processing, and flexible scoring enables building powerful search experiences without external search infrastructure.

Key MongoDB text search advantages include:

Native Integration: Built-in text search eliminates need for separate search servers
Advanced Linguistics: Automatic stemming, stop words, and language-specific processing
Flexible Scoring: Customizable relevance scoring with business logic integration
Performance Optimization: Specialized text indexes optimized for search workloads
SQL Accessibility: Familiar text search operations through QueryLeaf's SQL interface
Comprehensive Analytics: Built-in search performance monitoring and optimization tools

Whether you're building e-commerce platforms, content management systems, knowledge bases, or document repositories, MongoDB's text search capabilities with QueryLeaf's SQL interface provide the foundation for delivering sophisticated search experiences that scale with your application requirements.

QueryLeaf Integration: QueryLeaf automatically translates SQL text search operations into MongoDB's native full-text search queries, making advanced text search accessible through familiar SQL patterns. Complex search scenarios including relevance scoring, faceted search, and auto-complete functionality are seamlessly handled through standard SQL syntax, enabling developers to build powerful search features without learning MongoDB's text search specifics.

The combination of MongoDB's powerful text search engine with SQL-familiar query patterns creates an ideal platform for applications requiring both sophisticated search capabilities and familiar database interaction patterns, ensuring your search functionality can evolve and scale efficiently as your data and user requirements grow.