
MongoDB Atlas Vector Search for AI and Embedding Similarity: Building Intelligent Search Applications with SQL-Compatible Vector Operations

Modern AI applications require sophisticated search capabilities that go beyond traditional keyword matching to understand semantic meaning, user intent, and content similarity. Traditional database search approaches struggle with high-dimensional vector data, semantic relationships, and the complex similarity calculations required for recommendation systems, content discovery, and AI-powered features.

MongoDB Atlas Vector Search provides native vector database capabilities for storing, indexing, and querying the high-dimensional embeddings generated by machine learning models. Unlike architectures that pair a primary database with a separate vector store and a synchronization pipeline, Atlas Vector Search indexes embeddings that live in the same documents as the rest of your MongoDB data, so similarity queries and metadata filters run against a single system.
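To make "similarity" concrete before looking at any database, here is a minimal sketch of the three measures that share their names with the similarity options used in the index definitions later in this article (cosine, euclidean, dotProduct). The function names are illustrative assumptions, and Atlas computes and normalizes its own scores server-side, so treat this as an explanation of the math rather than a reproduction of Atlas scoring.

```javascript
// Illustrative helpers (hypothetical names): plain-JavaScript versions of the
// three similarity measures. Not the Atlas implementation.
function assertSameLength(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same dimensionality');
  }
}

// Sum of element-wise products; larger means more aligned (for normalized vectors).
function dotProduct(a, b) {
  assertSameLength(a, b);
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Cosine of the angle between the vectors; ignores magnitude.
function cosineSimilarity(a, b) {
  const denom = Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b));
  if (denom === 0) throw new Error('Cosine similarity is undefined for zero vectors');
  return dotProduct(a, b) / denom;
}

// Straight-line distance; smaller means more similar.
function euclideanDistance(a, b) {
  assertSameLength(a, b);
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return Math.sqrt(sum);
}
```

Cosine similarity is the usual choice for text embeddings because it compares direction only; dot product gives the same ranking when the vectors are normalized to unit length.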

The Traditional Vector Search Challenge

Implementing vector similarity search with conventional approaches creates significant architectural complexity and performance challenges:

-- Traditional PostgreSQL vector search - complex and limited

-- Attempting vector similarity with PostgreSQL extensions
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Product embeddings table with limited vector support
CREATE TABLE product_embeddings (
    product_id BIGINT PRIMARY KEY,
    product_name VARCHAR(500) NOT NULL,
    description TEXT,
    category VARCHAR(100),
    price DECIMAL(10,2),

    -- Vector embeddings (pgvector indexes support at most 2,000 dimensions)
    title_embedding vector(384),        -- Limited dimensionality
    description_embedding vector(768),  -- Separate embeddings
    image_embedding vector(512),

    -- Traditional text search fallbacks
    search_vector tsvector,
    keywords TEXT[],

    -- Metadata
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    -- Manual similarity caching (performance workaround)
    similar_products JSONB,
    similarity_last_computed TIMESTAMP
);

-- Create vector indexes (limited optimization)
CREATE INDEX idx_title_embedding ON product_embeddings 
USING ivfflat (title_embedding vector_cosine_ops)
WITH (lists = 100);  -- Fixed index parameters

CREATE INDEX idx_description_embedding ON product_embeddings 
USING ivfflat (description_embedding vector_cosine_ops)
WITH (lists = 100);

-- Traditional text search as fallback
CREATE INDEX idx_search_vector ON product_embeddings 
USING GIN (search_vector);

-- Complex similarity search query with poor performance
WITH query_vector AS (
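    -- Placeholder query vector. pgvector distance operators require both
    -- operands to have the same dimensionality, so in practice each embedding
    -- column (384, 768, 512 dimensions) needs its own query vector.
    SELECT '[0.1, 0.2, 0.3, ...]'::vector as embedding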
),
similarity_scores AS (
    SELECT 
        pe.product_id,
        pe.product_name,
        pe.description,
        pe.category,
        pe.price,

        -- Expensive similarity calculations
        1 - (pe.title_embedding <=> qv.embedding) as title_similarity,
        1 - (pe.description_embedding <=> qv.embedding) as desc_similarity,
        1 - (pe.image_embedding <=> qv.embedding) as image_similarity,

        -- Combined scoring (manual implementation)
        (
            (1 - (pe.title_embedding <=> qv.embedding)) * 0.4 +
            (1 - (pe.description_embedding <=> qv.embedding)) * 0.4 +
            (1 - (pe.image_embedding <=> qv.embedding)) * 0.2
        ) as combined_similarity_score,

        -- Traditional text relevance as fallback
        ts_rank_cd(pe.search_vector, plainto_tsquery('search query')) as text_relevance,

        -- Distance calculations for debugging
        pe.title_embedding <=> qv.embedding as title_distance,
        pe.description_embedding <=> qv.embedding as desc_distance

    FROM product_embeddings pe
    CROSS JOIN query_vector qv
    WHERE 
        -- Pre-filtering to reduce computation (limited effectiveness)
        pe.category IN ('electronics', 'clothing', 'books')
        AND pe.price BETWEEN 10 AND 1000
        AND pe.updated_at >= CURRENT_DATE - INTERVAL '1 year'
),

ranked_results AS (
    SELECT *,
        -- Manual ranking logic
        ROW_NUMBER() OVER (ORDER BY combined_similarity_score DESC) as similarity_rank,
        ROW_NUMBER() OVER (ORDER BY text_relevance DESC) as text_rank,

        -- Hybrid scoring attempt
        (combined_similarity_score * 0.7 + text_relevance * 0.3) as hybrid_score

    FROM similarity_scores
    WHERE combined_similarity_score > 0.6  -- Arbitrary threshold
)

SELECT 
    product_id,
    product_name,
    category,
    price,

    -- Similarity metrics
    ROUND(combined_similarity_score::NUMERIC, 4) as similarity_score,
    ROUND(title_similarity::NUMERIC, 4) as title_sim,
    ROUND(desc_similarity::NUMERIC, 4) as desc_sim,
    ROUND(image_similarity::NUMERIC, 4) as image_sim,

    -- Ranking information
    similarity_rank,
    hybrid_score,

    -- Performance debugging
    title_distance as debug_title_dist,
    desc_distance as debug_desc_dist

FROM ranked_results
ORDER BY hybrid_score DESC
LIMIT 20;

-- Problems with traditional vector search approaches:
-- 1. Limited vector dimensionality and poor performance scaling
-- 2. Complex manual similarity calculations and scoring logic
-- 3. No native support for advanced similarity algorithms
-- 4. Poor integration with existing application data
-- 5. Manual index optimization and maintenance
-- 6. Limited filtering capabilities during vector search
-- 7. No built-in support for multiple embedding models
-- 8. Complex hybrid search implementation
-- 9. Poor performance with large vector datasets
-- 10. Limited support for real-time embedding updates

-- Attempt at recommendation system (extremely inefficient)
WITH user_preferences AS (
    SELECT 
        user_id,
        -- Compute average embedding from user's purchase history
        AVG(pe.title_embedding::vector) as preference_embedding,
        COUNT(*) as purchase_count,
        ARRAY_AGG(DISTINCT pe.category) as preferred_categories
    FROM user_purchases up
    JOIN product_embeddings pe ON pe.product_id = up.product_id
    WHERE up.purchase_date >= CURRENT_DATE - INTERVAL '6 months'
    GROUP BY user_id
    HAVING COUNT(*) >= 5  -- Minimum purchase history
),

candidate_products AS (
    SELECT DISTINCT
        pe.product_id,
        pe.product_name,
        pe.category,
        pe.price,
        pe.title_embedding,
        pe.description_embedding
    FROM product_embeddings pe
    WHERE pe.product_id NOT IN (
        -- Exclude already purchased products
        SELECT product_id 
        FROM user_purchases 
        WHERE user_id = $1 
        AND purchase_date >= CURRENT_DATE - INTERVAL '3 months'
    )
),

recommendations AS (
    SELECT 
        up.user_id,
        cp.product_id,
        cp.product_name,
        cp.category,
        cp.price,

        -- Expensive similarity calculation for each user-product pair
        1 - (up.preference_embedding <=> cp.title_embedding) as title_preference_sim,
        1 - (up.preference_embedding <=> cp.description_embedding) as desc_preference_sim,

        -- Category matching bonus
        CASE 
            WHEN cp.category = ANY(up.preferred_categories) THEN 0.2
            ELSE 0.0
        END as category_bonus,

        -- Purchase history influence
        up.purchase_count,

        -- Combined recommendation score
        (
            (1 - (up.preference_embedding <=> cp.title_embedding)) * 0.5 +
            (1 - (up.preference_embedding <=> cp.description_embedding)) * 0.3 +
            CASE WHEN cp.category = ANY(up.preferred_categories) THEN 0.2 ELSE 0.0 END
        ) as recommendation_score

    FROM user_preferences up
    CROSS JOIN candidate_products cp
    WHERE cp.category = ANY(up.preferred_categories)  -- Basic filtering
)

SELECT 
    user_id,
    product_id,
    product_name,
    category,
    price,
    ROUND(recommendation_score::NUMERIC, 4) as score,
    ROUND(title_preference_sim::NUMERIC, 4) as title_sim,
    ROUND(desc_preference_sim::NUMERIC, 4) as desc_sim
FROM recommendations
WHERE recommendation_score > 0.5
ORDER BY user_id, recommendation_score DESC;

-- This approach is extremely slow and doesn't scale beyond small datasets
-- Vector operations are not optimized for recommendation workloads
-- Manual preference modeling lacks sophistication
-- No support for real-time recommendation updates
-- Limited ability to incorporate multiple signals and features
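The preference-modeling idea in the query above (average the embeddings of a user's recent purchases) does not depend on the database. As a minimal sketch, the same element-wise mean can be computed in application code and stored as a user's preference vector; the function name and the input shape (an array of equal-length number arrays) are assumptions made for illustration.

```javascript
// Hypothetical helper: element-wise mean of a set of embeddings.
// All embeddings must share one dimensionality, which is also why a 384-dimension
// average cannot be compared against a 768-dimension column.
function averageEmbedding(embeddings) {
  if (embeddings.length === 0) {
    throw new Error('At least one embedding is required');
  }
  const dims = embeddings[0].length;
  const sums = new Array(dims).fill(0);

  for (const embedding of embeddings) {
    if (embedding.length !== dims) {
      throw new Error('All embeddings must have the same dimensionality');
    }
    for (let i = 0; i < dims; i++) sums[i] += embedding[i];
  }

  return sums.map(total => total / embeddings.length);
}
```

A plain mean weights every purchase equally; weighting by recency or purchase count is a common refinement but is outside the scope of this sketch.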

MongoDB Atlas Vector Search covers the same use cases with a dedicated vector index type and the $vectorSearch aggregation stage:

// MongoDB Atlas Vector Search - advanced AI-powered search capabilities
const { MongoClient } = require('mongodb');

class AtlasVectorSearchManager {
  constructor() {
    this.client = null;
    this.db = null;
    this.searchIndexes = new Map();
    this.embeddingModels = new Map();
    this.searchPerformanceMetrics = new Map();
  }

  async initialize() {
    console.log('Initializing MongoDB Atlas Vector Search Manager...');

    // Connect to Atlas with vector search optimization
    this.client = new MongoClient(process.env.MONGODB_ATLAS_URI, {
      // Optimized connection settings for vector operations
      maxPoolSize: 20,
      minPoolSize: 5,
      maxIdleTimeMS: 30000,

      // Read preference for vector search workloads
      readPreference: 'primary',
      readConcern: { level: 'local' },

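      // Wire compression for large vector payloads. The Node.js driver option
      // is `compressors`; 'snappy' also needs the optional snappy package.
      compressors: ['zlib', 'snappy'],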

      appName: 'VectorSearchApplication'
    });

    await this.client.connect();
    this.db = this.client.db('ai_application');

    // Initialize vector search indexes and models
    await this.setupVectorSearchIndexes();
    await this.initializeEmbeddingModels();

    console.log('✅ Atlas Vector Search Manager initialized');
  }

  async setupVectorSearchIndexes() {
    console.log('Setting up Atlas Vector Search indexes...');

    const productsCollection = this.db.collection('products');

    // Define the product vector search index. The definition is only stored
    // in memory here; the index still has to be created in Atlas.
    const productVectorIndex = {
      name: 'products_vector_search',
      type: 'vectorSearch',
      definition: {
        // Multi-field vector search configuration
        fields: [
          {
            // Primary product embedding for semantic search
            type: 'vector',
            path: 'embeddings.combined',
            numDimensions: 1536,          // OpenAI ada-002 dimensions
            similarity: 'cosine'           // Cosine similarity for semantic search
          },
          {
            // Title-specific embedding for title-focused search
            type: 'vector', 
            path: 'embeddings.title',
            numDimensions: 384,            // Sentence transformers dimensions
            similarity: 'euclidean'
          },
          {
            // Description embedding for content-based search
            type: 'vector',
            path: 'embeddings.description', 
            numDimensions: 768,            // BERT-based embeddings
            similarity: 'dotProduct'
          },
          {
            // Visual embedding for image similarity
            type: 'vector',
            path: 'embeddings.image',
            numDimensions: 512,            // Vision transformer embeddings
            similarity: 'cosine'
          },

          // Filterable fields for hybrid search
          {
            type: 'filter',
            path: 'category'
          },
          {
            type: 'filter', 
            path: 'brand'
          },
          {
            type: 'filter',
            path: 'pricing.basePrice'
          },
          {
            type: 'filter',
            path: 'availability.isActive'
          },
          {
            type: 'filter',
            path: 'ratings.averageRating'
          },
          {
            type: 'filter',
            path: 'metadata.tags'
          }
        ]
      }
    };

    // Create user preferences vector index
    const userPreferencesIndex = {
      name: 'user_preferences_vector_search',
      type: 'vectorSearch', 
      definition: {
        fields: [
          {
            // User preference embedding for personalization
            type: 'vector',
            path: 'preferences.embedding',
            numDimensions: 1536,
            similarity: 'cosine'
          },
          {
            // Session-based embedding for short-term preferences
            type: 'vector',
            path: 'preferences.sessionEmbedding', 
            numDimensions: 384,
            similarity: 'cosine'
          },

          // User demographic and behavioral filters
          {
            type: 'filter',
            path: 'demographics.ageRange'
          },
          {
            type: 'filter',
            path: 'demographics.location'
          },
          {
            type: 'filter', 
            path: 'behavior.purchaseFrequency'
          },
          {
            type: 'filter',
            path: 'preferences.categories'
          }
        ]
      }
    };

    // Store index configurations
    this.searchIndexes.set('products', productVectorIndex);
    this.searchIndexes.set('user_preferences', userPreferencesIndex);

    console.log('✅ Vector search indexes configured');
  }

  async initializeEmbeddingModels() {
    console.log('Initializing embedding models...');

    // Configure different embedding models for different use cases
    const embeddingConfigs = {
      'openai-ada-002': {
        provider: 'openai',
        model: 'text-embedding-ada-002',
        dimensions: 1536,
        maxTokens: 8191,
        useCase: 'general_semantic_search',
        costPerToken: 0.0001
      },

      'sentence-transformers': {
        provider: 'huggingface',
        model: 'all-MiniLM-L6-v2', 
        dimensions: 384,
        maxTokens: 256,
        useCase: 'title_and_short_text',
        costPerToken: 0.0  // Free local model
      },

      'cohere-embed-v3': {
        provider: 'cohere',
        model: 'embed-multilingual-v3.0',  // Multilingual variant, to match the use case below
        dimensions: 1024,
        maxTokens: 512,
        useCase: 'multilingual_content',
        costPerToken: 0.0001
      },

      'vision-transformer': {
        provider: 'huggingface',
        model: 'openai/clip-vit-base-patch32',  // Open-weights CLIP checkpoint on the Hugging Face Hub
        dimensions: 512,
        useCase: 'image_similarity',
        costPerToken: 0.0  // Self-hosted open model
      }
    };

    for (const [modelName, config] of Object.entries(embeddingConfigs)) {
      this.embeddingModels.set(modelName, config);
    }

    console.log('✅ Embedding models initialized');
  }

  async performSemanticProductSearch(queryText, options = {}) {
    console.log(`Performing semantic product search: "${queryText}"`);

    const startTime = Date.now();

    // Generate query embedding using configured model
    const queryEmbedding = await this.generateEmbedding(
      queryText, 
      options.embeddingModel || 'openai-ada-002'
    );

    const productsCollection = this.db.collection('products');

    // Construct vector search pipeline with advanced filtering
    const searchPipeline = [
      {
        $vectorSearch: {
          index: 'products_vector_search',
          path: 'embeddings.combined',
          queryVector: queryEmbedding,
          numCandidates: options.numCandidates || 1000,
          limit: options.limit || 20,

          // Advanced filtering during vector search
          filter: {
            $and: [
              { 'availability.isActive': true },
              ...(options.categories ? [{ category: { $in: options.categories } }] : []),
              ...(options.priceRange ? [{ 
                'pricing.basePrice': { 
                  $gte: options.priceRange.min, 
                  $lte: options.priceRange.max 
                }
              }] : []),
              ...(options.minRating ? [{ 
                'ratings.averageRating': { $gte: options.minRating }
              }] : []),
              ...(options.brands ? [{ brand: { $in: options.brands } }] : [])
            ]
          }
        }
      },

      // Add similarity score and additional fields
      {
        $addFields: {
          vectorSearchScore: { $meta: 'vectorSearchScore' },

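          // Calculate additional similarity metrics.
          // Note: $function runs server-side JavaScript, which is not available
          // on every Atlas tier and is deprecated as of MongoDB 8.0; computing
          // this score in application code is a safer default.
          titleRelevance: {
            $function: {
              body: function(title, query) {
                // Custom relevance scoring function
                if (!title) return 0;  // Guard against documents without a name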
                const titleLower = title.toLowerCase();
                const queryLower = query.toLowerCase();
                const words = queryLower.split(/\s+/);
                let score = 0;

                words.forEach(word => {
                  if (titleLower.includes(word)) {
                    score += word.length / title.length;
                  }
                });

                return Math.min(score, 1.0);
              },
              args: ['$name', queryText],
              lang: 'js'
            }
          },

          // Boost scoring based on business rules
          businessBoost: {
            $switch: {
              branches: [
                {
                  case: { $eq: ['$featured', true] },
                  then: 0.2  // Featured products get boost
                },
                {
                  case: { $gte: ['$inventory.stockQuantity', 100] },
                  then: 0.1  // Well-stocked items get boost
                },
                {
                  case: { $gte: ['$ratings.averageRating', 4.5] },
                  then: 0.15 // Highly rated products get boost
                }
              ],
              default: 0.0
            }
          }
        }
      },

      // Calculate final relevance score
      {
        $addFields: {
          finalRelevanceScore: {
            $add: [
              { $multiply: ['$vectorSearchScore', 0.7] },  // Vector similarity weight
              { $multiply: ['$titleRelevance', 0.2] },     // Title relevance weight
              '$businessBoost'                             // Business rule boost
            ]
          }
        }
      },

      // Re-sort by final relevance score
      {
        $sort: { finalRelevanceScore: -1 }
      },

      // Project final result structure
      {
        $project: {
          productId: '$_id',
          name: 1,
          description: 1,
          category: 1,
          brand: 1,
          pricing: 1,
          ratings: 1,
          images: 1,
          availability: 1,

          // Search relevance metrics
          relevance: {
            vectorScore: '$vectorSearchScore',
            titleRelevance: '$titleRelevance', 
            businessBoost: '$businessBoost',
            finalScore: '$finalRelevanceScore'
          },

          // Additional context
          searchContext: {
            query: queryText,
            embeddingModel: options.embeddingModel || 'openai-ada-002',
            searchTimestamp: new Date()
          }
        }
      }
    ];

    // Execute search with performance tracking
    const searchResults = await productsCollection.aggregate(searchPipeline).toArray();

    const searchDuration = Date.now() - startTime;

    // Track search performance metrics
    await this.trackSearchMetrics({
      queryText: queryText,
      resultsCount: searchResults.length,
      searchDurationMs: searchDuration,
      embeddingModel: options.embeddingModel || 'openai-ada-002',
      filters: options
    });

    console.log(`✅ Semantic search completed: ${searchResults.length} results in ${searchDuration}ms`);

    return {
      results: searchResults,
      metadata: {
        query: queryText,
        totalResults: searchResults.length,
        searchDurationMs: searchDuration,
        embeddingModel: options.embeddingModel || 'openai-ada-002',
        searchTimestamp: new Date()
      }
    };
  }

  async generatePersonalizedRecommendations(userId, options = {}) {
    console.log(`Generating personalized recommendations for user: ${userId}`);

    const startTime = Date.now();

    // Get user preference embedding
    const userProfile = await this.getUserPreferenceEmbedding(userId);

    if (!userProfile || !userProfile.preferences?.embedding) {
      console.log('No user preference data available, falling back to popularity-based recommendations');
      return await this.getPopularityBasedRecommendations(options);
    }

    const productsCollection = this.db.collection('products');

    // Generate recommendations using vector similarity
    const recommendationPipeline = [
      {
        $vectorSearch: {
          index: 'products_vector_search',
          path: 'embeddings.combined',
          queryVector: userProfile.preferences.embedding,
          numCandidates: options.numCandidates || 2000,
          // Over-fetch so the re-ranking stages below have candidates to reorder
          limit: Math.max(options.limit || 20, 50),

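          // Exclude previously purchased/viewed products. Every field used in
          // this filter, including _id, must be declared as a 'filter' field
          // in the vector search index definition.
          filter: {
            $and: [
              { 'availability.isActive': true },
              { _id: { $nin: userProfile.excludeProductIds || [] } },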
              ...(options.categories ? [{ category: { $in: options.categories } }] : []),
              ...(userProfile.preferences?.priceRange ? [{ 
                'pricing.basePrice': { 
                  $gte: userProfile.preferences.priceRange.min,
                  $lte: userProfile.preferences.priceRange.max
                }
              }] : [])
            ]
          }
        }
      },

      // Add personalization scoring
      {
        $addFields: {
          vectorSimilarity: { $meta: 'vectorSearchScore' },

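          // Category preference matching. $switch rejects an empty branches
          // array, so fall back to a constant when no categories are stored.
          categoryPreferenceScore: userProfile.preferences.categories?.length
            ? {
                $switch: {
                  branches: userProfile.preferences.categories.map(cat => ({
                    case: { $eq: ['$category', cat.name] },
                    then: cat.score || 0.5
                  })),
                  default: 0.1
                }
              }
            : { $literal: 0.1 },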

          // Brand preference scoring
          brandPreferenceScore: {
            $cond: {
              if: { $in: ['$brand', userProfile.preferences.brands || []] },
              then: 0.3,
              else: 0.0
            }
          },

          // Price preference scoring
          pricePreferenceScore: {
            $cond: {
              if: {
                $and: [
                  { $gte: ['$pricing.basePrice', userProfile.preferences.priceRange?.min || 0] },
                  { $lte: ['$pricing.basePrice', userProfile.preferences.priceRange?.max || 999999] }
                ]
              },
              then: 0.2,
              else: 0.0
            }
          },

          // Recency bias for trending products
          recencyScore: {
            $cond: {
              if: { $gte: ['$createdAt', new Date(Date.now() - 30 * 24 * 60 * 60 * 1000)] },
              then: 0.1,
              else: 0.0
            }
          }
        }
      },

      // Calculate final recommendation score
      {
        $addFields: {
          recommendationScore: {
            $add: [
              { $multiply: ['$vectorSimilarity', 0.5] },        // Vector similarity weight
              { $multiply: ['$categoryPreferenceScore', 0.2] }, // Category preference
              '$brandPreferenceScore',                          // Brand preference
              '$pricePreferenceScore',                          // Price preference  
              '$recencyScore'                                   // Recency boost
            ]
          }
        }
      },

      // Sort by final recommendation score
      {
        $sort: { recommendationScore: -1 }
      },

      // Limit to requested number of recommendations
      {
        $limit: options.limit || 20
      },

      // Project final recommendation structure
      {
        $project: {
          productId: '$_id',
          name: 1,
          description: 1,
          category: 1,
          brand: 1,
          pricing: 1,
          ratings: 1,
          images: 1,

          // Recommendation scoring details
          recommendation: {
            score: '$recommendationScore',
            vectorSimilarity: '$vectorSimilarity',
            categoryMatch: '$categoryPreferenceScore',
            brandMatch: '$brandPreferenceScore',
            priceMatch: '$pricePreferenceScore',
            recencyBoost: '$recencyScore',
            reason: {
              $switch: {
                branches: [
                  {
                    case: { $gt: ['$categoryPreferenceScore', 0.3] },
                    then: 'Based on your interest in this category'
                  },
                  {
                    case: { $gt: ['$brandPreferenceScore', 0.2] },
                    then: 'From a brand you like'
                  },
                  {
                    case: { $gt: ['$vectorSimilarity', 0.8] },
                    then: 'Similar to products you\'ve liked'
                  }
                ],
                default: 'Recommended for you'
              }
            }
          },

          // Recommendation metadata
          recommendationContext: {
            userId: userId,
            basedOn: 'user_preferences',
            generatedAt: new Date()
          }
        }
      }
    ];

    const recommendations = await productsCollection.aggregate(recommendationPipeline).toArray();

    const generationDuration = Date.now() - startTime;

    console.log(`✅ Generated ${recommendations.length} personalized recommendations in ${generationDuration}ms`);

    return {
      recommendations: recommendations,
      userProfile: {
        userId: userId,
        preferences: userProfile.preferences,
        excludedProducts: userProfile.excludeProductIds?.length || 0
      },
      metadata: {
        generationDurationMs: generationDuration,
        totalRecommendations: recommendations.length,
        algorithm: 'vector_similarity_personalized',
        generatedAt: new Date()
      }
    };
  }

  async performHybridSearch(queryText, userId, options = {}) {
    console.log(`Performing hybrid search for query: "${queryText}", user: ${userId}`);

    // Execute both semantic search and personalized recommendations
    const [semanticResults, personalizedResults] = await Promise.all([
      this.performSemanticProductSearch(queryText, {
        ...options,
        limit: Math.ceil((options.limit || 20) * 0.7)  // 70% semantic results
      }),
      userId ? this.generatePersonalizedRecommendations(userId, {
        ...options,
        limit: Math.ceil((options.limit || 20) * 0.3)  // 30% personalized results
      }) : Promise.resolve({ recommendations: [] })
    ]);

    // Merge and re-rank results
    const hybridResults = this.mergeAndRankHybridResults(
      semanticResults.results,
      personalizedResults.recommendations || [],
      options
    );

    return {
      results: hybridResults,
      sources: {
        semantic: semanticResults.results.length,
        personalized: personalizedResults.recommendations?.length || 0
      },
      metadata: {
        query: queryText,
        userId: userId,
        algorithm: 'hybrid_semantic_personalized',
        searchTimestamp: new Date()
      }
    };
  }

  async generateEmbedding(text, modelName) {
    const model = this.embeddingModels.get(modelName);
    if (!model) {
      throw new Error(`Unknown embedding model: ${modelName}`);
    }

    // Implementation would integrate with actual embedding service
    // This is a placeholder for the actual embedding generation
    console.log(`Generating embedding with ${modelName} for text: ${text.substring(0, 50)}...`);

    // Return mock embedding vector for demonstration
    return Array.from({ length: model.dimensions }, () => Math.random() - 0.5);
  }

  async getUserPreferenceEmbedding(userId) {
    const userPreferencesCollection = this.db.collection('user_preferences');

    const userProfile = await userPreferencesCollection.findOne(
      { userId: userId },
      { 
        projection: {
          preferences: 1,
          excludeProductIds: 1,
          lastUpdated: 1
        }
      }
    );

    return userProfile;
  }

  async trackSearchMetrics(metrics) {
    const searchMetricsCollection = this.db.collection('search_metrics');

    await searchMetricsCollection.insertOne({
      ...metrics,
      timestamp: new Date()
    });

    // Update performance tracking
    if (!this.searchPerformanceMetrics.has(metrics.embeddingModel)) {
      this.searchPerformanceMetrics.set(metrics.embeddingModel, {
        totalSearches: 0,
        totalDurationMs: 0,
        avgResults: 0
      });
    }

    const modelMetrics = this.searchPerformanceMetrics.get(metrics.embeddingModel);
    modelMetrics.totalSearches++;
    modelMetrics.totalDurationMs += metrics.searchDurationMs;
    // Incremental mean (totalSearches was already incremented above)
    modelMetrics.avgResults += (metrics.resultsCount - modelMetrics.avgResults) / modelMetrics.totalSearches;
  }

  mergeAndRankHybridResults(semanticResults, personalizedResults, options) {
    // Combine results with hybrid scoring
    const combinedResults = new Map();

    // Add semantic results with base score
    semanticResults.forEach((result, index) => {
      combinedResults.set(result.productId.toString(), {
        ...result,
        hybridScore: (result.relevance?.finalScore || 0) * 0.7 + (1 - index / semanticResults.length) * 0.3,
        sources: ['semantic']
      });
    });

    // Add personalized results, boosting score if already present
    personalizedResults.forEach((result, index) => {
      const productId = result.productId.toString();
      const personalizedScore = (result.recommendation?.score || 0) * 0.6 + (1 - index / personalizedResults.length) * 0.4;

      if (combinedResults.has(productId)) {
        // Boost existing result
        const existing = combinedResults.get(productId);
        existing.hybridScore = existing.hybridScore * 0.8 + personalizedScore * 0.2;
        existing.sources.push('personalized');
        existing.personalization = result.recommendation;
      } else {
        // Add new personalized result
        combinedResults.set(productId, {
          ...result,
          hybridScore: personalizedScore * 0.8,  // Slightly lower weight for pure personalized
          sources: ['personalized'],
          relevance: { finalScore: personalizedScore }
        });
      }
    });

    // Convert to array and sort by hybrid score
    return Array.from(combinedResults.values())
      .sort((a, b) => b.hybridScore - a.hybridScore)
      .slice(0, options.limit || 20);
  }

  async getSearchAnalytics() {
    const searchMetricsCollection = this.db.collection('search_metrics');

    const analytics = await searchMetricsCollection.aggregate([
      {
        $match: {
          timestamp: { $gte: new Date(Date.now() - 24 * 60 * 60 * 1000) }
        }
      },
      {
        $group: {
          _id: '$embeddingModel',
          totalSearches: { $sum: 1 },
          avgDuration: { $avg: '$searchDurationMs' },
          avgResults: { $avg: '$resultsCount' },
          minDuration: { $min: '$searchDurationMs' },
          maxDuration: { $max: '$searchDurationMs' }
        }
      },
      {
        $sort: { totalSearches: -1 }
      }
    ]).toArray();

    return {
      timestamp: new Date(),
      period: '24_hours',
      models: analytics
    };
  }

  async shutdown() {
    console.log('Shutting down Atlas Vector Search Manager...');

    if (this.client) {
      await this.client.close();
      console.log('✅ MongoDB Atlas connection closed');
    }

    this.searchIndexes.clear();
    this.embeddingModels.clear();
    this.searchPerformanceMetrics.clear();
  }
}

// Export the Atlas Vector Search manager
module.exports = { AtlasVectorSearchManager };

// Benefits of MongoDB Atlas Vector Search:
// - Native vector database capabilities integrated with document data
// - High-performance vector indexing and similarity search
// - Advanced filtering during vector search operations
// - Multiple embedding model support with configurable algorithms
// - Hybrid search combining semantic and traditional approaches
// - Real-time personalization with user preference embeddings
// - Enterprise-grade scalability and performance optimization
// - Comprehensive analytics and performance monitoring
// - SQL-compatible vector operations through QueryLeaf integration
// - Zero additional infrastructure for vector search capabilities
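
The strategies in this article choose between three similarity functions: cosine, dotProduct, and euclidean. As a rough illustration of what each one measures, the sketch below computes them in plain JavaScript and maps each onto a 0 to 1 score. The normalization formulas follow the way Atlas describes its vectorSearchScore, but they should be checked against the current documentation. The helper names (dot, norm, similarityScore) are hypothetical and are not part of any MongoDB API.

```javascript
// Illustrative helpers only (hypothetical names, not a MongoDB API).

// Dot product of two equal-length numeric arrays.
function dot(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Euclidean length of a vector.
function norm(a) {
  return Math.sqrt(dot(a, a));
}

// Map a raw similarity onto a 0..1 score.
// Assumption: these normalizations mirror the documented Atlas scoring.
function similarityScore(a, b, similarity) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same number of dimensions');
  }
  switch (similarity) {
    case 'cosine':
      return (1 + dot(a, b) / (norm(a) * norm(b))) / 2;
    case 'dotProduct':
      // Only meaningful when both vectors are normalized to unit length
      return (1 + dot(a, b)) / 2;
    case 'euclidean': {
      const diff = a.map((x, i) => x - b[i]);
      return 1 / (1 + norm(diff));
    }
    default:
      throw new Error(`Unknown similarity: ${similarity}`);
  }
}

// Usage: compare the same pair of vectors under each function
const queryVector = [1, 0];
const docVector = [0, 1];
for (const similarity of ['cosine', 'dotProduct', 'euclidean']) {
  console.log(similarity, similarityScore(queryVector, docVector, similarity));
}
```

Because the three functions put scores on different curves, weights tuned for one (for example the 0.6 vector weight in a cosine strategy) may not carry over unchanged to another.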

Understanding MongoDB Atlas Vector Search Architecture

Advanced Vector Search Implementation Patterns

Implement sophisticated vector search strategies for different AI application scenarios:

// Advanced Atlas Vector Search patterns for enterprise AI applications
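class EnterpriseVectorSearchOrchestrator {
  // `db` is a connected MongoDB Db handle; the methods below call this.db.collection().
  // generateEmbedding(text, modelName) is assumed to be implemented elsewhere (not shown here).
  constructor(db) {
    this.db = db;
    this.searchStrategies = new Map();
    this.embeddingPipelines = new Map();
    this.performanceOptimizer = new Map();
    this.cacheManager = new Map();
  }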

  async initializeSearchStrategies() {
    console.log('Initializing enterprise vector search strategies...');

    const strategies = {
      // E-commerce product discovery
      'product_discovery': {
        primaryEmbeddingModel: 'openai-ada-002',
        fallbackEmbeddingModel: 'sentence-transformers',

        searchConfiguration: {
          numCandidates: 2000,
          similarity: 'cosine',
          indexName: 'products_vector_search',

          scoringWeights: {
            vectorSimilarity: 0.6,
            textRelevance: 0.2,
            popularityBoost: 0.1,
            businessRules: 0.1
          },

          filterPriority: ['availability', 'category', 'priceRange', 'ratings'],
          resultDiversification: true
        },

        performanceTargets: {
          maxLatencyMs: 500,
          minResultCount: 10,
          maxResultCount: 50
        }
      },

      // Content recommendation system
      'content_recommendations': {
        primaryEmbeddingModel: 'cohere-embed-v3',
        fallbackEmbeddingModel: 'sentence-transformers',

        searchConfiguration: {
          numCandidates: 5000,
          similarity: 'dotProduct',
          indexName: 'content_vector_search',

          scoringWeights: {
            vectorSimilarity: 0.7,
            userEngagement: 0.15,
            recency: 0.1,
            contentQuality: 0.05
          },

          filterPriority: ['contentType', 'publishDate', 'userPreferences'],
          resultDiversification: false
        },

        performanceTargets: {
          maxLatencyMs: 300,
          minResultCount: 20,
          maxResultCount: 100
        }
      },

      // Customer support knowledge base
      'knowledge_search': {
        primaryEmbeddingModel: 'openai-ada-002',
        fallbackEmbeddingModel: 'sentence-transformers',

        searchConfiguration: {
          numCandidates: 1000,
          similarity: 'cosine',
          indexName: 'knowledge_vector_search',

          scoringWeights: {
            vectorSimilarity: 0.8,
            documentAuthority: 0.1,
            recency: 0.05,
            userFeedback: 0.05
          },

          filterPriority: ['category', 'difficulty', 'department'],
          resultDiversification: true
        },

        performanceTargets: {
          maxLatencyMs: 200,
          minResultCount: 5,
          maxResultCount: 15
        }
      },

      // Image similarity search
      'image_similarity': {
        primaryEmbeddingModel: 'vision-transformer',
        fallbackEmbeddingModel: null,

        searchConfiguration: {
          numCandidates: 3000,
          similarity: 'cosine',
          indexName: 'images_vector_search',

          scoringWeights: {
            vectorSimilarity: 0.9,
            imageMetadata: 0.05,
            userPreferences: 0.05
          },

          filterPriority: ['imageType', 'resolution', 'tags'],
          resultDiversification: false
        },

        performanceTargets: {
          maxLatencyMs: 800,
          minResultCount: 10,
          maxResultCount: 30
        }
      }
    };

    for (const [strategyName, strategy] of Object.entries(strategies)) {
      this.searchStrategies.set(strategyName, strategy);
    }

    console.log('✅ Enterprise search strategies initialized');
  }

  async executeVectorSearchStrategy(strategyName, query, context = {}) {
    const strategy = this.searchStrategies.get(strategyName);
    if (!strategy) {
      throw new Error(`Unknown search strategy: ${strategyName}`);
    }

    console.log(`Executing ${strategyName} strategy for query: "${query}"`);

    const searchContext = {
      strategy: strategyName,
      query: query,
      userId: context.userId,
      sessionId: context.sessionId,
      startTime: Date.now(),
      filters: context.filters || {},
      options: context.options || {}
    };

    try {
      // Generate embeddings using primary model, tracking which model actually ran
      let queryEmbedding;
      let embeddingModelUsed = strategy.primaryEmbeddingModel;
      try {
        queryEmbedding = await this.generateEmbedding(
          query,
          strategy.primaryEmbeddingModel
        );
      } catch (primaryError) {
        console.warn(`Primary embedding model failed, using fallback:`, primaryError.message);

        // Note: a fallback vector is only comparable to the indexed vectors if
        // they were produced by the same model with the same dimensions
        if (strategy.fallbackEmbeddingModel) {
          queryEmbedding = await this.generateEmbedding(
            query,
            strategy.fallbackEmbeddingModel
          );
          embeddingModelUsed = strategy.fallbackEmbeddingModel;
        } else {
          throw primaryError;
        }
      }

      // Execute vector search with strategy-specific configuration
      const searchResults = await this.performOptimizedVectorSearch(
        queryEmbedding,
        strategy,
        searchContext
      );

      // Apply strategy-specific post-processing
      const processedResults = await this.applySearchPostProcessing(
        searchResults,
        strategy,
        searchContext
      );

      const searchDuration = Date.now() - searchContext.startTime;

      console.log(`✅ ${strategyName} completed: ${processedResults.length} results in ${searchDuration}ms`);

      return {
        results: processedResults,
        strategy: strategyName,
        metadata: {
          ...searchContext,
          searchDurationMs: searchDuration,
          resultCount: processedResults.length,
          embeddingModel: embeddingModelUsed
        }
      };

    } catch (error) {
      console.error(`Search strategy ${strategyName} failed:`, error);
      return {
        results: [],
        strategy: strategyName,
        error: error.message,
        metadata: searchContext
      };
    }
  }

  async performOptimizedVectorSearch(queryEmbedding, strategy, context) {
    const collection = this.getCollectionForStrategy(strategy.searchConfiguration.indexName);

    // Build the filter first: buildFilterExpression returns undefined when no
    // supported filter applies, and $vectorSearch should not receive `filter: undefined`
    const filter = this.buildFilterExpression(context.filters, strategy);

    // Build vector search aggregation pipeline
    const searchPipeline = [
      {
        $vectorSearch: {
          index: strategy.searchConfiguration.indexName,
          path: this.getVectorPathForStrategy(strategy),
          queryVector: queryEmbedding,
          numCandidates: strategy.searchConfiguration.numCandidates,
          limit: strategy.performanceTargets.maxResultCount,

          // Apply context-aware filtering only when a filter was built
          ...(filter && { filter })
        }
      },

      // Add vector search score
      {
        $addFields: {
          vectorSearchScore: { $meta: 'vectorSearchScore' }
        }
      },

      // Apply strategy-specific scoring enhancements
      ...this.buildScoringEnhancements(strategy, context),

      // Sort by enhanced score
      {
        $sort: { enhancedScore: -1 }
      },

      // Limit to strategy target
      {
        $limit: strategy.performanceTargets.maxResultCount
      }
    ];

    return await collection.aggregate(searchPipeline).toArray();
  }

  buildScoringEnhancements(strategy, context) {
    const enhancements = [];
    const weights = strategy.searchConfiguration.scoringWeights;

    // Base scoring calculation
    enhancements.push({
      $addFields: {
        enhancedScore: {
          $add: [
            { $multiply: ['$vectorSearchScore', weights.vectorSimilarity] },

            // Add text relevance if applicable
            ...(weights.textRelevance ? [{
              $multiply: [
                { $ifNull: ['$textRelevanceScore', 0] },
                weights.textRelevance
              ]
            }] : []),

            // Add popularity boost
            ...(weights.popularityBoost ? [{
              $multiply: [
                { $ifNull: ['$popularityScore', 0] },
                weights.popularityBoost
              ]
            }] : []),

            // Add business rule adjustments
            ...(weights.businessRules ? [{
              $multiply: [
                { $ifNull: ['$businessRuleScore', 0] },
                weights.businessRules
              ]
            }] : []),

            // Add user engagement metrics
            ...(weights.userEngagement ? [{
              $multiply: [
                { $ifNull: ['$userEngagementScore', 0] },
                weights.userEngagement
              ]
            }] : []),

            // Add recency boost
            ...(weights.recency ? [{
              $multiply: [
                { $ifNull: ['$recencyScore', 0] },
                weights.recency
              ]
            }] : [])
          ]
        }
      }
    });

    // Add strategy-specific scoring fields
    if (strategy.searchConfiguration.indexName === 'products_vector_search') {
      enhancements.unshift({
        $addFields: {
          popularityScore: {
            $divide: [
              { $ln: { $add: ['$metrics.totalSales', 1] } },
              10
            ]
          },

          businessRuleScore: {
            $switch: {
              branches: [
                {
                  case: { $eq: ['$featured', true] },
                  then: 0.3
                },
                {
                  case: { $gte: ['$ratings.averageRating', 4.5] },
                  then: 0.2
                },
                {
                  case: { $gte: ['$inventory.stockQuantity', 50] },
                  then: 0.1
                }
              ],
              default: 0.0
            }
          }
        }
      });
    } else if (strategy.searchConfiguration.indexName === 'content_vector_search') {
      enhancements.unshift({
        $addFields: {
          userEngagementScore: {
            $divide: [
              { $add: ['$metrics.views', '$metrics.likes', '$metrics.shares'] },
              1000
            ]
          },

          recencyScore: {
            $cond: {
              if: { 
                $gte: [
                  '$publishedAt',
                  new Date(Date.now() - 7 * 24 * 60 * 60 * 1000)
                ]
              },
              then: 0.2,
              else: 0.0
            }
          }
        }
      });
    }

    return enhancements;
  }

  async applySearchPostProcessing(results, strategy, context) {
    // Apply result diversification if enabled
    if (strategy.searchConfiguration.resultDiversification) {
      results = await this.diversifyResults(results, strategy);
    }

    // Apply user personalization
    if (context.userId && strategy.searchConfiguration.personalizeResults !== false) {
      results = await this.personalizeResults(results, context.userId, strategy);
    }

    // Apply business rules and filters
    results = this.applyBusinessRules(results, strategy, context);

    // Ensure minimum result count
    if (results.length < strategy.performanceTargets.minResultCount) {
      // The strategy name lives on the search context, not on the strategy config
      console.warn(`Insufficient results for ${context.strategy}: ${results.length} < ${strategy.performanceTargets.minResultCount}`);
    }

    return results;
  }

  async diversifyResults(results, strategy) {
    if (results.length <= 5) return results; // Skip diversification for small result sets

    const diversified = [];
    const categories = new Set();
    const maxPerCategory = Math.ceil(strategy.performanceTargets.maxResultCount / 5);
    const categoryCount = new Map();

    // First, add top results ensuring category diversity
    for (const result of results) {
      const category = result.category || result.contentType || 'default';
      const currentCount = categoryCount.get(category) || 0;

      if (currentCount < maxPerCategory || categories.size < 3) {
        diversified.push(result);
        categories.add(category);
        categoryCount.set(category, currentCount + 1);

        if (diversified.length >= strategy.performanceTargets.maxResultCount) {
          break;
        }
      }
    }

    // Fill remaining slots with best remaining results, never exceeding the target
    const maxResults = strategy.performanceTargets.maxResultCount;
    const usedIds = new Set(diversified.map(r => r._id?.toString() || r.productId?.toString()));

    for (const result of results) {
      if (diversified.length >= maxResults) break;

      const resultId = result._id?.toString() || result.productId?.toString();
      if (!usedIds.has(resultId)) {
        diversified.push(result);
        usedIds.add(resultId);
      }
    }

    return diversified;
  }

  async personalizeResults(results, userId, strategy) {
    // Get user preferences and behavior
    const userProfile = await this.getUserProfile(userId);

    if (!userProfile) return results;

    // Apply personalization scoring
    return results.map(result => {
      let personalizationBoost = 0;

      // Category preferences
      if (userProfile.preferredCategories?.includes(result.category)) {
        personalizationBoost += 0.1;
      }

      // Brand preferences
      if (userProfile.preferredBrands?.includes(result.brand)) {
        personalizationBoost += 0.05;
      }

      // Price range preferences
      if (result.pricing && userProfile.priceRange) {
        if (result.pricing.basePrice >= userProfile.priceRange.min && 
            result.pricing.basePrice <= userProfile.priceRange.max) {
          personalizationBoost += 0.05;
        }
      }

      // Update enhanced score with personalization
      result.enhancedScore = (result.enhancedScore || result.vectorSearchScore || 0) + personalizationBoost;

      return result;
    }).sort((a, b) => (b.enhancedScore || 0) - (a.enhancedScore || 0));
  }

  applyBusinessRules(results, strategy, context) {
    // Apply business-specific filtering and boosting rules
    return results
      .filter(result => {
        // Basic availability check
        if (result.availability && !result.availability.isActive) {
          return false;
        }

        // Inventory check for products
        if (result.inventory && result.inventory.stockQuantity === 0 && 
            !result.inventory.allowBackorder) {
          return false;
        }

        // Content moderation check
        if (result.moderation && result.moderation.status === 'rejected') {
          return false;
        }

        return true;
      })
      .map(result => {
        // Apply business rule boosts
        if (result.featured || result.promoted) {
          result.enhancedScore = (result.enhancedScore || 0) * 1.2;
        }

        if (result.ratings && result.ratings.averageRating >= 4.5) {
          result.enhancedScore = (result.enhancedScore || 0) * 1.1;
        }

        return result;
      })
      // Re-sort so the boosts above are reflected in the final order
      .sort((a, b) => (b.enhancedScore || 0) - (a.enhancedScore || 0));
  }

  getCollectionForStrategy(indexName) {
    const collectionMap = {
      'products_vector_search': 'products',
      'content_vector_search': 'content_items',
      'knowledge_vector_search': 'knowledge_articles',
      'images_vector_search': 'images'
    };

    const collectionName = collectionMap[indexName];
    if (!collectionName) {
      throw new Error(`Unknown index name: ${indexName}`);
    }

    return this.db.collection(collectionName);
  }

  getVectorPathForStrategy(strategy) {
    const pathMap = {
      'products_vector_search': 'embeddings.combined',
      'content_vector_search': 'embeddings.content',
      'knowledge_vector_search': 'embeddings.article',
      'images_vector_search': 'embeddings.visual'
    };

    return pathMap[strategy.searchConfiguration.indexName] || 'embeddings.default';
  }

  buildFilterExpression(filters, strategy) {
    const filterExpression = { $and: [] };

    // Apply filters based on strategy priority
    for (const filterType of strategy.searchConfiguration.filterPriority) {
      if (filters[filterType] !== undefined) {
        switch (filterType) {
          case 'category':
            if (Array.isArray(filters.category)) {
              filterExpression.$and.push({ category: { $in: filters.category } });
            } else {
              filterExpression.$and.push({ category: filters.category });
            }
            break;

          case 'priceRange':
            if (filters.priceRange.min !== undefined || filters.priceRange.max !== undefined) {
              const priceFilter = {};
              if (filters.priceRange.min !== undefined) {
                priceFilter.$gte = filters.priceRange.min;
              }
              if (filters.priceRange.max !== undefined) {
                priceFilter.$lte = filters.priceRange.max;
              }
              filterExpression.$and.push({ 'pricing.basePrice': priceFilter });
            }
            break;

          case 'availability':
            filterExpression.$and.push({ 'availability.isActive': true });
            break;

          case 'ratings':
            if (filters.ratings?.min !== undefined) {
              filterExpression.$and.push({ 
                'ratings.averageRating': { $gte: filters.ratings.min }
              });
            }
            break;
        }
      }
    }

    return filterExpression.$and.length > 0 ? filterExpression : undefined;
  }

  async getUserProfile(userId) {
    const userProfilesCollection = this.db.collection('user_profiles');
    return await userProfilesCollection.findOne(
      { userId: userId },
      { 
        projection: {
          preferredCategories: 1,
          preferredBrands: 1,
          priceRange: 1,
          behaviors: 1
        }
      }
    );
  }

  async getPerformanceMetrics() {
    const searchMetrics = [];

    for (const [strategyName, strategy] of this.searchStrategies) {
      const metrics = await this.getStrategyMetrics(strategyName);
      searchMetrics.push({
        strategy: strategyName,
        configuration: strategy,
        metrics: metrics
      });
    }

    return {
      timestamp: new Date(),
      strategies: searchMetrics
    };
  }

  async getStrategyMetrics(strategyName) {
    const searchLogs = this.db.collection('search_logs');

    const metrics = await searchLogs.aggregate([
      {
        $match: {
          strategy: strategyName,
          timestamp: { $gte: new Date(Date.now() - 24 * 60 * 60 * 1000) }
        }
      },
      {
        $group: {
          _id: null,
          totalSearches: { $sum: 1 },
          avgDuration: { $avg: '$searchDurationMs' },
          avgResults: { $avg: '$resultCount' },
          // $percentile requires MongoDB 7.0+ and returns an array with one value per entry in `p`
          p95Duration: { $percentile: { input: '$searchDurationMs', p: [0.95], method: 'approximate' } },
          successRate: { 
            $avg: { 
              $cond: [{ $gt: ['$resultCount', 0] }, 1, 0]
            }
          }
        }
      }
    ]).toArray();

    return metrics[0] || {
      totalSearches: 0,
      avgDuration: 0,
      avgResults: 0,
      successRate: 0
    };
  }
}

// Export the enterprise vector search orchestrator
module.exports = { EnterpriseVectorSearchOrchestrator };
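
To make the weighted scoring in buildScoringEnhancements easier to reason about, here is a minimal sketch of the same weighted sum in plain JavaScript, outside the aggregation pipeline. The SIGNAL_FIELDS map and the computeEnhancedScore function are hypothetical names introduced for illustration. The mapping of weight names to document fields is taken from the pipeline above.

```javascript
// Hypothetical mapping from weight names to the score fields used in the pipeline.
const SIGNAL_FIELDS = {
  vectorSimilarity: 'vectorSearchScore',
  textRelevance: 'textRelevanceScore',
  popularityBoost: 'popularityScore',
  businessRules: 'businessRuleScore',
  userEngagement: 'userEngagementScore',
  recency: 'recencyScore'
};

// Weighted sum of the available signals; missing signals count as 0,
// matching the $ifNull defaults in the aggregation version.
function computeEnhancedScore(doc, weights) {
  let score = 0;
  for (const [weightName, weight] of Object.entries(weights)) {
    const field = SIGNAL_FIELDS[weightName];
    if (!field) continue; // weights without a mapped signal contribute nothing
    score += (doc[field] ?? 0) * weight;
  }
  return score;
}

// Usage with the product_discovery weights from the strategy table
const productWeights = {
  vectorSimilarity: 0.6,
  textRelevance: 0.2,
  popularityBoost: 0.1,
  businessRules: 0.1
};
const exampleDoc = { vectorSearchScore: 0.9, popularityScore: 0.5 };
console.log(computeEnhancedScore(exampleDoc, productWeights));
```

One consequence worth noting: weights such as documentAuthority, contentQuality, imageMetadata, userPreferences, and userFeedback have no corresponding term in the pipeline as written, so they are configured but currently contribute nothing to the score.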

SQL-Style Vector Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB Atlas Vector Search operations:

-- QueryLeaf vector search operations with SQL-familiar syntax

-- Create vector search indexes with SQL-style DDL
CREATE VECTOR INDEX products_semantic_search ON products (
  -- Primary embedding field
  embeddings.combined VECTOR(1536) USING cosine,

  -- Additional vector fields for multi-modal search
  embeddings.title VECTOR(384) USING euclidean,
  embeddings.description VECTOR(768) USING dotProduct,
  embeddings.image VECTOR(512) USING cosine,

  -- Filterable fields for hybrid search
  category FILTER,
  brand FILTER,
  pricing.basePrice FILTER,
  availability.isActive FILTER,
  ratings.averageRating FILTER
) WITH (
  similarity_algorithm = 'cosine',
  num_candidates = 2000
);

-- Vector similarity search with SQL syntax
WITH semantic_search AS (
  SELECT 
    p.*,
    VECTOR_SIMILARITY(
      p.embeddings.combined,
      VECTOR_EMBED('openai-ada-002', 'wireless bluetooth headphones with noise cancellation'),
      'cosine'
    ) as semantic_similarity,

    -- Multi-modal similarity scoring
    VECTOR_SIMILARITY(
      p.embeddings.title,
      VECTOR_EMBED('sentence-transformers', 'wireless bluetooth headphones with noise cancellation'), 
      'euclidean'
    ) as title_similarity,

    VECTOR_SIMILARITY(
      p.embeddings.description,
      VECTOR_EMBED('cohere-embed-v3', 'wireless bluetooth headphones with noise cancellation'),
      'dotProduct' 
    ) as description_similarity

  FROM products p
  WHERE 
    -- Vector search with filtering
    VECTOR_SEARCH(
      p.embeddings.combined,
      VECTOR_EMBED('openai-ada-002', 'wireless bluetooth headphones with noise cancellation'),
      num_candidates = 1000,
      limit = 50
    )
    AND p.availability.isActive = true
    AND p.category IN ('electronics', 'audio', 'headphones')
    AND p.pricing.basePrice BETWEEN 50 AND 500
    AND p.ratings.averageRating >= 4.0
),

scored_results AS (
  SELECT *,
    -- Hybrid scoring combining multiple similarity measures
    (
      semantic_similarity * 0.6 +
      title_similarity * 0.25 +
      description_similarity * 0.15
    ) as combined_similarity_score,

    -- Business rule adjustments
    CASE 
      WHEN featured = true THEN 0.2
      WHEN ratings.averageRating >= 4.5 THEN 0.1
      WHEN inventory.stockQuantity > 100 THEN 0.05
      ELSE 0.0
    END as business_boost,

    -- Popularity scoring
    -- Missing sales default to 0 so products without sales data score 0
    LOG(COALESCE(metrics.totalSales, 0) + 1) / 10.0 as popularity_score,

    -- Recency boost for new products
    CASE 
      WHEN created_at >= CURRENT_DATE - INTERVAL '30 days' THEN 0.1
      WHEN created_at >= CURRENT_DATE - INTERVAL '90 days' THEN 0.05
      ELSE 0.0
    END as recency_boost

  FROM semantic_search
),

final_rankings AS (
  SELECT *,
    -- Calculate final relevance score
    combined_similarity_score + business_boost + popularity_score + recency_boost as final_score,

    -- Ranking within categories for diversification
    ROW_NUMBER() OVER (
      PARTITION BY category 
      ORDER BY combined_similarity_score DESC
    ) as category_rank,

    -- Overall ranking
    ROW_NUMBER() OVER (ORDER BY final_score DESC) as overall_rank

  FROM scored_results
)

SELECT 
  product_id,
  name,
  description,
  category,
  brand,
  pricing.basePrice as price,
  ratings.averageRating as avg_rating,

  -- Similarity and scoring details
  ROUND(semantic_similarity, 4) as semantic_sim,
  ROUND(title_similarity, 4) as title_sim,
  ROUND(description_similarity, 4) as desc_sim,
  ROUND(final_score, 4) as relevance_score,

  -- Ranking information
  overall_rank,
  category_rank,

  -- Business context
  CASE
    -- 0.1 and above means featured or highly rated; 0.05 is the stock-level boost only
    WHEN business_boost >= 0.1 THEN 'Featured/Highly Rated'
    WHEN popularity_score > 0.5 THEN 'Popular Choice'
    WHEN recency_boost > 0 THEN 'New Product'
    ELSE 'Standard'
  END as recommendation_reason,

  -- Search context
  'semantic_vector_search' as search_method,
  CURRENT_TIMESTAMP as search_timestamp

FROM final_rankings
WHERE overall_rank <= 20
ORDER BY final_score DESC, overall_rank ASC;

-- Personalized recommendations using vector similarity
WITH user_preference_embedding AS (
  SELECT 
    user_id,
    preferences.embedding as preference_vector,
    preferences.categories as preferred_categories,
    preferences.price_range as price_range
  FROM user_preferences
  WHERE user_id = $1
),

personalized_candidates AS (
  SELECT 
    p.*,
    VECTOR_SIMILARITY(
      upe.preference_vector,
      p.embeddings.combined,
      'cosine'
    ) as preference_similarity,

    -- Category preference matching
    CASE 
      WHEN p.category = ANY(upe.preferred_categories) THEN 0.3
      ELSE 0.0
    END as category_preference_score,

    -- Price preference alignment
    CASE 
      WHEN p.pricing.basePrice BETWEEN upe.price_range.min AND upe.price_range.max THEN 0.2
      ELSE 0.0
    END as price_preference_score

  FROM products p
  CROSS JOIN user_preference_embedding upe
  WHERE 
    VECTOR_SEARCH(
      p.embeddings.combined,
      upe.preference_vector,
      num_candidates = 2000,
      limit = 100
    )
    AND p.availability.isActive = true
    AND p.product_id NOT IN (
      -- Exclude recently purchased/viewed products
      SELECT product_id 
      FROM user_interactions ui
      WHERE ui.user_id = $1 
      AND ui.interaction_type IN ('purchase', 'view')
      AND ui.interaction_date >= CURRENT_DATE - INTERVAL '30 days'
    )
),

recommendation_scores AS (
  SELECT *,
    -- Combined recommendation score
    (
      preference_similarity * 0.5 +
      category_preference_score +
      price_preference_score +
      (ratings.averageRating / 5.0) * 0.1
    ) as recommendation_score,

    -- Diversification ranking
    ROW_NUMBER() OVER (
      PARTITION BY category 
      ORDER BY preference_similarity DESC
    ) as category_diversity_rank

  FROM personalized_candidates
)

SELECT 
  product_id,
  name,
  category,
  brand,
  pricing.basePrice as price,
  ratings.averageRating as rating,

  -- Recommendation metrics
  ROUND(preference_similarity, 4) as preference_match,
  ROUND(recommendation_score, 4) as recommendation_score,

  -- Explanation
  CASE 
    WHEN category_preference_score > 0 THEN 'Based on your category preferences'
    WHEN price_preference_score > 0 THEN 'Within your preferred price range'
    WHEN preference_similarity > 0.8 THEN 'Highly similar to your preferences'
    ELSE 'Recommended for you'
  END as recommendation_reason,

  category_diversity_rank,
  'personalized_vector_recommendation' as recommendation_type

FROM recommendation_scores
WHERE category_diversity_rank <= 5  -- Max 5 per category for diversity
ORDER BY recommendation_score DESC
LIMIT 20;

-- Hybrid search combining semantic search with traditional text search
WITH vector_search_results AS (
  SELECT 
    p.*,
    VECTOR_SIMILARITY(
      p.embeddings.combined,
      VECTOR_EMBED('openai-ada-002', $1),  -- Query parameter
      'cosine'
    ) as vector_score,
    'vector_search' as source
  FROM products p
  WHERE 
    VECTOR_SEARCH(
      p.embeddings.combined,
      VECTOR_EMBED('openai-ada-002', $1),
      num_candidates = 1000,
      limit = 30
    )
    AND p.availability.isActive = true
),

text_search_results AS (
  SELECT 
    p.*,
    MATCH_SCORE(p.search_text, $1) as text_score,
    'text_search' as source
  FROM products p
  WHERE 
    MATCH(p.search_text) AGAINST ($1 IN BOOLEAN MODE)
    AND p.availability.isActive = true
  ORDER BY text_score DESC
  LIMIT 30
),

combined_results AS (
  SELECT *, vector_score as relevance_score FROM vector_search_results
  UNION ALL
  SELECT *, text_score as relevance_score FROM text_search_results
),

deduplicated_results AS (
  SELECT 
    product_id,
    name,
    description,
    category,
    brand,
    pricing,
    ratings,

    -- Aggregate scores from multiple sources
    MAX(relevance_score) as max_score,
    AVG(relevance_score) as avg_score,
    COUNT(*) as source_count,
    ARRAY_AGG(DISTINCT source) as search_sources,

    -- Hybrid scoring - boost items found by multiple methods
    CASE 
      WHEN COUNT(*) > 1 THEN MAX(relevance_score) * 1.2  -- Multi-source boost
      ELSE MAX(relevance_score)
    END as hybrid_score

  FROM combined_results
  GROUP BY product_id, name, description, category, brand, pricing, ratings
)

SELECT 
  product_id,
  name,
  category,
  brand,
  pricing.basePrice as price,
  ratings.averageRating as rating,

  -- Scoring details
  ROUND(hybrid_score, 4) as relevance_score,
  ROUND(max_score, 4) as max_individual_score,
  source_count,
  search_sources,

  -- Search method classification
  CASE 
    WHEN source_count > 1 THEN 'hybrid_match'
    WHEN 'vector_search' = ANY(search_sources) THEN 'semantic_match'
    WHEN 'text_search' = ANY(search_sources) THEN 'keyword_match'
    ELSE 'unknown'
  END as match_type,

  'hybrid_search' as search_algorithm

FROM deduplicated_results
ORDER BY hybrid_score DESC, source_count DESC
LIMIT 25;

-- Vector search performance analysis and optimization
WITH search_performance AS (
  SELECT 
    embedding_model,
    search_type,
    DATE_TRUNC('hour', search_timestamp) as hour_bucket,

    -- Performance metrics
    COUNT(*) as total_searches,
    AVG(search_duration_ms) as avg_duration_ms,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY search_duration_ms) as p95_duration_ms,
    PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY search_duration_ms) as p99_duration_ms,

    -- Result quality metrics
    AVG(result_count) as avg_result_count,
    AVG(avg_similarity_score) as avg_similarity,
    COUNT(*) FILTER (WHERE result_count = 0) as zero_result_searches,

    -- User engagement metrics
    AVG(click_through_rate) as avg_ctr,
    AVG(conversion_rate) as avg_conversion_rate,

    -- Resource utilization
    AVG(candidates_examined) as avg_candidates,
    AVG(memory_usage_mb) as avg_memory_usage

  FROM vector_search_logs vsl
  WHERE search_timestamp >= CURRENT_TIMESTAMP - INTERVAL '7 days'
  GROUP BY embedding_model, search_type, DATE_TRUNC('hour', search_timestamp)
),

performance_trends AS (
  SELECT *,
    -- Calculate performance trends
    LAG(avg_duration_ms) OVER (
      PARTITION BY embedding_model, search_type 
      ORDER BY hour_bucket
    ) as prev_avg_duration,

    LAG(avg_similarity) OVER (
      PARTITION BY embedding_model, search_type 
      ORDER BY hour_bucket
    ) as prev_avg_similarity,

    LAG(avg_ctr) OVER (
      PARTITION BY embedding_model, search_type 
      ORDER BY hour_bucket
    ) as prev_avg_ctr

  FROM search_performance
)

SELECT 
  embedding_model,
  search_type,
  TO_CHAR(hour_bucket, 'YYYY-MM-DD HH24:00') as analysis_hour,

  -- Volume metrics
  total_searches,
  ROUND(avg_result_count, 1) as avg_results,

  -- Performance metrics
  ROUND(avg_duration_ms, 0) as avg_duration_ms,
  ROUND(p95_duration_ms, 0) as p95_duration_ms,
  ROUND(p99_duration_ms, 0) as p99_duration_ms,

  -- Quality metrics
  ROUND(avg_similarity, 3) as avg_similarity,
  ROUND((zero_result_searches::DECIMAL / total_searches) * 100, 1) as zero_result_rate_pct,

  -- Engagement metrics
  ROUND(avg_ctr * 100, 2) as avg_ctr_pct,
  ROUND(avg_conversion_rate * 100, 2) as avg_conversion_pct,

  -- Resource metrics
  ROUND(avg_candidates, 0) as avg_candidates_examined,
  ROUND(avg_memory_usage, 1) as avg_memory_mb,

  -- Trend analysis
  CASE 
    -- Guard against division by zero when the previous hour's average is 0
    WHEN prev_avg_duration IS NOT NULL AND prev_avg_duration > 0 THEN
      ROUND(((avg_duration_ms - prev_avg_duration) / prev_avg_duration) * 100, 1)
    ELSE NULL
  END as duration_change_pct,

  CASE 
    WHEN prev_avg_similarity IS NOT NULL AND prev_avg_similarity > 0 THEN
      ROUND(((avg_similarity - prev_avg_similarity) / prev_avg_similarity) * 100, 1) 
    ELSE NULL
  END as similarity_change_pct,

  -- Performance assessment
  CASE 
    WHEN avg_duration_ms > 1000 THEN 'slow'
    WHEN avg_duration_ms > 500 THEN 'moderate'
    ELSE 'fast'
  END as performance_rating,

  -- Optimization recommendations
  CASE 
    WHEN avg_duration_ms > 1000 THEN 'Reduce num_candidates or optimize index'
    WHEN zero_result_searches > total_searches * 0.1 THEN 'Review embedding quality or expand corpus'
    WHEN avg_ctr < 0.05 THEN 'Improve result relevance ranking'
    WHEN avg_memory_usage > 1000 THEN 'Consider batch size optimization'
    ELSE 'Performance within acceptable parameters'
  END as optimization_recommendation

FROM performance_trends
WHERE hour_bucket >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
ORDER BY embedding_model, search_type, hour_bucket DESC;

-- QueryLeaf provides the following vector search capabilities:
-- 1. SQL-familiar syntax for MongoDB Atlas Vector Search operations, including complex vector queries
-- 2. Multi-modal vector similarity search with configurable algorithms
-- 3. Hybrid search combining semantic and traditional text matching
-- 4. Personalized recommendations using user preference embeddings
-- 5. Advanced filtering and ranking with business rule integration
-- 6. Performance monitoring with analytics and optimization recommendations
-- 7. Real-time vector search with enterprise-grade scalability
-- 8. Integration with popular embedding models and AI services
-- 9. Production-ready vector database capabilities through MongoDB Atlas
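
The rating and recommendation rules in the report above can also be applied in application code, for example when alerting on a single hourly row. The following is a minimal sketch, not QueryLeaf's implementation: the function names and the camelCase row fields (`avgDurationMs`, `totalSearches`, `zeroResultSearches`, `avgCtr`, `avgMemoryUsage`) are hypothetical, while the thresholds and messages are copied from the SQL CASE expressions.

```javascript
// Hypothetical helpers; thresholds are copied from the CASE expressions in the SQL report.

function percentChange(current, previous) {
  // No usable baseline (first hour in a partition, or a zero value): report null.
  if (previous === null || previous === undefined || previous === 0) return null;
  // Round to one decimal place, approximating ROUND(..., 1) in the query.
  return Math.round(((current - previous) / previous) * 1000) / 10;
}

function assessHour(row, prev = null) {
  // Same bands as the performance_rating column.
  let rating = "fast";
  if (row.avgDurationMs > 1000) rating = "slow";
  else if (row.avgDurationMs > 500) rating = "moderate";

  // Same order as the optimization_recommendation column: the first match wins.
  let recommendation = "Performance within acceptable parameters";
  if (row.avgDurationMs > 1000) {
    recommendation = "Reduce num_candidates or optimize index";
  } else if (row.zeroResultSearches > row.totalSearches * 0.1) {
    recommendation = "Review embedding quality or expand corpus";
  } else if (row.avgCtr < 0.05) {
    recommendation = "Improve result relevance ranking";
  } else if (row.avgMemoryUsage > 1000) {
    recommendation = "Consider batch size optimization";
  }

  return {
    rating,
    recommendation,
    durationChangePct: percentChange(row.avgDurationMs, prev ? prev.avgDurationMs : null),
  };
}
```

Because the checks run in order, a slow hour is always reported as a latency problem even if its zero-result rate is also high, which matches the behavior of the SQL CASE expression.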

Best Practices for MongoDB Atlas Vector Search Implementation

Vector Search Optimization Strategies

Essential practices for maximizing vector search performance and accuracy:

  1. Embedding Model Selection: Choose appropriate embedding models based on data type and use case requirements
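  2. Index Configuration: Match the index's dimension count and similarity function (cosine, euclidean, or dotProduct) to the embedding model that produced the vectors
  3. Hybrid Search Design: Combine vector similarity with traditional text search so that both exact keyword matches and semantic matches surface
  4. Performance Monitoring: Track search latency (average, p95, p99), zero-result rate, average similarity, and engagement metrics such as click-through and conversion rate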
  5. Result Diversification: Implement strategies to ensure diverse and relevant search results
  6. Personalization Integration: Leverage user preference embeddings for customized experiences
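
To make the index configuration point concrete, here is a minimal sketch of how an Atlas Vector Search index definition and a matching `$vectorSearch` stage might be assembled. The helper names, the index name `product_vector_index`, and the `candidateMultiplier` default are assumptions for illustration; `description_embedding` and `category` are borrowed from the product table earlier in the article. Setting `numCandidates` to roughly 20 times `limit` is a commonly cited starting point rather than a tuned value, and the cap of 10000 reflects the documented upper bound for that parameter as far as I am aware.

```javascript
// Hypothetical helpers that build plain objects; nothing here talks to a cluster.

const SIMILARITY_FUNCTIONS = ["cosine", "euclidean", "dotProduct"];

function buildVectorIndexDefinition({ path, numDimensions, similarity = "cosine", filterPaths = [] }) {
  if (!SIMILARITY_FUNCTIONS.includes(similarity)) {
    throw new Error(`Unsupported similarity function: ${similarity}`);
  }
  return {
    fields: [
      // The vector field must match the embedding model's output size.
      { type: "vector", path, numDimensions, similarity },
      // Fields listed here can be used in the $vectorSearch "filter" option.
      ...filterPaths.map((filterPath) => ({ type: "filter", path: filterPath })),
    ],
  };
}

function buildVectorSearchStage({ index, path, queryVector, limit = 10, candidateMultiplier = 20, filter }) {
  const stage = {
    index,
    path,
    queryVector,
    // More candidates usually improves recall at the cost of latency.
    numCandidates: Math.min(limit * candidateMultiplier, 10000),
    limit,
  };
  if (filter) stage.filter = filter;
  return { $vectorSearch: stage };
}

// Example pipeline: top 5 products in one category, with the similarity score attached.
const examplePipeline = [
  buildVectorSearchStage({
    index: "product_vector_index",
    path: "description_embedding",
    queryVector: new Array(768).fill(0.01), // placeholder; use a real query embedding
    limit: 5,
    filter: { category: "electronics" },
  }),
  { $project: { product_name: 1, score: { $meta: "vectorSearchScore" } } },
];
```

The resulting pipeline would be passed to `collection.aggregate(...)` in the MongoDB driver. If the monitoring query reports slow searches, lowering `candidateMultiplier` is the first knob to try, which is what the "Reduce num_candidates" recommendation refers to.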

Production Deployment Considerations

Key factors for enterprise vector search deployments:

  1. Scalability Planning: Design for high-concurrency vector search workloads
  2. Embedding Management: Implement efficient embedding generation and update strategies
  3. Quality Assurance: Monitor search result quality and user satisfaction metrics
  4. Cost Optimization: Balance embedding model costs with search performance requirements
  5. Security Implementation: Apply the same access controls to embeddings and search endpoints as to the source data, since vectors are derived from that data
  6. Disaster Recovery: Plan for vector index backup and recovery procedures
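
One way to approach the embedding management and cost points is to regenerate embeddings only when the source text or the embedding model has changed. The sketch below assumes each document stores a hash alongside its vector; the field names `text` and `embeddingHash` and both function names are hypothetical, and the actual embedding call is left out.

```javascript
const { createHash } = require("node:crypto");

// Hash the model name together with the text so that switching models
// also marks existing embeddings as stale.
function contentHash(text, modelName) {
  return createHash("sha256").update(`${modelName}\n${text}`).digest("hex");
}

// Return only the documents whose stored hash is missing or no longer
// matches their current text and model; these need a new embedding.
function selectStaleDocuments(docs, modelName) {
  return docs.filter((doc) => doc.embeddingHash !== contentHash(doc.text, modelName));
}
```

After re-embedding a stale document, the application would write the new vector and the new `contentHash` value in the same update, so the two never drift apart.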

Conclusion

MongoDB Atlas Vector Search adds vector indexing and similarity search to the same database that holds your operational data, so AI-powered search runs alongside traditional database operations. Combined with SQL-style interfaces, this lets applications deliver semantic search, personalization, and recommendation features without running a separate vector store.

Key Atlas Vector Search benefits include:

  • Native AI Integration: Vector database capabilities built into MongoDB Atlas with zero additional infrastructure
  • High-Performance Search: Optimized vector indexing and similarity algorithms for enterprise-scale workloads
  • Hybrid Search Capabilities: Seamless integration of semantic and traditional search methodologies
  • Advanced Personalization: User preference embeddings enable sophisticated recommendation systems
  • SQL Compatibility: Familiar vector operations accessible through SQL-style query interfaces
  • Comprehensive Analytics: Real-time monitoring and optimization recommendations for vector search performance

Whether you're building e-commerce recommendation engines, content discovery platforms, customer support systems, or AI-powered search applications, MongoDB Atlas Vector Search with QueryLeaf's familiar SQL interface provides the foundation for intelligent search experiences that scale efficiently while maintaining familiar development patterns.

QueryLeaf Integration: QueryLeaf automatically optimizes MongoDB Atlas Vector Search operations while providing SQL-familiar syntax for vector similarity search, hybrid search strategies, and personalized recommendations. Advanced vector indexing, embedding management, and performance analytics are seamlessly accessible through familiar SQL constructs, making sophisticated AI-powered search both powerful and approachable for SQL-oriented development teams.

Pairing MongoDB's vector search capabilities with SQL-style operations suits applications that need both advanced AI functionality and operational simplicity: search and recommendation systems gain semantic understanding while teams keep their existing development and deployment patterns.