MongoDB Embedded Documents vs References: Data Modeling Patterns and Performance Optimization for Enterprise Applications
Modern applications require sophisticated data modeling strategies that balance query performance, data consistency, and schema flexibility across complex relationships and evolving business requirements. Traditional relational databases force all relationships through normalized foreign key structures that often create performance bottlenecks, complex joins, and rigid schemas that resist change as applications evolve and business requirements shift.
MongoDB's document-oriented architecture provides powerful flexibility in how relationships are modeled, offering both embedded document patterns that co-locate related data within single documents and reference patterns that maintain relationships through document identifiers. Understanding when to embed versus when to reference is crucial for designing scalable, performant applications that can adapt to changing requirements while maintaining optimal query performance and data consistency.
The Traditional Relational Normalization Challenge
Conventional relational database modeling relies heavily on normalization principles that create complex join-heavy queries and performance challenges:
-- Traditional PostgreSQL normalized schema with complex relationship management overhead
-- User profile management with multiple related entities requiring joins
CREATE TABLE users (
user_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email VARCHAR(255) UNIQUE NOT NULL,
username VARCHAR(100) UNIQUE NOT NULL,
first_name VARCHAR(100) NOT NULL,
last_name VARCHAR(100) NOT NULL,
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
-- Basic user metadata
date_of_birth DATE,
phone_number VARCHAR(20),
status VARCHAR(20) DEFAULT 'active',
CONSTRAINT valid_status CHECK (status IN ('active', 'inactive', 'suspended', 'deleted'))
);
-- User addresses requiring separate table and joins for access
CREATE TABLE user_addresses (
address_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES users(user_id) ON DELETE CASCADE,
address_type VARCHAR(20) NOT NULL DEFAULT 'home',
-- Address components
street_address VARCHAR(500) NOT NULL,
apartment_unit VARCHAR(100),
city VARCHAR(100) NOT NULL,
state_province VARCHAR(100),
postal_code VARCHAR(20) NOT NULL,
country VARCHAR(3) NOT NULL DEFAULT 'USA',
-- Address metadata
is_primary BOOLEAN DEFAULT FALSE,
is_billing BOOLEAN DEFAULT FALSE,
is_shipping BOOLEAN DEFAULT FALSE,
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
CONSTRAINT valid_address_type CHECK (address_type IN ('home', 'work', 'billing', 'shipping', 'other'))
);
-- User preferences requiring separate storage and complex queries
CREATE TABLE user_preferences (
preference_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES users(user_id) ON DELETE CASCADE,
preference_category VARCHAR(50) NOT NULL,
preference_key VARCHAR(100) NOT NULL,
preference_value JSONB NOT NULL,
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
UNIQUE (user_id, preference_category, preference_key),
CONSTRAINT valid_category CHECK (preference_category IN (
'notifications', 'display', 'privacy', 'content', 'accessibility'
))
);
-- User social connections with bidirectional relationship complexity
CREATE TABLE user_connections (
connection_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
requester_user_id UUID NOT NULL REFERENCES users(user_id) ON DELETE CASCADE,
requested_user_id UUID NOT NULL REFERENCES users(user_id) ON DELETE CASCADE,
connection_type VARCHAR(30) NOT NULL DEFAULT 'friend',
connection_status VARCHAR(20) NOT NULL DEFAULT 'pending',
-- Connection metadata
requested_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
responded_at TIMESTAMP WITH TIME ZONE,
last_interaction_at TIMESTAMP WITH TIME ZONE,
-- Connection details
connection_strength INTEGER DEFAULT 1 CHECK (connection_strength BETWEEN 1 AND 10),
mutual_connections INTEGER DEFAULT 0,
shared_interests TEXT[],
CONSTRAINT no_self_connection CHECK (requester_user_id != requested_user_id),
CONSTRAINT valid_connection_type CHECK (connection_type IN (
'friend', 'family', 'colleague', 'acquaintance', 'blocked'
)),
CONSTRAINT valid_status CHECK (connection_status IN (
'pending', 'accepted', 'declined', 'blocked', 'removed'
)),
UNIQUE (requester_user_id, requested_user_id, connection_type)
);
-- User activity tracking requiring separate table with heavy join overhead
CREATE TABLE user_activities (
activity_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES users(user_id) ON DELETE CASCADE,
activity_type VARCHAR(50) NOT NULL,
activity_timestamp TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
-- Activity details
activity_data JSONB NOT NULL DEFAULT '{}',
activity_source VARCHAR(50) DEFAULT 'web',
ip_address INET,
user_agent TEXT,
-- Context information
session_id VARCHAR(100),
page_url TEXT,
referrer_url TEXT,
-- Performance tracking
response_time_ms INTEGER,
error_occurred BOOLEAN DEFAULT FALSE,
error_details JSONB,
CONSTRAINT valid_activity_type CHECK (activity_type IN (
'login', 'logout', 'page_view', 'action_performed', 'data_modified', 'error_occurred'
)),
CONSTRAINT valid_source CHECK (activity_source IN ('web', 'mobile', 'api', 'system'))
);
-- Complex query requiring multiple joins for complete user profile
CREATE OR REPLACE VIEW complete_user_profiles AS
SELECT
u.user_id,
u.email,
u.username,
u.first_name,
u.last_name,
u.date_of_birth,
u.phone_number,
u.status,
u.created_at,
-- Primary address information (requires join)
primary_addr.street_address as primary_street,
primary_addr.city as primary_city,
primary_addr.state_province as primary_state,
primary_addr.postal_code as primary_postal,
primary_addr.country as primary_country,
-- Aggregated address count
COALESCE(addr_counts.total_addresses, 0) as total_addresses,
-- Connection statistics (expensive aggregation)
COALESCE(conn_stats.total_connections, 0) as total_connections,
COALESCE(conn_stats.pending_requests, 0) as pending_requests,
COALESCE(conn_stats.accepted_connections, 0) as accepted_connections,
-- Recent activity summary (expensive aggregation with time windows)
COALESCE(activity_stats.total_activities_7d, 0) as activities_last_7_days,
COALESCE(activity_stats.last_login, null) as last_login_time,
COALESCE(activity_stats.last_activity, null) as last_activity_time,
-- Preference counts (requires additional join)
COALESCE(pref_counts.total_preferences, 0) as total_preferences
FROM users u
-- Left join for primary address (performance impact)
LEFT JOIN user_addresses primary_addr ON u.user_id = primary_addr.user_id
AND primary_addr.is_primary = TRUE
-- Subquery for address counts (additional performance overhead)
LEFT JOIN (
SELECT user_id, COUNT(*) as total_addresses
FROM user_addresses
GROUP BY user_id
) addr_counts ON u.user_id = addr_counts.user_id
-- Complex subquery for connection statistics
LEFT JOIN (
SELECT
COALESCE(requester_user_id, requested_user_id) as user_id,
COUNT(*) as total_connections,
COUNT(*) FILTER (WHERE connection_status = 'pending') as pending_requests,
COUNT(*) FILTER (WHERE connection_status = 'accepted') as accepted_connections
FROM user_connections
GROUP BY COALESCE(requester_user_id, requested_user_id)
) conn_stats ON u.user_id = conn_stats.user_id
-- Time-based activity aggregation (expensive computation)
LEFT JOIN (
SELECT
user_id,
COUNT(*) FILTER (WHERE activity_timestamp >= CURRENT_TIMESTAMP - INTERVAL '7 days') as total_activities_7d,
MAX(activity_timestamp) FILTER (WHERE activity_type = 'login') as last_login,
MAX(activity_timestamp) as last_activity
FROM user_activities
GROUP BY user_id
) activity_stats ON u.user_id = activity_stats.user_id
-- Preference aggregation
LEFT JOIN (
SELECT user_id, COUNT(*) as total_preferences
FROM user_preferences
GROUP BY user_id
) pref_counts ON u.user_id = pref_counts.user_id;
-- Performance analysis of complex join-heavy queries
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM complete_user_profiles
WHERE status = 'active'
AND total_connections > 10
ORDER BY last_activity_time DESC NULLS LAST
LIMIT 20;
-- Complex friend recommendation query with multiple joins and aggregations
WITH friend_recommendations AS (
SELECT DISTINCT
u1.user_id as target_user_id,
u2.user_id as recommended_user_id,
u2.first_name,
u2.last_name,
u2.username,
-- Mutual connections calculation (expensive)
mutual_stats.mutual_count,
-- Shared interests analysis
CASE
WHEN EXISTS (
SELECT 1 FROM user_connections uc1
JOIN user_connections uc2 ON uc1.requested_user_id = uc2.requester_user_id
WHERE uc1.requester_user_id = u1.user_id
AND uc2.requested_user_id = u2.user_id
AND uc1.shared_interests && uc2.shared_interests
) THEN TRUE ELSE FALSE
END as has_shared_interests,
-- Activity similarity
CASE
WHEN EXISTS (
SELECT 1 FROM user_activities ua1
JOIN user_activities ua2 ON ua1.activity_type = ua2.activity_type
WHERE ua1.user_id = u1.user_id
AND ua2.user_id = u2.user_id
AND ua1.activity_timestamp >= CURRENT_TIMESTAMP - INTERVAL '30 days'
AND ua2.activity_timestamp >= CURRENT_TIMESTAMP - INTERVAL '30 days'
GROUP BY ua1.activity_type
HAVING COUNT(*) > 5
) THEN TRUE ELSE FALSE
END as similar_activity_patterns,
-- Geographic proximity (if addresses available)
CASE
WHEN addr1.city = addr2.city AND addr1.state_province = addr2.state_province
THEN TRUE ELSE FALSE
END as same_geographic_area,
-- Recommendation score calculation
(
COALESCE(mutual_stats.mutual_count, 0) * 3 +
CASE WHEN has_shared_interests THEN 2 ELSE 0 END +
CASE WHEN similar_activity_patterns THEN 2 ELSE 0 END +
CASE WHEN same_geographic_area THEN 1 ELSE 0 END
) as recommendation_score
FROM users u1
CROSS JOIN users u2
-- Ensure not already connected
LEFT JOIN user_connections existing_conn ON (
(existing_conn.requester_user_id = u1.user_id AND existing_conn.requested_user_id = u2.user_id) OR
(existing_conn.requester_user_id = u2.user_id AND existing_conn.requested_user_id = u1.user_id)
)
-- Mutual connections calculation (very expensive subquery)
LEFT JOIN (
SELECT
uc1.requester_user_id as user1_id,
uc2.requester_user_id as user2_id,
COUNT(*) as mutual_count
FROM user_connections uc1
JOIN user_connections uc2 ON uc1.requested_user_id = uc2.requested_user_id
WHERE uc1.connection_status = 'accepted'
AND uc2.connection_status = 'accepted'
AND uc1.requester_user_id != uc2.requester_user_id
GROUP BY uc1.requester_user_id, uc2.requester_user_id
) mutual_stats ON mutual_stats.user1_id = u1.user_id AND mutual_stats.user2_id = u2.user_id
-- Address proximity joins
LEFT JOIN user_addresses addr1 ON u1.user_id = addr1.user_id AND addr1.is_primary = TRUE
LEFT JOIN user_addresses addr2 ON u2.user_id = addr2.user_id AND addr2.is_primary = TRUE
WHERE u1.user_id != u2.user_id
AND u1.status = 'active'
AND u2.status = 'active'
AND existing_conn.connection_id IS NULL -- Not already connected
)
SELECT
target_user_id,
recommended_user_id,
first_name,
last_name,
username,
mutual_count,
recommendation_score,
has_shared_interests,
similar_activity_patterns,
same_geographic_area,
-- Ranking within recommendations for this user
ROW_NUMBER() OVER (
PARTITION BY target_user_id
ORDER BY recommendation_score DESC, mutual_count DESC
) as recommendation_rank
FROM friend_recommendations
WHERE recommendation_score > 0
ORDER BY target_user_id, recommendation_score DESC;
-- Problems with traditional normalized relational modeling:
-- 1. Complex multi-table joins required for basic user profile queries affecting performance
-- 2. Expensive aggregation queries across multiple related tables with poor scalability
-- 3. Rigid schema structure requiring ALTER TABLE operations for new fields
-- 4. Foreign key constraint management overhead affecting insert/update performance
-- 5. Complex query optimization challenges with multiple join paths and aggregations
-- 6. Difficulty modeling variable or optional relationship structures
-- 7. Performance degradation as related data volume increases due to join complexity
-- 8. Complex application code required to reconstruct related objects from multiple tables
-- 9. Limited ability to co-locate frequently accessed related data for optimal performance
-- 10. Expensive view materialization and maintenance for denormalized query patterns
MongoDB provides flexible document modeling patterns that optimize for query performance and data access patterns:
// MongoDB Document Modeling - Flexible embedded and reference patterns for optimal performance
const { MongoClient, ObjectId } = require('mongodb');
// Advanced MongoDB Document Modeling Manager for Enterprise Data Relationship Optimization
class AdvancedDocumentModelingManager {
constructor(client, config = {}) {
this.client = client;
this.db = client.db(config.database || 'enterprise_application');
this.config = {
// Modeling configuration
enableEmbeddedOptimization: config.enableEmbeddedOptimization !== false,
enableReferenceOptimization: config.enableReferenceOptimization !== false,
enableHybridModeling: config.enableHybridModeling !== false,
// Performance optimization
enableQueryOptimization: config.enableQueryOptimization !== false,
enableIndexOptimization: config.enableIndexOptimization !== false,
enableAggregationOptimization: config.enableAggregationOptimization !== false,
// Data consistency
enableConsistencyValidation: config.enableConsistencyValidation !== false,
enableReferentialIntegrity: config.enableReferentialIntegrity !== false,
enableDataSynchronization: config.enableDataSynchronization !== false,
// Monitoring and analytics
enablePerformanceMonitoring: config.enablePerformanceMonitoring !== false,
enableQueryAnalytics: config.enableQueryAnalytics !== false,
enableDocumentSizeMonitoring: config.enableDocumentSizeMonitoring !== false
};
// Modeling strategy tracking
this.modelingStrategies = new Map();
this.performanceMetrics = new Map();
this.queryPatterns = new Map();
this.initializeModelingManager();
}
async initializeModelingManager() {
console.log('Initializing Advanced Document Modeling Manager...');
try {
// Setup embedded document patterns
await this.setupEmbeddedDocumentPatterns();
// Setup reference patterns
await this.setupReferencePatterns();
// Setup hybrid modeling patterns
await this.setupHybridModelingPatterns();
// Initialize performance monitoring
if (this.config.enablePerformanceMonitoring) {
await this.initializePerformanceMonitoring();
}
console.log('Document Modeling Manager initialized successfully');
} catch (error) {
console.error('Error initializing document modeling manager:', error);
throw error;
}
}
async setupEmbeddedDocumentPatterns() {
console.log('Setting up embedded document modeling patterns...');
try {
// User profile with embedded addresses and preferences - optimal for frequent co-access
const userProfilesCollection = this.db.collection('user_profiles_embedded');
// Create optimized indexes for embedded document queries
await userProfilesCollection.createIndexes([
{ key: { email: 1 }, unique: true, background: true },
{ key: { username: 1 }, unique: true, background: true },
{ key: { 'addresses.type': 1, 'addresses.isPrimary': 1 }, background: true },
{ key: { 'preferences.category': 1, 'preferences.key': 1 }, background: true },
{ key: { status: 1, lastActivityAt: -1 }, background: true }
]);
this.modelingStrategies.set('user_profiles_embedded', {
collection: userProfilesCollection,
pattern: 'embedded_documents',
useCase: 'frequently_accessed_related_data',
benefits: [
'Single query for complete user profile',
'Atomic updates for user and related data',
'No joins required for common queries',
'Optimal performance for read-heavy workloads'
],
considerations: [
'Document size growth with related data',
'Potential for data duplication',
'Complex update operations for nested data'
],
queryOptimization: {
primaryQueries: ['find_by_user_id', 'find_by_email', 'find_with_addresses'],
indexStrategy: 'compound_indexes_for_embedded_fields',
projectionStrategy: 'selective_field_projection'
}
});
// Order documents with embedded line items - transactional consistency
const ordersCollection = this.db.collection('orders_embedded');
await ordersCollection.createIndexes([
{ key: { customerId: 1, orderDate: -1 }, background: true },
{ key: { orderStatus: 1, orderDate: -1 }, background: true },
{ key: { 'items.productId': 1 }, background: true },
{ key: { 'items.category': 1, orderDate: -1 }, background: true },
{ key: { totalAmount: 1 }, background: true }
]);
this.modelingStrategies.set('orders_embedded', {
collection: ordersCollection,
pattern: 'embedded_array_documents',
useCase: 'transactional_consistency_required',
benefits: [
'ACID guarantees for order and line items',
'Single document queries for complete orders',
'Efficient aggregation across order items',
'Simplified application logic'
],
considerations: [
'Document size with many line items',
'Array index performance for large arrays',
'Memory usage for large embedded arrays'
]
});
console.log('Embedded document patterns configured successfully');
} catch (error) {
console.error('Error setting up embedded document patterns:', error);
throw error;
}
}
async setupReferencePatterns() {
console.log('Setting up reference modeling patterns...');
try {
// User collection with references to separate related collections
const usersCollection = this.db.collection('users_referenced');
const addressesCollection = this.db.collection('user_addresses_referenced');
const activitiesCollection = this.db.collection('user_activities_referenced');
// User collection indexes
await usersCollection.createIndexes([
{ key: { email: 1 }, unique: true, background: true },
{ key: { username: 1 }, unique: true, background: true },
{ key: { status: 1, createdAt: -1 }, background: true }
]);
// Address collection with user references
await addressesCollection.createIndexes([
{ key: { userId: 1, type: 1 }, background: true },
{ key: { userId: 1, isPrimary: 1 }, background: true },
{ key: { city: 1, stateProvince: 1 }, background: true }
]);
// Activity collection with user references and time-based queries
await activitiesCollection.createIndexes([
{ key: { userId: 1, timestamp: -1 }, background: true },
{ key: { activityType: 1, timestamp: -1 }, background: true },
{ key: { timestamp: -1 }, background: true }
]);
this.modelingStrategies.set('users_referenced', {
collections: {
users: usersCollection,
addresses: addressesCollection,
activities: activitiesCollection
},
pattern: 'normalized_references',
useCase: 'independent_entity_management',
benefits: [
'Normalized data structure reduces duplication',
'Independent scaling of related collections',
'Flexible querying of individual entity types',
'Efficient updates to specific data types'
],
considerations: [
'Multiple queries required for complete data',
'Application-level join complexity',
'Potential consistency challenges',
'Network round-trips for related data'
],
queryOptimization: {
primaryQueries: ['find_user_with_addresses', 'find_user_activities', 'aggregate_user_data'],
joinStrategy: 'application_level_population',
indexStrategy: 'reference_field_optimization'
}
});
console.log('Reference patterns configured successfully');
} catch (error) {
console.error('Error setting up reference patterns:', error);
throw error;
}
}
async setupHybridModelingPatterns() {
console.log('Setting up hybrid modeling patterns...');
try {
// Blog posts with embedded metadata and referenced comments
const blogPostsCollection = this.db.collection('blog_posts_hybrid');
const commentsCollection = this.db.collection('blog_comments_hybrid');
await blogPostsCollection.createIndexes([
{ key: { authorId: 1, publishedAt: -1 }, background: true },
{ key: { 'tags.name': 1, publishedAt: -1 }, background: true },
{ key: { status: 1, publishedAt: -1 }, background: true },
{ key: { 'metadata.category': 1 }, background: true }
]);
await commentsCollection.createIndexes([
{ key: { postId: 1, createdAt: -1 }, background: true },
{ key: { authorId: 1, createdAt: -1 }, background: true },
{ key: { status: 1, createdAt: -1 }, background: true }
]);
this.modelingStrategies.set('blog_posts_hybrid', {
collections: {
posts: blogPostsCollection,
comments: commentsCollection
},
pattern: 'hybrid_embedded_and_referenced',
useCase: 'mixed_access_patterns',
benefits: [
'Optimized for different query patterns',
'Embedded data for frequent access',
'Referenced data for independent management',
'Balanced performance and flexibility'
],
considerations: [
'Complex modeling decisions',
'Mixed query strategies required',
'Potential data consistency complexity'
]
});
console.log('Hybrid modeling patterns configured successfully');
} catch (error) {
console.error('Error setting up hybrid patterns:', error);
throw error;
}
}
async createEmbeddedUserProfile(userData) {
console.log('Creating user profile with embedded document pattern...');
try {
const userProfilesCollection = this.modelingStrategies.get('user_profiles_embedded').collection;
const embeddedProfile = {
_id: new ObjectId(),
email: userData.email,
username: userData.username,
firstName: userData.firstName,
lastName: userData.lastName,
phoneNumber: userData.phoneNumber,
dateOfBirth: userData.dateOfBirth,
status: 'active',
// Embedded addresses for optimal co-access
addresses: userData.addresses?.map(addr => ({
_id: new ObjectId(),
type: addr.type,
streetAddress: addr.streetAddress,
apartmentUnit: addr.apartmentUnit,
city: addr.city,
stateProvince: addr.stateProvince,
postalCode: addr.postalCode,
country: addr.country,
isPrimary: addr.isPrimary || false,
isBilling: addr.isBilling || false,
isShipping: addr.isShipping || false,
createdAt: new Date(),
updatedAt: new Date()
})) || [],
// Embedded preferences for atomic updates
preferences: userData.preferences?.map(pref => ({
_id: new ObjectId(),
category: pref.category,
key: pref.key,
value: pref.value,
dataType: pref.dataType || 'string',
createdAt: new Date(),
updatedAt: new Date()
})) || [],
// Embedded profile metadata
profileMetadata: {
theme: userData.theme || 'light',
language: userData.language || 'en',
timezone: userData.timezone || 'UTC',
notificationSettings: {
email: userData.emailNotifications !== false,
push: userData.pushNotifications !== false,
sms: userData.smsNotifications || false
},
privacySettings: {
profileVisibility: userData.profileVisibility || 'public',
allowDirectMessages: userData.allowDirectMessages !== false,
shareActivityStatus: userData.shareActivityStatus !== false
}
},
// Activity summary (embedded for performance)
activitySummary: {
totalLogins: 0,
lastLoginAt: null,
lastActivityAt: new Date(),
accountCreatedAt: new Date(),
profileCompletionScore: this.calculateProfileCompleteness(userData)
},
// Audit information
createdAt: new Date(),
updatedAt: new Date(),
version: 1
};
const result = await userProfilesCollection.insertOne(embeddedProfile);
// Update performance metrics
await this.updateModelingMetrics('user_profiles_embedded', 'create', embeddedProfile);
console.log(`Embedded user profile created: ${result.insertedId}`);
return {
userId: result.insertedId,
modelingPattern: 'embedded_documents',
documentsCreated: 1,
queryOptimized: true,
atomicUpdates: true
};
} catch (error) {
console.error('Error creating embedded user profile:', error);
throw error;
}
}
async createReferencedUserProfile(userData) {
console.log('Creating user profile with reference pattern...');
try {
const usersCollection = this.modelingStrategies.get('users_referenced').collections.users;
const addressesCollection = this.modelingStrategies.get('users_referenced').collections.addresses;
// Create main user document
const userDocument = {
_id: new ObjectId(),
email: userData.email,
username: userData.username,
firstName: userData.firstName,
lastName: userData.lastName,
phoneNumber: userData.phoneNumber,
dateOfBirth: userData.dateOfBirth,
status: 'active',
// Basic profile information
profileMetadata: {
theme: userData.theme || 'light',
language: userData.language || 'en',
timezone: userData.timezone || 'UTC'
},
createdAt: new Date(),
updatedAt: new Date(),
version: 1
};
const userResult = await usersCollection.insertOne(userDocument);
const userId = userResult.insertedId;
// Create referenced address documents
const addressDocuments = userData.addresses?.map(addr => ({
_id: new ObjectId(),
userId: userId,
type: addr.type,
streetAddress: addr.streetAddress,
apartmentUnit: addr.apartmentUnit,
city: addr.city,
stateProvince: addr.stateProvince,
postalCode: addr.postalCode,
country: addr.country,
isPrimary: addr.isPrimary || false,
isBilling: addr.isBilling || false,
isShipping: addr.isShipping || false,
createdAt: new Date(),
updatedAt: new Date()
})) || [];
let addressResults = null;
if (addressDocuments.length > 0) {
addressResults = await addressesCollection.insertMany(addressDocuments);
}
// Update performance metrics
await this.updateModelingMetrics('users_referenced', 'create', {
mainDocument: userDocument,
referencedDocuments: addressDocuments
});
console.log(`Referenced user profile created: ${userId} with ${addressDocuments.length} addresses`);
return {
userId: userId,
modelingPattern: 'normalized_references',
documentsCreated: 1 + addressDocuments.length,
addressIds: addressResults ? Object.values(addressResults.insertedIds) : [],
queryOptimized: false, // Requires joins
normalizedStructure: true
};
} catch (error) {
console.error('Error creating referenced user profile:', error);
throw error;
}
}
async getUserProfileEmbedded(userId, options = {}) {
console.log(`Retrieving embedded user profile: ${userId}`);
try {
const userProfilesCollection = this.modelingStrategies.get('user_profiles_embedded').collection;
// Single query for complete profile - optimal performance
const projection = options.fields ? this.buildProjection(options.fields) : {};
const profile = await userProfilesCollection.findOne(
{ _id: new ObjectId(userId) },
{ projection }
);
if (!profile) {
throw new Error(`User profile not found: ${userId}`);
}
// Update query metrics
await this.updateQueryMetrics('user_profiles_embedded', 'single_document_query', {
documentsReturned: 1,
queryTime: Date.now(),
projectionUsed: Object.keys(projection).length > 0
});
console.log(`Embedded profile retrieved: ${userId} (single query)`);
return {
profile: profile,
modelingPattern: 'embedded_documents',
queriesExecuted: 1,
performanceOptimized: true,
dataConsistency: 'guaranteed'
};
} catch (error) {
console.error(`Error retrieving embedded profile ${userId}:`, error);
throw error;
}
}
async getUserProfileReferenced(userId, options = {}) {
console.log(`Retrieving referenced user profile: ${userId}`);
try {
const collections = this.modelingStrategies.get('users_referenced').collections;
// Multiple queries required for complete profile
const queries = [];
// Main user query
queries.push(
collections.users.findOne({ _id: new ObjectId(userId) })
);
// Related data queries
if (!options.userOnly) {
queries.push(
collections.addresses.find({ userId: new ObjectId(userId) }).toArray()
);
}
const [userDoc, addressDocs] = await Promise.all(queries);
if (!userDoc) {
throw new Error(`User not found: ${userId}`);
}
// Construct complete profile from multiple documents
const completeProfile = {
...userDoc,
addresses: addressDocs || [],
// Derived fields
primaryAddress: addressDocs?.find(addr => addr.isPrimary),
addressCount: addressDocs?.length || 0
};
// Update query metrics
await this.updateQueryMetrics('users_referenced', 'multi_document_query', {
documentsReturned: 1 + (addressDocs?.length || 0),
queriesExecuted: queries.length,
queryTime: Date.now()
});
console.log(`Referenced profile retrieved: ${userId} (${queries.length} queries)`);
return {
profile: completeProfile,
modelingPattern: 'normalized_references',
queriesExecuted: queries.length,
performanceOptimized: false,
dataConsistency: 'eventual'
};
} catch (error) {
console.error(`Error retrieving referenced profile ${userId}:`, error);
throw error;
}
}
async updateEmbeddedUserAddress(userId, addressId, updateData) {
console.log(`Updating embedded user address: ${userId}, ${addressId}`);
try {
const userProfilesCollection = this.modelingStrategies.get('user_profiles_embedded').collection;
// Atomic update of embedded address document
const updateFields = {};
Object.keys(updateData).forEach(key => {
updateFields[`addresses.$.${key}`] = updateData[key];
});
updateFields['addresses.$.updatedAt'] = new Date();
updateFields['updatedAt'] = new Date();
const result = await userProfilesCollection.updateOne(
{
_id: new ObjectId(userId),
'addresses._id': new ObjectId(addressId)
},
{
$set: updateFields,
$inc: { version: 1 }
}
);
if (result.matchedCount === 0) {
throw new Error(`Address not found: ${addressId} for user ${userId}`);
}
console.log(`Embedded address updated: ${addressId} (atomic operation)`);
return {
addressId: addressId,
modelingPattern: 'embedded_documents',
atomicUpdate: true,
documentsModified: result.modifiedCount,
consistencyGuaranteed: true
};
} catch (error) {
console.error(`Error updating embedded address:`, error);
throw error;
}
}
async updateReferencedUserAddress(userId, addressId, updateData) {
console.log(`Updating referenced user address: ${userId}, ${addressId}`);
try {
const addressesCollection = this.modelingStrategies.get('users_referenced').collections.addresses;
// Update referenced address document
const result = await addressesCollection.updateOne(
{
_id: new ObjectId(addressId),
userId: new ObjectId(userId)
},
{
$set: {
...updateData,
updatedAt: new Date()
}
}
);
if (result.matchedCount === 0) {
throw new Error(`Address not found: ${addressId} for user ${userId}`);
}
// Potentially update user document timestamp (separate operation)
const usersCollection = this.modelingStrategies.get('users_referenced').collections.users;
await usersCollection.updateOne(
{ _id: new ObjectId(userId) },
{ $set: { updatedAt: new Date() } }
);
console.log(`Referenced address updated: ${addressId} (separate operations)`);
return {
addressId: addressId,
modelingPattern: 'normalized_references',
atomicUpdate: false,
documentsModified: result.modifiedCount,
consistencyGuaranteed: false
};
} catch (error) {
console.error(`Error updating referenced address:`, error);
throw error;
}
}
async performComplexAggregation(pattern, aggregationQuery) {
console.log(`Performing complex aggregation with ${pattern} pattern`);
try {
let result;
const startTime = Date.now();
if (pattern === 'embedded') {
const collection = this.modelingStrategies.get('user_profiles_embedded').collection;
// Single collection aggregation pipeline
const pipeline = [
{ $match: aggregationQuery.match || {} },
// Unwind embedded arrays for aggregation
...(aggregationQuery.unwindAddresses ? [{ $unwind: '$addresses' }] : []),
...(aggregationQuery.unwindPreferences ? [{ $unwind: '$preferences' }] : []),
// Group and aggregate
{
$group: {
_id: aggregationQuery.groupBy || null,
userCount: { $sum: 1 },
avgProfileScore: { $avg: '$activitySummary.profileCompletionScore' },
totalAddresses: { $sum: { $size: '$addresses' } },
activeUsers: {
$sum: { $cond: [{ $eq: ['$status', 'active'] }, 1, 0] }
}
}
},
{ $sort: { userCount: -1 } },
{ $limit: aggregationQuery.limit || 100 }
];
result = await collection.aggregate(pipeline).toArray();
} else if (pattern === 'referenced') {
// Multi-collection aggregation with $lookup
const usersCollection = this.modelingStrategies.get('users_referenced').collections.users;
const pipeline = [
{ $match: aggregationQuery.match || {} },
// Lookup addresses
{
$lookup: {
from: 'user_addresses_referenced',
localField: '_id',
foreignField: 'userId',
as: 'addresses'
}
},
// Lookup activities
{
$lookup: {
from: 'user_activities_referenced',
localField: '_id',
foreignField: 'userId',
as: 'activities'
}
},
// Group and aggregate
{
$group: {
_id: aggregationQuery.groupBy || null,
userCount: { $sum: 1 },
totalAddresses: { $sum: { $size: '$addresses' } },
totalActivities: { $sum: { $size: '$activities' } },
activeUsers: {
$sum: { $cond: [{ $eq: ['$status', 'active'] }, 1, 0] }
}
}
},
{ $sort: { userCount: -1 } },
{ $limit: aggregationQuery.limit || 100 }
];
result = await usersCollection.aggregate(pipeline).toArray();
}
const executionTime = Date.now() - startTime;
// Update aggregation metrics
await this.updateQueryMetrics(`${pattern}_aggregation`, 'complex_aggregation', {
executionTime: executionTime,
documentsProcessed: result.length,
pipelineStages: aggregationQuery.pipelineStages || 0
});
console.log(`${pattern} aggregation completed in ${executionTime}ms`);
return {
results: result,
modelingPattern: pattern,
executionTime: executionTime,
performanceProfile: executionTime < 100 ? 'optimal' : executionTime < 500 ? 'acceptable' : 'needs_optimization'
};
} catch (error) {
console.error(`Error performing ${pattern} aggregation:`, error);
throw error;
}
}
// Utility methods for document modeling optimization
calculateProfileCompleteness(userData) {
let score = 0;
// Basic information (50 points)
if (userData.firstName) score += 10;
if (userData.lastName) score += 10;
if (userData.email) score += 10;
if (userData.phoneNumber) score += 10;
if (userData.dateOfBirth) score += 10;
// Addresses (25 points)
if (userData.addresses?.length > 0) score += 25;
// Preferences (25 points)
if (userData.preferences?.length > 0) score += 25;
return Math.min(score, 100);
}
buildProjection(fields) {
const projection = {};
fields.forEach(field => {
projection[field] = 1;
});
return projection;
}
async updateModelingMetrics(strategy, operation, metadata) {
if (!this.config.enablePerformanceMonitoring) return;
const metrics = this.performanceMetrics.get(strategy) || {
totalOperations: 0,
operationTypes: {},
averageDocumentSize: 0,
performanceProfile: 'unknown'
};
metrics.totalOperations++;
metrics.operationTypes[operation] = (metrics.operationTypes[operation] || 0) + 1;
metrics.lastOperation = new Date();
if (metadata.documentsCreated) {
metrics.documentsCreated = (metrics.documentsCreated || 0) + metadata.documentsCreated;
}
this.performanceMetrics.set(strategy, metrics);
}
async updateQueryMetrics(strategy, queryType, metadata) {
if (!this.config.enableQueryAnalytics) return;
const queryMetrics = this.queryPatterns.get(strategy) || {
totalQueries: 0,
queryTypes: {},
averageQueryTime: 0,
performanceProfile: {}
};
queryMetrics.totalQueries++;
queryMetrics.queryTypes[queryType] = (queryMetrics.queryTypes[queryType] || 0) + 1;
if (metadata.queryTime) {
const currentAvg = queryMetrics.averageQueryTime || 0;
queryMetrics.averageQueryTime = (currentAvg + metadata.queryTime) / 2;
}
if (metadata.executionTime) {
queryMetrics.performanceProfile[queryType] = metadata.executionTime;
}
this.queryPatterns.set(strategy, queryMetrics);
}
async getModelingRecommendations(collectionName, queryPatterns) {
console.log(`Generating modeling recommendations for: ${collectionName}`);
const recommendations = {
currentPattern: 'unknown',
recommendedPattern: 'unknown',
reasoning: [],
tradeoffs: {},
migrationComplexity: 'unknown'
};
// Analyze query patterns
const embeddedQueries = queryPatterns.filter(q => q.type === 'find_complete_document').length;
const partialQueries = queryPatterns.filter(q => q.type === 'find_partial_data').length;
const updateFrequency = queryPatterns.filter(q => q.type === 'update_operation').length;
const aggregationComplexity = queryPatterns.filter(q => q.type === 'aggregation').length;
// Analyze data characteristics
const avgDocumentSize = queryPatterns.reduce((sum, q) => sum + (q.documentSize || 0), 0) / queryPatterns.length;
const dataGrowthRate = queryPatterns.reduce((sum, q) => sum + (q.growthRate || 0), 0) / queryPatterns.length;
// Generate recommendations based on patterns
if (embeddedQueries > partialQueries * 2 && avgDocumentSize < 16 * 1024 * 1024) {
recommendations.recommendedPattern = 'embedded_documents';
recommendations.reasoning.push('High frequency of complete document queries');
recommendations.reasoning.push('Document size within MongoDB limits');
if (updateFrequency > embeddedQueries * 0.3) {
recommendations.reasoning.push('Consider hybrid pattern due to high update frequency');
}
} else if (partialQueries > embeddedQueries && dataGrowthRate > 0.1) {
recommendations.recommendedPattern = 'normalized_references';
recommendations.reasoning.push('High frequency of partial data queries');
recommendations.reasoning.push('High data growth rate favors normalization');
} else if (aggregationComplexity > queryPatterns.length * 0.2) {
recommendations.recommendedPattern = 'hybrid_pattern';
recommendations.reasoning.push('Complex aggregation requirements');
recommendations.reasoning.push('Mixed access patterns detected');
}
// Define tradeoffs
recommendations.tradeoffs = {
embedded_documents: {
benefits: ['Single query performance', 'Atomic updates', 'Data locality'],
drawbacks: ['Document size growth', 'Potential duplication', 'Complex nested updates']
},
normalized_references: {
benefits: ['Data normalization', 'Independent scaling', 'Flexible querying'],
drawbacks: ['Multiple queries required', 'Application complexity', 'Consistency challenges']
},
hybrid_pattern: {
benefits: ['Optimized for mixed patterns', 'Balanced performance'],
drawbacks: ['Increased complexity', 'Mixed consistency models']
}
};
return recommendations;
}
async getPerformanceAnalysis() {
console.log('Generating performance analysis for modeling patterns...');
const analysis = {
embeddedPatterns: {},
referencedPatterns: {},
hybridPatterns: {},
recommendations: []
};
// Analyze embedded pattern performance
for (const [strategy, metrics] of this.performanceMetrics) {
if (strategy.includes('embedded')) {
analysis.embeddedPatterns[strategy] = {
totalOperations: metrics.totalOperations,
operationBreakdown: metrics.operationTypes,
averagePerformance: metrics.averageQueryTime || 0,
performanceRating: this.ratePerformance(metrics.averageQueryTime || 0)
};
} else if (strategy.includes('referenced')) {
analysis.referencedPatterns[strategy] = {
totalOperations: metrics.totalOperations,
operationBreakdown: metrics.operationTypes,
averagePerformance: metrics.averageQueryTime || 0,
performanceRating: this.ratePerformance(metrics.averageQueryTime || 0)
};
}
}
// Generate global recommendations
analysis.recommendations = [
'Use embedded documents for frequently co-accessed data',
'Use references for large or independently managed entities',
'Consider hybrid patterns for complex applications',
'Monitor document sizes to avoid 16MB limit',
'Optimize indexes based on query patterns'
];
return analysis;
}
ratePerformance(avgTime) {
if (avgTime < 10) return 'excellent';
if (avgTime < 50) return 'good';
if (avgTime < 200) return 'acceptable';
return 'needs_optimization';
}
async cleanup() {
console.log('Cleaning up Document Modeling Manager...');
this.modelingStrategies.clear();
this.performanceMetrics.clear();
this.queryPatterns.clear();
console.log('Document Modeling Manager cleanup completed');
}
}
// Example usage demonstrating embedded vs referenced patterns
async function demonstrateDocumentModelingPatterns() {
const client = new MongoClient('mongodb://localhost:27017');
await client.connect();
const modelingManager = new AdvancedDocumentModelingManager(client, {
database: 'document_modeling_demo',
enablePerformanceMonitoring: true,
enableQueryAnalytics: true
});
try {
// Sample user data for demonstration
const sampleUserData = {
email: '[email protected]',
username: 'johndoe123',
firstName: 'John',
lastName: 'Doe',
phoneNumber: '+1-555-0123',
dateOfBirth: new Date('1990-05-15'),
addresses: [
{
type: 'home',
streetAddress: '123 Main Street',
apartmentUnit: 'Apt 4B',
city: 'New York',
stateProvince: 'NY',
postalCode: '10001',
country: 'USA',
isPrimary: true,
isShipping: true
},
{
type: 'work',
streetAddress: '456 Corporate Blvd',
city: 'New York',
stateProvince: 'NY',
postalCode: '10002',
country: 'USA',
isBilling: true
}
],
preferences: [
{
category: 'notifications',
key: 'email_frequency',
value: 'daily',
dataType: 'string'
},
{
category: 'display',
key: 'theme',
value: 'dark',
dataType: 'string'
}
]
};
// Demonstrate embedded document pattern
console.log('Creating embedded user profile...');
const embeddedResult = await modelingManager.createEmbeddedUserProfile(sampleUserData);
console.log('Embedded Result:', embeddedResult);
// Demonstrate referenced pattern
console.log('Creating referenced user profile...');
const referencedResult = await modelingManager.createReferencedUserProfile(sampleUserData);
console.log('Referenced Result:', referencedResult);
// Demonstrate query performance differences
console.log('Comparing query performance...');
const embeddedQuery = await modelingManager.getUserProfileEmbedded(embeddedResult.userId);
console.log('Embedded Query Result:', {
pattern: embeddedQuery.modelingPattern,
queries: embeddedQuery.queriesExecuted,
optimized: embeddedQuery.performanceOptimized
});
const referencedQuery = await modelingManager.getUserProfileReferenced(referencedResult.userId);
console.log('Referenced Query Result:', {
pattern: referencedQuery.modelingPattern,
queries: referencedQuery.queriesExecuted,
optimized: referencedQuery.performanceOptimized
});
// Demonstrate update operations
console.log('Comparing update operations...');
const addressId = embeddedQuery.profile.addresses[0]._id;
const referencedAddressId = referencedResult.addressIds[0];
const embeddedUpdate = await modelingManager.updateEmbeddedUserAddress(
embeddedResult.userId,
addressId,
{ streetAddress: '789 Updated Street' }
);
console.log('Embedded Update:', embeddedUpdate);
const referencedUpdate = await modelingManager.updateReferencedUserAddress(
referencedResult.userId,
referencedAddressId,
{ streetAddress: '789 Updated Street' }
);
console.log('Referenced Update:', referencedUpdate);
// Demonstrate aggregation performance
console.log('Comparing aggregation performance...');
const embeddedAggregation = await modelingManager.performComplexAggregation('embedded', {
match: { status: 'active' },
groupBy: '$profileMetadata.theme',
limit: 10
});
const referencedAggregation = await modelingManager.performComplexAggregation('referenced', {
match: { status: 'active' },
groupBy: '$profileMetadata.theme',
limit: 10
});
console.log('Aggregation Comparison:', {
embedded: {
time: embeddedAggregation.executionTime,
profile: embeddedAggregation.performanceProfile
},
referenced: {
time: referencedAggregation.executionTime,
profile: referencedAggregation.performanceProfile
}
});
// Get performance analysis
const performanceAnalysis = await modelingManager.getPerformanceAnalysis();
console.log('Performance Analysis:', performanceAnalysis);
return {
embeddedResult,
referencedResult,
queryComparison: {
embedded: embeddedQuery,
referenced: referencedQuery
},
updateComparison: {
embedded: embeddedUpdate,
referenced: referencedUpdate
},
aggregationComparison: {
embedded: embeddedAggregation,
referenced: referencedAggregation
},
performanceAnalysis
};
} catch (error) {
console.error('Error demonstrating document modeling patterns:', error);
throw error;
} finally {
await modelingManager.cleanup();
await client.close();
}
}
// Benefits of MongoDB Flexible Document Modeling:
// - Embedded documents provide optimal query performance for frequently co-accessed data
// - Reference patterns enable normalized data structures and independent entity management
// - Hybrid patterns optimize for mixed access patterns and complex application requirements
// - Flexible schema evolution accommodates changing business requirements without migrations
// - Query optimization strategies can be tailored to specific data access patterns
// - Atomic operations available for embedded documents ensure data consistency
// - Application-level joins provide flexibility while maintaining performance where needed
// - Document size management enables balanced approaches between embedding and referencing
module.exports = {
AdvancedDocumentModelingManager,
demonstrateDocumentModelingPatterns
};
SQL-Style Document Modeling with QueryLeaf
QueryLeaf provides familiar SQL syntax for MongoDB document relationship management and modeling pattern optimization:
-- QueryLeaf document modeling with SQL-familiar embedded and reference pattern syntax
-- Configure document modeling optimization settings
SET enable_embedded_optimization = true;
SET enable_reference_optimization = true;
SET enable_hybrid_modeling = true;
SET document_size_monitoring = true;
SET query_pattern_analysis = true;
SET performance_monitoring = true;
-- Create embedded document pattern for frequently co-accessed data
WITH embedded_user_profiles AS (
INSERT INTO user_profiles_embedded
SELECT
GENERATE_UUID() as user_id,
'user' || generate_series(1, 1000) || '@example.com' as email,
'user' || generate_series(1, 1000) as username,
(ARRAY['John', 'Jane', 'Mike', 'Sarah', 'David'])[1 + floor(random() * 5)] as first_name,
(ARRAY['Smith', 'Johnson', 'Williams', 'Brown', 'Jones'])[1 + floor(random() * 5)] as last_name,
'+1-555-' || LPAD(floor(random() * 10000)::text, 4, '0') as phone_number,
CURRENT_DATE - (random() * 365 * 30 + 18 * 365)::int as date_of_birth,
'active' as status,
-- Embedded addresses array for optimal co-access
JSON_BUILD_ARRAY(
JSON_BUILD_OBJECT(
'_id', GENERATE_UUID(),
'type', 'home',
'streetAddress', floor(random() * 9999 + 1) || ' ' ||
(ARRAY['Main St', 'Oak Ave', 'First St', 'Second Ave', 'Third St'])[1 + floor(random() * 5)],
'apartmentUnit', CASE WHEN random() > 0.6 THEN 'Apt ' || (1 + floor(random() * 50))::text ELSE NULL END,
'city', (ARRAY['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'])[1 + floor(random() * 5)],
'stateProvince', (ARRAY['NY', 'CA', 'IL', 'TX', 'AZ'])[1 + floor(random() * 5)],
'postalCode', LPAD(floor(random() * 100000)::text, 5, '0'),
'country', 'USA',
'isPrimary', true,
'isBilling', random() > 0.5,
'isShipping', random() > 0.3,
'createdAt', CURRENT_TIMESTAMP,
'updatedAt', CURRENT_TIMESTAMP
),
-- Additional address if random condition met
CASE WHEN random() > 0.7 THEN
JSON_BUILD_OBJECT(
'_id', GENERATE_UUID(),
'type', 'work',
'streetAddress', floor(random() * 999 + 100) || ' Business Blvd',
'city', (ARRAY['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'])[1 + floor(random() * 5)],
'stateProvince', (ARRAY['NY', 'CA', 'IL', 'TX', 'AZ'])[1 + floor(random() * 5)],
'postalCode', LPAD(floor(random() * 100000)::text, 5, '0'),
'country', 'USA',
'isPrimary', false,
'isBilling', true,
'isShipping', false,
'createdAt', CURRENT_TIMESTAMP,
'updatedAt', CURRENT_TIMESTAMP
)
ELSE NULL END
) FILTER (WHERE JSON_BUILD_OBJECT IS NOT NULL) as addresses,
-- Embedded preferences for atomic updates
JSON_BUILD_ARRAY(
JSON_BUILD_OBJECT(
'_id', GENERATE_UUID(),
'category', 'notifications',
'key', 'email_frequency',
'value', (ARRAY['immediate', 'daily', 'weekly', 'never'])[1 + floor(random() * 4)],
'dataType', 'string',
'createdAt', CURRENT_TIMESTAMP,
'updatedAt', CURRENT_TIMESTAMP
),
JSON_BUILD_OBJECT(
'_id', GENERATE_UUID(),
'category', 'display',
'key', 'theme',
'value', (ARRAY['light', 'dark', 'auto'])[1 + floor(random() * 3)],
'dataType', 'string',
'createdAt', CURRENT_TIMESTAMP,
'updatedAt', CURRENT_TIMESTAMP
),
JSON_BUILD_OBJECT(
'_id', GENERATE_UUID(),
'category', 'privacy',
'key', 'profile_visibility',
'value', (ARRAY['public', 'friends', 'private'])[1 + floor(random() * 3)],
'dataType', 'string',
'createdAt', CURRENT_TIMESTAMP,
'updatedAt', CURRENT_TIMESTAMP
)
) as preferences,
-- Embedded profile metadata for single-query access
JSON_BUILD_OBJECT(
'theme', (ARRAY['light', 'dark', 'auto'])[1 + floor(random() * 3)],
'language', (ARRAY['en', 'es', 'fr', 'de'])[1 + floor(random() * 4)],
'timezone', (ARRAY['UTC', 'EST', 'PST', 'CST', 'MST'])[1 + floor(random() * 5)],
'notificationSettings', JSON_BUILD_OBJECT(
'email', random() > 0.2,
'push', random() > 0.3,
'sms', random() > 0.8
),
'privacySettings', JSON_BUILD_OBJECT(
'profileVisibility', (ARRAY['public', 'friends', 'private'])[1 + floor(random() * 3)],
'allowDirectMessages', random() > 0.1,
'shareActivityStatus', random() > 0.4
)
) as profile_metadata,
-- Embedded activity summary for performance
JSON_BUILD_OBJECT(
'totalLogins', floor(random() * 100),
'lastLoginAt', CURRENT_TIMESTAMP - (random() * INTERVAL '30 days'),
'lastActivityAt', CURRENT_TIMESTAMP - (random() * INTERVAL '7 days'),
'accountCreatedAt', CURRENT_TIMESTAMP - (random() * 365 + 30) * INTERVAL '1 day',
'profileCompletionScore', 70 + floor(random() * 30) -- 70-100%
) as activity_summary,
CURRENT_TIMESTAMP as created_at,
CURRENT_TIMESTAMP as updated_at,
1 as version
RETURNING user_id, email, username
),
-- Create normalized reference pattern for independent entity management
users_referenced AS (
INSERT INTO users_referenced
SELECT
GENERATE_UUID() as user_id,
'ref_user' || generate_series(1, 1000) || '@example.com' as email,
'ref_user' || generate_series(1, 1000) as username,
(ARRAY['Alice', 'Bob', 'Carol', 'David', 'Eve'])[1 + floor(random() * 5)] as first_name,
(ARRAY['Wilson', 'Davis', 'Miller', 'Moore', 'Taylor'])[1 + floor(random() * 5)] as last_name,
'+1-555-' || LPAD(floor(random() * 10000)::text, 4, '0') as phone_number,
CURRENT_DATE - (random() * 365 * 30 + 18 * 365)::int as date_of_birth,
'active' as status,
-- Basic profile metadata only (normalized approach)
JSON_BUILD_OBJECT(
'theme', (ARRAY['light', 'dark', 'auto'])[1 + floor(random() * 3)],
'language', (ARRAY['en', 'es', 'fr', 'de'])[1 + floor(random() * 4)],
'timezone', (ARRAY['UTC', 'EST', 'PST', 'CST', 'MST'])[1 + floor(random() * 5)]
) as profile_metadata,
CURRENT_TIMESTAMP as created_at,
CURRENT_TIMESTAMP as updated_at,
1 as version
RETURNING user_id, email, username
),
-- Create separate referenced address documents
user_addresses_referenced AS (
INSERT INTO user_addresses_referenced
SELECT
GENERATE_UUID() as address_id,
ur.user_id,
-- Address type and details
(ARRAY['home', 'work', 'billing', 'shipping'])[1 + floor(random() * 4)] as type,
floor(random() * 9999 + 1) || ' ' ||
(ARRAY['Broadway', 'Park Ave', 'Wall St', 'Madison Ave', 'Fifth Ave'])[1 + floor(random() * 5)] as street_address,
CASE WHEN random() > 0.7 THEN 'Unit ' || (1 + floor(random() * 100))::text ELSE NULL END as apartment_unit,
(ARRAY['Boston', 'Philadelphia', 'San Antonio', 'San Diego', 'Dallas'])[1 + floor(random() * 5)] as city,
(ARRAY['MA', 'PA', 'TX', 'CA', 'TX'])[1 + floor(random() * 5)] as state_province,
LPAD(floor(random() * 100000)::text, 5, '0') as postal_code,
'USA' as country,
-- Address flags
row_number() OVER (PARTITION BY ur.user_id) = 1 as is_primary, -- First address is primary
random() > 0.6 as is_billing,
random() > 0.4 as is_shipping,
CURRENT_TIMESTAMP as created_at,
CURRENT_TIMESTAMP as updated_at
FROM users_referenced ur
CROSS JOIN generate_series(1, 1 + floor(random() * 2)::int) -- 1-3 addresses per user
RETURNING address_id, user_id, type
),
-- Create referenced user activities for independent tracking
user_activities_referenced AS (
INSERT INTO user_activities_referenced
SELECT
GENERATE_UUID() as activity_id,
ur.user_id,
-- Activity classification
(ARRAY['login', 'logout', 'page_view', 'action_performed', 'data_modified', 'error_occurred'])
[1 + floor(random() * 6)] as activity_type,
CURRENT_TIMESTAMP - (random() * INTERVAL '90 days') as activity_timestamp,
-- Activity details
JSON_BUILD_OBJECT(
'page', (ARRAY['/dashboard', '/profile', '/settings', '/reports', '/help'])[1 + floor(random() * 5)],
'action', (ARRAY['click', 'view', 'edit', 'save', 'delete'])[1 + floor(random() * 5)],
'duration', floor(random() * 300 + 5), -- 5-305 seconds
'userAgent', 'Mozilla/5.0 (Enterprise Browser)',
'ipAddress', '192.168.' || (1 + floor(random() * 254)) || '.' || (1 + floor(random() * 254))
) as activity_data,
(ARRAY['web', 'mobile', 'api', 'system'])[1 + floor(random() * 4)] as activity_source,
('192.168.' || (1 + floor(random() * 254)) || '.' || (1 + floor(random() * 254)))::inet as ip_address,
'Mozilla/5.0 (compatible; Enterprise App)' as user_agent,
-- Session and tracking
'session_' || floor(random() * 10000) as session_id,
'https://app.example.com' || (ARRAY['/dashboard', '/profile', '/settings'])[1 + floor(random() * 3)] as page_url,
-- Performance tracking
floor(random() * 500 + 50) as response_time_ms,
random() > 0.95 as error_occurred, -- 5% error rate
CASE WHEN random() > 0.95 THEN
JSON_BUILD_OBJECT('error', 'timeout', 'code', '500', 'message', 'Request timeout')
ELSE NULL END as error_details
FROM users_referenced ur
CROSS JOIN generate_series(1, floor(random() * 50 + 10)::int) -- 10-60 activities per user
RETURNING activity_id, user_id, activity_type, activity_timestamp
)
-- Query performance comparison between embedded and referenced patterns
SELECT
'EMBEDDED_PATTERN' as modeling_approach,
'Single document query for complete profile' as query_description,
1 as queries_required,
'Optimal - all data co-located' as performance_profile,
'Guaranteed - single document ACID' as consistency_model,
'Atomic updates possible' as update_characteristics,
'Potential 16MB limit concern' as scalability_considerations
UNION ALL
SELECT
'REFERENCED_PATTERN' as modeling_approach,
'Multiple queries required for complete profile' as query_description,
3 as queries_required,
'Moderate - requires joins/lookups' as performance_profile,
'Eventual - across multiple documents' as consistency_model,
'Independent entity updates' as update_characteristics,
'Unlimited growth potential' as scalability_considerations;
-- Demonstrate embedded document queries (single collection access)
WITH embedded_query_patterns AS (
-- Single query retrieves complete user profile with all related data
SELECT
user_id,
email,
first_name,
last_name,
-- Extract embedded address information
JSON_ARRAY_LENGTH(addresses) as total_addresses,
JSON_EXTRACT_PATH_TEXT(addresses, '0', 'city') as primary_city,
JSON_EXTRACT_PATH_TEXT(addresses, '0', 'stateProvince') as primary_state,
-- Extract embedded preferences
JSON_ARRAY_LENGTH(preferences) as total_preferences,
-- Extract activity summary (embedded for performance)
CAST(JSON_EXTRACT_PATH_TEXT(activity_summary, 'totalLogins') AS INTEGER) as total_logins,
TO_TIMESTAMP(JSON_EXTRACT_PATH_TEXT(activity_summary, 'lastLoginAt'), 'YYYY-MM-DD"T"HH24:MI:SS.MS"Z"') as last_login,
CAST(JSON_EXTRACT_PATH_TEXT(activity_summary, 'profileCompletionScore') AS INTEGER) as completion_score,
-- Performance metrics
1 as documents_accessed,
0 as join_operations_required,
'immediate' as consistency_guarantee,
-- Query classification
'embedded_single_document' as query_pattern,
'optimal_performance' as performance_classification
FROM user_profiles_embedded
WHERE status = 'active'
AND JSON_EXTRACT_PATH_TEXT(profile_metadata, 'theme') = 'dark'
LIMIT 100
),
-- Demonstrate referenced pattern queries (multiple collection access required)
referenced_query_patterns AS (
-- Multiple queries required to reconstruct complete user profile
SELECT
u.user_id,
u.email,
u.first_name,
u.last_name,
-- Address information requires separate query/join
COUNT(addr.address_id) as total_addresses,
addr_primary.city as primary_city,
addr_primary.state_province as primary_state,
-- Activity summary requires aggregation from separate collection
COUNT(act.activity_id) as total_activities,
MAX(act.activity_timestamp) FILTER (WHERE act.activity_type = 'login') as last_login,
COUNT(act.activity_id) FILTER (WHERE act.activity_timestamp >= CURRENT_TIMESTAMP - INTERVAL '7 days') as recent_activities,
-- Performance metrics
3 as documents_accessed, -- Users + Addresses + Activities
2 as join_operations_required,
'eventual' as consistency_guarantee,
-- Query classification
'referenced_multi_document' as query_pattern,
'moderate_performance' as performance_classification
FROM users_referenced u
LEFT JOIN user_addresses_referenced addr ON u.user_id = addr.user_id
LEFT JOIN user_addresses_referenced addr_primary ON u.user_id = addr_primary.user_id AND addr_primary.is_primary = true
LEFT JOIN user_activities_referenced act ON u.user_id = act.user_id
WHERE u.status = 'active'
AND JSON_EXTRACT_PATH_TEXT(u.profile_metadata, 'theme') = 'dark'
GROUP BY u.user_id, u.email, u.first_name, u.last_name, addr_primary.city, addr_primary.state_province
LIMIT 100
),
-- Performance analysis and comparison
modeling_performance_analysis AS (
SELECT
query_pattern,
performance_classification,
AVG(documents_accessed) as avg_documents_per_query,
AVG(join_operations_required) as avg_joins_per_query,
COUNT(*) as total_queries_analyzed,
-- Performance scoring
CASE
WHEN AVG(documents_accessed) = 1 AND AVG(join_operations_required) = 0 THEN 'excellent'
WHEN AVG(documents_accessed) <= 3 AND AVG(join_operations_required) <= 2 THEN 'good'
WHEN AVG(documents_accessed) <= 5 AND AVG(join_operations_required) <= 4 THEN 'acceptable'
ELSE 'needs_optimization'
END as overall_performance_rating,
-- Consistency analysis
MODE() WITHIN GROUP (ORDER BY consistency_guarantee) as primary_consistency_model,
-- Scalability assessment
CASE
WHEN query_pattern = 'embedded_single_document' THEN 'Limited by 16MB document size'
WHEN query_pattern = 'referenced_multi_document' THEN 'Unlimited horizontal scaling'
ELSE 'Hybrid scaling characteristics'
END as scalability_profile
FROM (
SELECT * FROM embedded_query_patterns
UNION ALL
SELECT * FROM referenced_query_patterns
) combined_patterns
GROUP BY query_pattern, performance_classification
),
-- Document modeling recommendations based on query patterns
modeling_recommendations AS (
SELECT
mpa.query_pattern,
mpa.overall_performance_rating,
mpa.scalability_profile,
mpa.primary_consistency_model,
-- Use case recommendations
CASE
WHEN mpa.query_pattern = 'embedded_single_document' THEN
JSON_BUILD_ARRAY(
'Optimal for frequently co-accessed related data',
'Best for read-heavy workloads with complete document queries',
'Ideal for maintaining ACID guarantees across related entities',
'Suitable for moderate data growth with stable relationships'
)
WHEN mpa.query_pattern = 'referenced_multi_document' THEN
JSON_BUILD_ARRAY(
'Best for large datasets with independent entity management',
'Optimal for write-heavy workloads with frequent partial updates',
'Ideal for applications requiring flexible schema evolution',
'Suitable for unlimited horizontal scaling requirements'
)
ELSE
JSON_BUILD_ARRAY(
'Consider hybrid approach for mixed access patterns',
'Evaluate specific query requirements for optimization',
'Balance performance and scalability based on use case'
)
END as use_case_recommendations,
-- Performance optimization strategies
CASE mpa.overall_performance_rating
WHEN 'excellent' THEN 'Continue current approach with monitoring'
WHEN 'good' THEN 'Minor optimizations possible through indexing'
WHEN 'acceptable' THEN 'Consider query pattern optimization or hybrid approach'
ELSE 'Significant architectural changes recommended'
END as optimization_strategy,
-- Specific implementation guidance
JSON_BUILD_OBJECT(
'indexing_strategy',
CASE
WHEN mpa.query_pattern = 'embedded_single_document' THEN 'Compound indexes on embedded fields'
ELSE 'Reference field optimization with lookup performance'
END,
'consistency_approach',
CASE mpa.primary_consistency_model
WHEN 'immediate' THEN 'Single document transactions available'
ELSE 'Application-level consistency management required'
END,
'scaling_considerations',
CASE
WHEN mpa.scalability_profile LIKE '%16MB%' THEN 'Monitor document sizes and consider archival strategies'
ELSE 'Plan for horizontal scaling and sharding strategies'
END
) as implementation_guidance
FROM modeling_performance_analysis mpa
)
-- Comprehensive document modeling strategy dashboard
SELECT
mr.query_pattern,
mr.overall_performance_rating,
mr.primary_consistency_model,
mr.optimization_strategy,
-- Performance characteristics
mpa.avg_documents_per_query as avg_docs_per_query,
mpa.avg_joins_per_query as avg_joins_required,
mpa.total_queries_analyzed,
-- Architectural guidance
mr.use_case_recommendations,
mr.implementation_guidance,
-- Decision matrix
CASE
WHEN mr.query_pattern = 'embedded_single_document' AND mr.overall_performance_rating = 'excellent' THEN
'RECOMMENDED: Use embedded documents for this use case'
WHEN mr.query_pattern = 'referenced_multi_document' AND mr.scalability_profile LIKE '%Unlimited%' THEN
'RECOMMENDED: Use referenced pattern for scalability requirements'
ELSE
'EVALUATE: Consider hybrid approach or further analysis'
END as architectural_recommendation,
-- Implementation priorities
JSON_BUILD_OBJECT(
'immediate_actions',
CASE mr.overall_performance_rating
WHEN 'needs_optimization' THEN JSON_BUILD_ARRAY('Review query patterns', 'Optimize indexing', 'Consider architectural changes')
WHEN 'acceptable' THEN JSON_BUILD_ARRAY('Monitor performance trends', 'Optimize critical queries')
ELSE JSON_BUILD_ARRAY('Continue monitoring', 'Plan for growth')
END,
'monitoring_focus',
CASE
WHEN mr.query_pattern = 'embedded_single_document' THEN 'Document size growth and query performance'
ELSE 'Join performance and data consistency'
END,
'success_metrics',
JSON_BUILD_OBJECT(
'performance_target', CASE mr.overall_performance_rating WHEN 'excellent' THEN 'maintain' ELSE 'improve' END,
'consistency_requirement', mr.primary_consistency_model,
'scalability_readiness',
CASE WHEN mr.scalability_profile LIKE '%Unlimited%' THEN 'high' ELSE 'moderate' END
)
) as implementation_roadmap
FROM modeling_recommendations mr
JOIN modeling_performance_analysis mpa ON mr.query_pattern = mpa.query_pattern
ORDER BY
CASE mr.overall_performance_rating
WHEN 'excellent' THEN 1
WHEN 'good' THEN 2
WHEN 'acceptable' THEN 3
ELSE 4
END,
mpa.avg_documents_per_query ASC;
-- QueryLeaf provides comprehensive MongoDB document modeling capabilities:
-- 1. Embedded document patterns for optimal query performance and data locality
-- 2. Referenced patterns for normalized structures and independent entity scaling
-- 3. Hybrid modeling strategies combining embedding and referencing for complex requirements
-- 4. Performance analysis and optimization recommendations based on query patterns
-- 5. SQL-familiar syntax for document relationship management and pattern selection
-- 6. Comprehensive modeling analytics with performance profiling and scalability assessment
-- 7. Automated recommendations for optimal modeling patterns based on access requirements
-- 8. Enterprise-grade consistency and performance monitoring for production deployments
-- 9. Flexible schema evolution support with minimal application impact
-- 10. Advanced query optimization techniques tailored to document modeling patterns
Best Practices for MongoDB Document Modeling Implementation
Strategic Modeling Decisions
Essential practices for making optimal embedded vs referenced modeling decisions:
- Query Pattern Analysis: Design document structure based on actual application query patterns and data access requirements
- Data Growth Assessment: Evaluate data growth patterns to prevent document size issues with embedded patterns
- Update Frequency Analysis: Consider update patterns when deciding between atomic embedded updates and independent referenced updates
- Consistency Requirements: Choose modeling patterns based on consistency requirements and transaction scope needs
- Performance Baseline Establishment: Establish performance baselines for different modeling approaches with realistic data volumes
- Scalability Planning: Design modeling strategies that accommodate expected growth in data volume and query complexity
Production Optimization and Management
Optimize document modeling for enterprise-scale applications:
- Index Strategy Optimization: Design indexes that support both embedded field queries and reference lookups efficiently
- Document Size Monitoring: Implement monitoring for document sizes to prevent 16MB limit issues with embedded patterns
- Query Performance Analysis: Continuously analyze query performance across different modeling patterns for optimization opportunities
- Migration Planning: Plan for potential modeling pattern changes as application requirements evolve
- Consistency Management: Implement appropriate consistency management strategies for referenced patterns
- Monitoring and Alerting: Establish comprehensive monitoring for performance, consistency, and scalability metrics
Conclusion
MongoDB's flexible document modeling provides powerful options for optimizing data relationships through embedded documents, references, or hybrid approaches. The choice between embedding and referencing depends on specific query patterns, consistency requirements, scalability needs, and performance objectives. Understanding these tradeoffs enables architects to design optimal data models that balance performance, scalability, and maintainability.
Key MongoDB Document Modeling benefits include:
- Performance Optimization: Choose modeling patterns that optimize for specific query patterns and data access requirements
- Flexible Relationships: Model relationships using the approach that best fits application needs rather than rigid normalization rules
- ACID Guarantees: Leverage single-document ACID properties for embedded patterns or manage consistency for referenced patterns
- Scalability Options: Scale using approaches appropriate to data growth patterns and access requirements
- Schema Evolution: Evolve document structures as requirements change without expensive migration procedures
- SQL Accessibility: Manage document relationships using familiar SQL-style syntax and optimization techniques
Whether you're building user management systems, content platforms, e-commerce applications, or analytics systems, MongoDB's document modeling flexibility with QueryLeaf's familiar SQL interface provides the foundation for scalable, performant, and maintainable data architectures.
QueryLeaf Integration: QueryLeaf automatically optimizes MongoDB document relationships while providing SQL-familiar syntax for embedded and referenced pattern management. Advanced modeling strategies, performance analysis, and optimization recommendations are seamlessly accessible through familiar SQL constructs, making sophisticated document relationship management both powerful and approachable for SQL-oriented development teams.
The combination of MongoDB's flexible document modeling with SQL-style relationship management makes it an ideal platform for applications requiring both optimal query performance and familiar operational patterns, ensuring your data architecture can adapt to changing requirements while maintaining performance excellence and development productivity.