MongoDB Time-Series Data Management: SQL-Style Analytics for IoT and Metrics
Time-series data represents one of the fastest-growing data types in modern applications. From IoT sensor readings and application performance metrics to financial market data and user activity logs, time-series collections require specialized storage strategies and query patterns for optimal performance.
MongoDB's native time-series collections, introduced in version 5.0, provide powerful capabilities for storing and analyzing temporal data. Combined with SQL-style query patterns, you can build efficient time-series applications that scale to millions of data points while maintaining familiar development patterns.
The Time-Series Challenge
Consider an IoT monitoring system collecting data from thousands of sensors across multiple facilities. Each sensor generates readings every minute, creating millions of documents daily:
// Traditional document structure - inefficient for time-series
{
"_id": ObjectId("..."),
"sensor_id": "temp_001",
"facility": "warehouse_A",
"measurement_type": "temperature",
"value": 23.5,
"unit": "celsius",
"timestamp": ISODate("2025-08-24T14:30:00Z"),
"location": {
"building": "A",
"floor": 2,
"room": "storage_1"
}
}
Storing time-series data in regular collections leads to several problems:
-- SQL queries on regular collections become inefficient
SELECT
sensor_id,
AVG(value) AS avg_temp,
MAX(value) AS max_temp,
MIN(value) AS min_temp
FROM sensor_readings
WHERE measurement_type = 'temperature'
AND timestamp >= '2025-08-24 00:00:00'
AND timestamp < '2025-08-25 00:00:00'
GROUP BY sensor_id, DATE_TRUNC('hour', timestamp);
-- Problems:
-- - Poor compression (repetitive metadata)
-- - Inefficient indexing for temporal queries
-- - Slow aggregations across time ranges
-- - High storage overhead
MongoDB Time-Series Collections
MongoDB time-series collections optimize storage and query performance for temporal data:
// Create optimized time-series collection
db.createCollection("sensor_readings", {
timeseries: {
timeField: "timestamp", // Required: timestamp field
metaField: "metadata", // Optional: unchanging metadata
granularity: "minutes" // Optional: seconds, minutes, hours
}
})
// Optimized document structure
{
"timestamp": ISODate("2025-08-24T14:30:00Z"),
"temperature": 23.5,
"humidity": 65.2,
"pressure": 1013.25,
"metadata": {
"sensor_id": "env_001",
"facility": "warehouse_A",
"location": {
"building": "A",
"floor": 2,
"room": "storage_1"
},
"sensor_type": "environmental"
}
}
Benefits of time-series collections:
- 10x Storage Compression: Efficient bucketing and compression
- Faster Queries: Optimized indexes for temporal ranges
- Better Performance: Specialized aggregation pipeline optimization
- Automatic Bucketing: MongoDB groups documents by time ranges
SQL-Style Time-Series Queries
Basic Temporal Filtering
Query recent sensor data with familiar SQL patterns:
-- Get last 24 hours of temperature readings
SELECT
metadata.sensor_id,
metadata.location.room,
timestamp,
temperature,
humidity
FROM sensor_readings
WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
AND metadata.sensor_type = 'environmental'
ORDER BY timestamp DESC
LIMIT 1000;
-- Equivalent time range query
SELECT *
FROM sensor_readings
WHERE timestamp BETWEEN '2025-08-24 00:00:00' AND '2025-08-24 23:59:59'
AND metadata.facility = 'warehouse_A';
Temporal Aggregations
Perform time-based analytics using SQL aggregation functions:
-- Hourly temperature averages by location
SELECT
metadata.location.building,
metadata.location.floor,
DATE_TRUNC('hour', timestamp) AS hour,
AVG(temperature) AS avg_temp,
MAX(temperature) AS max_temp,
MIN(temperature) AS min_temp,
COUNT(*) AS reading_count
FROM sensor_readings
WHERE timestamp >= '2025-08-24 00:00:00'
AND metadata.sensor_type = 'environmental'
GROUP BY
metadata.location.building,
metadata.location.floor,
DATE_TRUNC('hour', timestamp)
ORDER BY hour DESC, building, floor;
-- Daily facility summaries
SELECT
metadata.facility,
DATE(timestamp) AS date,
AVG(temperature) AS avg_daily_temp,
STDDEV(temperature) AS temp_variance,
COUNT(DISTINCT metadata.sensor_id) AS active_sensors
FROM sensor_readings
WHERE timestamp >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY metadata.facility, DATE(timestamp)
ORDER BY date DESC, facility;
Advanced Time-Series Patterns
Moving Averages and Windowing
Calculate sliding windows for trend analysis:
-- 10-minute moving average temperature
WITH moving_avg AS (
SELECT
metadata.sensor_id,
timestamp,
temperature,
AVG(temperature) OVER (
PARTITION BY metadata.sensor_id
ORDER BY timestamp
ROWS BETWEEN 9 PRECEDING AND CURRENT ROW
) AS moving_avg_10min
FROM sensor_readings
WHERE timestamp >= '2025-08-24 12:00:00'
AND timestamp <= '2025-08-24 18:00:00'
AND metadata.sensor_type = 'environmental'
)
SELECT
sensor_id,
timestamp,
temperature,
moving_avg_10min,
temperature - moving_avg_10min AS deviation
FROM moving_avg
WHERE ABS(temperature - moving_avg_10min) > 2.0 -- Anomaly detection
ORDER BY sensor_id, timestamp;
Time-Series Interpolation
Fill gaps in time-series data with interpolated values:
-- Generate hourly time series with interpolation
WITH time_grid AS (
SELECT generate_series(
'2025-08-24 00:00:00'::timestamp,
'2025-08-24 23:59:59'::timestamp,
'1 hour'::interval
) AS hour
),
sensor_hourly AS (
SELECT
metadata.sensor_id,
DATE_TRUNC('hour', timestamp) AS hour,
AVG(temperature) AS avg_temp,
COUNT(*) AS reading_count
FROM sensor_readings
WHERE timestamp >= '2025-08-24 00:00:00'
AND timestamp < '2025-08-25 00:00:00'
AND metadata.facility = 'warehouse_A'
GROUP BY metadata.sensor_id, DATE_TRUNC('hour', timestamp)
)
SELECT
tg.hour,
sh.sensor_id,
COALESCE(
sh.avg_temp,
LAG(sh.avg_temp) OVER (PARTITION BY sh.sensor_id ORDER BY tg.hour)
) AS temperature,
sh.reading_count
FROM time_grid tg
LEFT JOIN sensor_hourly sh ON tg.hour = sh.hour
WHERE sh.sensor_id IS NOT NULL
ORDER BY sensor_id, hour;
Application Performance Monitoring
Time-series collections excel at storing application metrics and performance data:
// APM document structure
{
"timestamp": ISODate("2025-08-24T14:30:15Z"),
"response_time": 245,
"request_count": 1,
"error_count": 0,
"cpu_usage": 45.2,
"memory_usage": 1024.5,
"metadata": {
"service": "user-api",
"version": "v2.1.4",
"instance": "api-server-03",
"environment": "production",
"datacenter": "us-east-1"
}
}
Performance Analytics Queries
-- Service performance dashboard
SELECT
metadata.service,
metadata.environment,
DATE_TRUNC('minute', timestamp) AS minute,
AVG(response_time) AS avg_response_ms,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY response_time) AS p95_response_ms,
SUM(request_count) AS total_requests,
SUM(error_count) AS total_errors,
CASE
WHEN SUM(request_count) > 0
THEN (SUM(error_count) * 100.0 / SUM(request_count))
ELSE 0
END AS error_rate_pct
FROM performance_metrics
WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
AND metadata.environment = 'production'
GROUP BY
metadata.service,
metadata.environment,
DATE_TRUNC('minute', timestamp)
ORDER BY minute DESC, service;
-- Resource utilization trends
SELECT
metadata.instance,
DATE_TRUNC('hour', timestamp) AS hour,
MAX(cpu_usage) AS peak_cpu,
MAX(memory_usage) AS peak_memory_mb,
AVG(cpu_usage) AS avg_cpu,
AVG(memory_usage) AS avg_memory_mb
FROM performance_metrics
WHERE timestamp >= CURRENT_DATE - INTERVAL '7 days'
AND metadata.service = 'user-api'
GROUP BY metadata.instance, DATE_TRUNC('hour', timestamp)
ORDER BY hour DESC, instance;
Anomaly Detection
Identify performance anomalies using statistical analysis:
-- Detect response time anomalies
WITH performance_stats AS (
SELECT
metadata.service,
AVG(response_time) AS avg_response,
STDDEV(response_time) AS stddev_response
FROM performance_metrics
WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '7 days'
AND metadata.environment = 'production'
GROUP BY metadata.service
),
recent_metrics AS (
SELECT
metadata.service,
timestamp,
response_time,
metadata.instance
FROM performance_metrics
WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
AND metadata.environment = 'production'
)
SELECT
rm.service,
rm.timestamp,
rm.instance,
rm.response_time,
ps.avg_response,
(rm.response_time - ps.avg_response) / ps.stddev_response AS z_score,
CASE
WHEN ABS((rm.response_time - ps.avg_response) / ps.stddev_response) > 3
THEN 'CRITICAL_ANOMALY'
WHEN ABS((rm.response_time - ps.avg_response) / ps.stddev_response) > 2
THEN 'WARNING_ANOMALY'
ELSE 'NORMAL'
END AS anomaly_status
FROM recent_metrics rm
JOIN performance_stats ps ON rm.service = ps.service
WHERE ABS((rm.response_time - ps.avg_response) / ps.stddev_response) > 2
ORDER BY ABS((rm.response_time - ps.avg_response) / ps.stddev_response) DESC;
Financial Time-Series Data
Handle high-frequency trading data and market analytics:
// Market data structure
{
"timestamp": ISODate("2025-08-24T14:30:15.123Z"),
"open": 150.25,
"high": 150.75,
"low": 150.10,
"close": 150.60,
"volume": 1250,
"metadata": {
"symbol": "AAPL",
"exchange": "NASDAQ",
"data_provider": "market_feed_01",
"market_session": "regular"
}
}
Financial Analytics
-- OHLCV data with technical indicators
WITH price_data AS (
SELECT
metadata.symbol,
timestamp,
close,
volume,
LAG(close, 1) OVER (
PARTITION BY metadata.symbol
ORDER BY timestamp
) AS prev_close,
AVG(close) OVER (
PARTITION BY metadata.symbol
ORDER BY timestamp
ROWS BETWEEN 19 PRECEDING AND CURRENT ROW
) AS sma_20,
AVG(close) OVER (
PARTITION BY metadata.symbol
ORDER BY timestamp
ROWS BETWEEN 49 PRECEDING AND CURRENT ROW
) AS sma_50
FROM market_data
WHERE timestamp >= '2025-08-24 09:30:00'
AND timestamp <= '2025-08-24 16:00:00'
AND metadata.exchange = 'NASDAQ'
)
SELECT
symbol,
timestamp,
close,
volume,
CASE
WHEN prev_close > 0
THEN ((close - prev_close) / prev_close * 100)
ELSE 0
END AS price_change_pct,
sma_20,
sma_50,
CASE
WHEN sma_20 > sma_50 THEN 'BULLISH_SIGNAL'
WHEN sma_20 < sma_50 THEN 'BEARISH_SIGNAL'
ELSE 'NEUTRAL'
END AS trend_signal
FROM price_data
WHERE sma_50 IS NOT NULL -- Ensure we have enough data
ORDER BY symbol, timestamp DESC;
-- Trading volume analysis
SELECT
metadata.symbol,
DATE(timestamp) AS trading_date,
COUNT(*) AS tick_count,
SUM(volume) AS total_volume,
AVG(volume) AS avg_volume_per_tick,
MAX(high) AS daily_high,
MIN(low) AS daily_low,
FIRST_VALUE(open) OVER (
PARTITION BY metadata.symbol, DATE(timestamp)
ORDER BY timestamp
) AS daily_open,
LAST_VALUE(close) OVER (
PARTITION BY metadata.symbol, DATE(timestamp)
ORDER BY timestamp
) AS daily_close
FROM market_data
WHERE timestamp >= '2025-08-01'
AND metadata.market_session = 'regular'
GROUP BY metadata.symbol, DATE(timestamp)
ORDER BY trading_date DESC, symbol;
Performance Optimization Strategies
Efficient Indexing for Time-Series
// Create optimized indexes for time-series queries
db.sensor_readings.createIndex({
"metadata.facility": 1,
"timestamp": 1
})
db.sensor_readings.createIndex({
"metadata.sensor_id": 1,
"timestamp": 1
})
db.performance_metrics.createIndex({
"metadata.service": 1,
"metadata.environment": 1,
"timestamp": 1
})
SQL equivalent for index planning:
-- Index recommendations for common time-series queries
CREATE INDEX idx_sensor_facility_time ON sensor_readings (
(metadata.facility),
timestamp DESC
);
CREATE INDEX idx_sensor_id_time ON sensor_readings (
(metadata.sensor_id),
timestamp DESC
);
-- Covering index for performance metrics
CREATE INDEX idx_perf_service_env_time_covering ON performance_metrics (
(metadata.service),
(metadata.environment),
timestamp DESC
) INCLUDE (response_time, request_count, error_count);
Data Retention and Partitioning
Implement time-based data lifecycle management:
-- Automated data retention
WITH old_data AS (
SELECT _id
FROM sensor_readings
WHERE timestamp < CURRENT_DATE - INTERVAL '90 days'
LIMIT 10000 -- Batch deletion
)
DELETE FROM sensor_readings
WHERE _id IN (SELECT _id FROM old_data);
-- Archive old data before deletion
INSERT INTO sensor_readings_archive
SELECT * FROM sensor_readings
WHERE timestamp >= CURRENT_DATE - INTERVAL '90 days'
AND timestamp < CURRENT_DATE - INTERVAL '30 days';
-- Create summary tables for historical data
INSERT INTO daily_sensor_summaries (
date,
sensor_id,
facility,
avg_temperature,
max_temperature,
min_temperature,
reading_count
)
SELECT
DATE(timestamp) AS date,
metadata.sensor_id,
metadata.facility,
AVG(temperature) AS avg_temperature,
MAX(temperature) AS max_temperature,
MIN(temperature) AS min_temperature,
COUNT(*) AS reading_count
FROM sensor_readings
WHERE timestamp >= CURRENT_DATE - INTERVAL '1 day'
AND timestamp < CURRENT_DATE
GROUP BY
DATE(timestamp),
metadata.sensor_id,
metadata.facility;
Real-Time Monitoring and Alerts
Threshold-Based Alerting
-- Real-time temperature monitoring
SELECT
metadata.sensor_id,
metadata.location.building,
metadata.location.room,
timestamp,
temperature,
CASE
WHEN temperature > 35 THEN 'CRITICAL_HIGH'
WHEN temperature > 30 THEN 'WARNING_HIGH'
WHEN temperature < 5 THEN 'CRITICAL_LOW'
WHEN temperature < 10 THEN 'WARNING_LOW'
ELSE 'NORMAL'
END AS alert_level
FROM sensor_readings
WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '5 minutes'
AND metadata.sensor_type = 'environmental'
AND (temperature > 30 OR temperature < 10)
ORDER BY
CASE
WHEN temperature > 35 OR temperature < 5 THEN 1
ELSE 2
END,
timestamp DESC;
-- Service health monitoring
SELECT
metadata.service,
metadata.instance,
AVG(response_time) AS avg_response,
SUM(error_count) AS error_count,
SUM(request_count) AS request_count,
CASE
WHEN SUM(request_count) > 0
THEN (SUM(error_count) * 100.0 / SUM(request_count))
ELSE 0
END AS error_rate,
CASE
WHEN AVG(response_time) > 1000 THEN 'CRITICAL_SLOW'
WHEN AVG(response_time) > 500 THEN 'WARNING_SLOW'
WHEN SUM(error_count) * 100.0 / NULLIF(SUM(request_count), 0) > 5 THEN 'HIGH_ERROR_RATE'
ELSE 'HEALTHY'
END AS health_status
FROM performance_metrics
WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '5 minutes'
AND metadata.environment = 'production'
GROUP BY metadata.service, metadata.instance
HAVING health_status != 'HEALTHY'
ORDER BY
CASE health_status
WHEN 'CRITICAL_SLOW' THEN 1
WHEN 'HIGH_ERROR_RATE' THEN 2
WHEN 'WARNING_SLOW' THEN 3
END,
error_rate DESC;
QueryLeaf Time-Series Integration
QueryLeaf automatically optimizes time-series queries and provides intelligent query planning:
-- QueryLeaf handles time-series collection optimization automatically
WITH hourly_metrics AS (
SELECT
metadata.facility,
DATE_TRUNC('hour', timestamp) AS hour,
AVG(temperature) AS avg_temp,
AVG(humidity) AS avg_humidity,
COUNT(*) AS reading_count,
COUNT(DISTINCT metadata.sensor_id) AS sensor_count
FROM sensor_readings
WHERE timestamp >= CURRENT_DATE - INTERVAL '7 days'
AND metadata.sensor_type = 'environmental'
GROUP BY metadata.facility, DATE_TRUNC('hour', timestamp)
)
SELECT
facility,
hour,
avg_temp,
avg_humidity,
reading_count,
sensor_count,
LAG(avg_temp) OVER (
PARTITION BY facility
ORDER BY hour
) AS prev_hour_temp,
avg_temp - LAG(avg_temp) OVER (
PARTITION BY facility
ORDER BY hour
) AS temp_change
FROM hourly_metrics
WHERE hour >= CURRENT_DATE - INTERVAL '24 hours'
ORDER BY facility, hour DESC;
-- QueryLeaf automatically:
-- 1. Uses time-series collection bucketing
-- 2. Optimizes temporal range queries
-- 3. Leverages efficient aggregation pipelines
-- 4. Provides index recommendations
-- 5. Handles metadata field queries optimally
Best Practices for Time-Series Collections
- Choose Appropriate Granularity: Match collection granularity to your query patterns
- Design Efficient Metadata: Store unchanging data in the metaField for better compression
- Use Compound Indexes: Create indexes that support your most common query patterns
- Implement Data Lifecycle: Plan for data retention and archival strategies
- Monitor Performance: Track query patterns and adjust indexes accordingly
- Batch Operations: Use bulk inserts and updates for better throughput
Conclusion
MongoDB time-series collections, combined with SQL-style query patterns, provide powerful capabilities for managing temporal data at scale. Whether you're building IoT monitoring systems, application performance dashboards, or financial analytics platforms, proper time-series design ensures optimal performance and storage efficiency.
Key advantages of SQL-style time-series management:
- Familiar Syntax: Use well-understood SQL patterns for temporal queries
- Automatic Optimization: MongoDB handles bucketing and compression transparently
- Scalable Analytics: Perform complex aggregations on millions of time-series data points
- Flexible Schema: Leverage document model flexibility with time-series performance
- Real-Time Insights: Build responsive monitoring and alerting systems
The combination of MongoDB's optimized time-series storage with QueryLeaf's intuitive SQL interface creates an ideal platform for modern time-series applications. You get the performance benefits of specialized time-series databases with the development familiarity of SQL and the operational simplicity of MongoDB.
Whether you're tracking sensor data, monitoring application performance, or analyzing market trends, SQL-style time-series queries make complex temporal analytics accessible while maintaining the performance characteristics needed for production-scale systems.