When Cosmos DB Gets Expensive for Feature Stores
A data engineer messaged me last week: “We’re spending $4K/month on Cosmos DB for our feature store. Reads are steady, but the bill feels off.”
I asked about their architecture. Almost 90% of the bill was from loading data, not serving reads.
This is a classic Cosmos DB surprise.
The Feature Store Pattern
Here’s the standard ML feature store architecture:
- Daily Spark job computes features from raw data
- Write snapshot to Blob (parquet files, checkpoints)
- Load snapshot into Cosmos DB for serving
- ML models read features during inference
The workflow makes sense. Blob is cheap for batch writes. Cosmos DB is fast for point reads. But step 3, loading 1TB into Cosmos DB every day, is where the bill explodes.
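To make the expensive step concrete, here's a minimal sketch of steps 2 and 3, assuming a PySpark job and the azure-cosmos Python SDK. The account, container, and path names are all hypothetical, and real pipelines parallelize the load, but the per-item RU cost is the same either way:

```python
from azure.cosmos import CosmosClient

def publish_snapshot(features_df, snapshot_date):
    # Step 2: write the snapshot to Blob as parquet (cheap, one bulk write).
    features_df.write.mode("overwrite").parquet(
        f"abfss://features@mystore.dfs.core.windows.net/snapshots/{snapshot_date}/"
    )

    # Step 3: load every item into Cosmos DB (the expensive part).
    # Each ~1KB upsert costs ~5 RUs regardless of batching or bulk mode.
    client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<key>")
    container = client.get_database_client("ml").get_container_client("features")
    for row in features_df.toLocalIterator():  # simplified; real jobs parallelize this
        container.upsert_item(row.asDict())    # assumes each row has an "id" field
```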
A Real-World Scenario
Specs:
- 1TB feature store (embeddings, signals, model inputs)
- Daily full refresh (Spark job computes new features every 24 hours)
- 2,000 reads/sec (model inference hitting the store)
- Average item size: 1KB
Nothing unusual. This is a typical mid-size feature store.
Where The Money Goes
Cosmos DB pricing is RU-based. For 1KB items, Microsoft's docs estimate 1 RU per point read and 5 RUs per write, which works out to roughly $0.022 per million reads and $0.111 per million writes at standard provisioned throughput rates in US/EU regions (docs).
Here’s the cost breakdown:
| Cost Component | Calculation | Monthly Cost | % of Total |
|---|---|---|---|
| Daily Load (Writes) | 1TB / 1KB * 30 days * $0.111/million writes | $3,330 | 90% |
| Reads | 2,000/sec * 86,400 * 30 * $0.022/million | $114 | 3% |
| Storage | 1TB * $0.25/GB-month (docs) | $256 | 7% |
| Monthly Total | | $3,700 | 100% |
| Annual Total | | $44,401 | |
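If you want to sanity-check these numbers yourself, a few lines of Python reproduce the table from the unit prices above:

```python
# Reproduce the cost table from the stated unit prices.
ITEMS_PER_DAY = 1e12 / 1e3    # 1TB of 1KB items ≈ 1 billion items per refresh
WRITE_PRICE = 0.111 / 1e6     # $ per write (5 RUs each)
READ_PRICE = 0.022 / 1e6      # $ per read (1 RU each)

daily_load = ITEMS_PER_DAY * 30 * WRITE_PRICE   # 30 daily refreshes per month
reads = 2_000 * 86_400 * 30 * READ_PRICE        # 2,000 reads/sec, all month
storage = 1_024 * 0.25                          # 1TB at $0.25/GB-month

monthly = daily_load + reads + storage
print(f"load ${daily_load:,.0f}  reads ${reads:,.0f}  storage ${storage:,.0f}")
print(f"monthly ${monthly:,.0f}  annual ${12 * monthly:,.0f}  load share {daily_load / monthly:.0%}")
```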
90% of your bill is loading data you already have in Blob.
Why This Happens
Cosmos DB doesn’t have a discounted bulk-import price. Whether you use SDK bulk mode, Data Factory, or a custom loader, you still pay the same RU cost per write.
So the math is brutal for daily refreshes:
- 1TB daily = ~1 billion items * 5 RUs = 5 billion RUs/day
- At $0.111 per million writes, that's roughly $3,330 per month just to load data
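Spread evenly over the day, that's a serious sustained throughput requirement, and real loads are rarely spread evenly. A quick back-of-envelope (the 2-hour load window here is my assumption, not part of the scenario):

```python
# What 5 billion RUs/day of writes implies for provisioned throughput.
daily_rus = 1e9 * 5                     # ~1 billion items * 5 RUs each

sustained = daily_rus / 86_400          # if spread over a full 24 hours
print(f"{sustained:,.0f} RU/s sustained")        # ~57,870 RU/s

burst = daily_rus / (2 * 3_600)         # assumed 2-hour load window
print(f"{burst:,.0f} RU/s in a 2-hour window")   # ~694,444 RU/s
```

The total RU volume is the same either way, so compressing the load window shifts the peak but not the bill.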
You’re paying a premium to move data from one store (Blob) into another (Cosmos DB), every single day.
The Alternative: Local Disk + Object Storage
What if you skip step 3 entirely?
- Daily Spark job computes features from raw data
- Write snapshot to Blob (parquet files, checkpoints)
- ~~Load snapshot into Cosmos DB for serving~~ Sync to local disk with BoulderKV
- ML models read features during inference
BoulderKV keeps Blob as the source of truth and syncs data to local disk for serving. When your Spark job writes a new snapshot, BoulderKV pulls it to local SSD automatically. No expensive per-item writes. Just a fast file sync.
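BoulderKV's sync is built in, but the underlying idea is easy to see in a hand-rolled sketch using the azure-storage-blob SDK. Everything here (container name, snapshot layout, mount path) is hypothetical:

```python
import os
from azure.storage.blob import ContainerClient

def sync_snapshot(snapshot_date, local_root="/mnt/nvme/features"):
    container = ContainerClient.from_connection_string(
        os.environ["AZURE_STORAGE_CONNECTION_STRING"], "features"
    )
    prefix = f"snapshots/{snapshot_date}/"
    for blob in container.list_blobs(name_starts_with=prefix):
        dest = os.path.join(local_root, blob.name)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        with open(dest, "wb") as f:
            # One bulk download per parquet file: no per-item write costs.
            container.download_blob(blob.name).readinto(f)
```

Nothing is re-inserted item by item, so the only charges are a handful of cheap Blob read operations.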
Cost Comparison
| Component | Cosmos DB | BoulderKV (Blob + Local Disk) |
|---|---|---|
| Storage | $256 | $21 |
| Daily Sync/Load | $3,330 | $0 |
| Compute/Cache | – | $720 |
| Read Serving | $114 | – |
| Monthly Total | $3,700 | $741 |
| Annual Total | $44,401 | $8,896 |
| Savings vs Cosmos DB | – | 80% |
BoulderKV assumptions: Azure Blob hot tier (LRS, East US) at $0.0208/GB-month (Azure Retail Prices API), 2 NVMe cache nodes at $0.50/hr each, single region. Adjust to your VM size, cache strategy, and region pricing.
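The BoulderKV column comes from the same kind of arithmetic, a sketch under the assumptions above:

```python
# BoulderKV column under the stated assumptions.
storage = 1_024 * 0.0208    # 1TB in Blob hot LRS, $/GB-month
cache = 2 * 0.50 * 720      # 2 NVMe nodes * $0.50/hr * 720 hr/month
monthly = storage + cache

cosmos = 3_700              # Cosmos DB monthly total from the first table
print(f"monthly ${monthly:,.0f}  annual ${12 * monthly:,.0f}  "
      f"savings {1 - monthly / cosmos:.0%}")
```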
But What About Latency?
Cosmos DB delivers single-digit millisecond P99. Can local disk match that?
Yes. Local SSD delivers comparable performance:
- Local disk reads: <5ms P99 (comparable to Cosmos DB)
- No cache misses: The full dataset is on disk
- Predictable latency: No cold starts or cache warming
For a feature store refreshed daily, you sync once and serve all day from local storage.
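If you want to check the latency claim on your own hardware, here's a rough Python probe: random 1KB reads against a file on the serving disk, with P99 from statistics.quantiles. It's a sanity check rather than a rigorous benchmark (repeated reads may be served from the OS page cache), and the file path is hypothetical:

```python
import os, random, statistics, time

FILE = "/mnt/nvme/features/part-00000.parquet"  # any large file on the serving disk
size = os.path.getsize(FILE)
fd = os.open(FILE, os.O_RDONLY)

samples_ms = []
for _ in range(10_000):
    offset = random.randrange(0, max(size - 1_024, 1))
    t0 = time.perf_counter()
    os.pread(fd, 1_024, offset)                 # one 1KB random read
    samples_ms.append((time.perf_counter() - t0) * 1_000)
os.close(fd)

p99 = statistics.quantiles(samples_ms, n=100)[98]
print(f"P99: {p99:.3f} ms over {len(samples_ms):,} reads")
```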
Multi-Region Makes It Worse
The costs above are for a single region. Cosmos DB bills RUs and storage in every region you replicate to, so costs scale roughly linearly with region count:
| Regions | Cosmos DB | BoulderKV |
|---|---|---|
| 1 region | $3,700 | $741 |
| 3 regions | $11,100 | $2,223 |
For a deeper dive into multi-region costs, see our Cosmos DB cost analysis for a 5TB feature store.
Conclusion
For batch-refreshed feature stores, the daily load into Cosmos DB is often the majority of your bill. You’re paying RU costs for bulk copies, not for the fast reads that justified choosing Cosmos DB in the first place.
The alternative is simple: keep your features in Blob, sync to local disk for serving, and skip the expensive Cosmos DB load entirely.
BoulderKV is built for exactly this pattern: read-heavy workloads where the data already lives in object storage. Sync to local disk for fast reads. No daily loads. No write amplification. Just Cosmos DB-level latency at a fraction of the cost.
Think our math is off? We’d love to hear from you: email hello@boulderkv.com.