Scaling Smarter

How we handle massive data loads

In a system with thousands of nodes constantly generating logs, metrics, and critical operational data, the challenge isn't just about managing data—it's about transporting and storing it efficiently.

Each deployment adds to this enormous data stream, and making it accessible to our users is crucial. So how do we handle it all while saving time, money, and bandwidth? Enter ZSTD streaming compression.

Zstandard (ZSTD) is a fast lossless compression algorithm, and its streaming mode compresses or decompresses data in real time as it is produced or consumed, without requiring the entire dataset to be held in memory. That makes it ideal for large datasets, continuous data streams, and real-time applications.

Key Features of ZSTD Streaming Compression:

  1. Block-based Design: ZSTD compresses data in blocks, each processed independently. In streaming mode, it compresses and decompresses data in chunks rather than waiting for the entire dataset.
  2. Low Memory Overhead: ZSTD is memory-efficient, making it suitable for resource-constrained environments.
  3. Real-Time Processing: Data is processed incrementally as it’s received, enabling applications like log compression, network transmission, and real-time backups.
  4. Adaptable Compression Levels: You can choose from a wide range of compression levels to balance speed against compression ratio.
  5. Preservation of Streaming Integrity: Data can be decompressed as it’s being streamed, making the format robust for pipelines and continuous data flows.
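
To make the streaming model concrete, here is a minimal sketch of piping data through a ZSTD encoder and decoder in chunks. It assumes Go and the widely used github.com/klauspost/compress/zstd package, and is an illustration of the streaming API rather than a slice of our production pipeline:

```go
package main

import (
	"io"
	"log"
	"os"

	"github.com/klauspost/compress/zstd"
)

// compressStream copies src into dst through a streaming ZSTD encoder.
// Data is compressed block by block as it arrives; the full input is
// never held in memory.
func compressStream(dst io.Writer, src io.Reader) error {
	enc, err := zstd.NewWriter(dst, zstd.WithEncoderLevel(zstd.SpeedBetterCompression))
	if err != nil {
		return err
	}
	if _, err := io.Copy(enc, src); err != nil {
		enc.Close()
		return err
	}
	// Close flushes the final block and finishes the stream.
	return enc.Close()
}

// decompressStream copies src (a ZSTD stream) into dst, decoding
// incrementally so consumers can start reading before the stream ends.
func decompressStream(dst io.Writer, src io.Reader) error {
	dec, err := zstd.NewReader(src)
	if err != nil {
		return err
	}
	defer dec.Close()
	_, err = io.Copy(dst, dec)
	return err
}

func main() {
	// Example: compress stdin to stdout, e.g. `app < node.log > node.log.zst`.
	if err := compressStream(os.Stdout, os.Stdin); err != nil {
		log.Fatal(err)
	}
}
```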

Compressing text-based data at scale

ZSTD is a game-changer for handling text-based data at scale. It compresses text-heavy payloads efficiently and supports multi-threaded compression, which is a great fit for NodeOps’ global infrastructure. By integrating ZSTD, we’ve reduced our data size significantly, achieving compression ratios ranging from 16x to 19x. That means we can send and store data faster while consuming less bandwidth, which translates directly into cost savings.
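
As a rough illustration of what a multi-threaded encoder with ratio tracking might look like (again assuming the klauspost/compress/zstd package; the file names and worker count are placeholders, not our production setup):

```go
package main

import (
	"fmt"
	"io"
	"log"
	"os"

	"github.com/klauspost/compress/zstd"
)

// countingWriter tracks how many compressed bytes pass through it.
type countingWriter struct {
	w io.Writer
	n int64
}

func (c *countingWriter) Write(p []byte) (int, error) {
	n, err := c.w.Write(p)
	c.n += int64(n)
	return n, err
}

func main() {
	in, err := os.Open("node.log") // hypothetical input file
	if err != nil {
		log.Fatal(err)
	}
	defer in.Close()

	out, err := os.Create("node.log.zst")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	counted := &countingWriter{w: out}

	// Several worker goroutines compress large text streams in parallel.
	enc, err := zstd.NewWriter(counted,
		zstd.WithEncoderLevel(zstd.SpeedBetterCompression),
		zstd.WithEncoderConcurrency(4),
	)
	if err != nil {
		log.Fatal(err)
	}

	original, err := io.Copy(enc, in)
	if err != nil {
		log.Fatal(err)
	}
	if err := enc.Close(); err != nil {
		log.Fatal(err)
	}

	// The exact ratio depends entirely on the data; text-heavy logs
	// tend to compress very well.
	fmt.Printf("original: %d bytes, compressed: %d bytes, ratio: %.1fx\n",
		original, counted.n, float64(original)/float64(counted.n))
}
```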

Let’s put that into perspective with some rough numbers: for every 100 TB of original data, we compress it down to roughly 5–6 TB. In terms of cloud costs on AWS or GCP, that’s a ~70–80% reduction in storage costs, potentially saving ~$1,500–$2,000 per 100 TB of data, depending on storage tiers. For network egress, which can cost ~$90 per TB, this optimization reduces expenses by ~$8,000–$9,000 per 100 TB. Those are massive savings at scale.
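
For the curious, here’s the back-of-the-envelope arithmetic behind those figures as a tiny sketch. The per-TB prices are illustrative placeholders in the right ballpark, not quoted AWS or GCP rates:

```go
package main

import "fmt"

func main() {
	const (
		originalTB       = 100.0
		compressionRatio = 18.0 // within the 16x–19x range observed
		storagePerTB     = 20.0 // assumed $/TB-month; varies by tier
		egressPerTB      = 90.0 // assumed $/TB network egress
	)

	compressedTB := originalTB / compressionRatio // ~5.6 TB
	savedTB := originalTB - compressedTB          // ~94 TB no longer stored or shipped

	fmt.Printf("compressed size: %.1f TB\n", compressedTB)
	fmt.Printf("storage savings: ~$%.0f per month\n", savedTB*storagePerTB)
	fmt.Printf("egress savings:  ~$%.0f per 100 TB shipped\n", savedTB*egressPerTB)
}
```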

Take a look at the graphs below: the first panel shows the difference between compressed data size and original data size, highlighting the dramatic reduction. The second panel showcases the compression ratio, staying consistently high throughout. It's not just optimization—it’s innovation at scale.

Of course, this level of compression comes at a price: increased CPU usage. But when we compare the savings in bandwidth and storage to the CPU cost, it’s an easy trade-off. For us, it’s about delivering the best performance and value for our users while staying ahead in efficiency.
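
The trade-off is also tunable: dropping the encoder level trades some compression ratio for lower CPU usage. A minimal sketch of that dial, again assuming the klauspost/compress/zstd package and a hypothetical cpuBound flag:

```go
package main

import (
	"fmt"

	"github.com/klauspost/compress/zstd"
)

// pickLevel chooses an encoder level based on available CPU headroom:
// SpeedFastest minimizes CPU at the cost of ratio, while
// SpeedBestCompression does the opposite.
func pickLevel(cpuBound bool) zstd.EncoderLevel {
	if cpuBound {
		return zstd.SpeedFastest
	}
	return zstd.SpeedBestCompression
}

func main() {
	// On a CPU-bound host we accept a lower ratio; elsewhere we favor
	// maximum compression.
	fmt.Println("cpu-bound host:", pickLevel(true))
	fmt.Println("other hosts:   ", pickLevel(false))
}
```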

This is just one of the many optimizations we’ve implemented at NodeOps to ensure a scalable, high-performance infrastructure. Follow us for more updates on the tech innovations powering NodeOps.