Millions of cross-node transactional operations per second on a distributed cluster of commodity hardware.

Make sure to check out our approach and disclaimer at the bottom

  • Random 90/10

    90% read/10% write operations on 10 random cross-node keys per transaction

    This test shows the performance over time of a saturating workload of random operations on the database. It is designed to emulate a typical real-world workload that exhibits more reads than writes. These results are limited by the performance of the SSD subsystem. Steady-state performance is 890,000 operations per second, while initial burst performance is roughly 1,450,000 operations per second.
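
    As a rough, illustrative sketch only (not the actual benchmark harness), a transaction of this shape can be written with the FoundationDB Python bindings. The API version, key encoding, and helper names below are assumptions made for the example; setting read_fraction to 0.5 gives the Random 50/50 mix described next.

        import os
        import random
        import fdb

        fdb.api_version(300)      # match the API version of your installed client
        db = fdb.open()           # connects via the default cluster file

        KEY_COUNT = 2 * 10**9     # keyspace size, matching the 2 billion-pair dataset

        def random_key():
            # 16-byte keys ('k' plus 15 digits), drawn uniformly from the keyspace
            return b'k%015d' % random.randrange(KEY_COUNT)

        def random_value():
            # random 8-100 byte values, as in the dataset description below
            return os.urandom(random.randint(8, 100))

        @fdb.transactional
        def mixed_ops(tr, read_fraction=0.9, ops_per_txn=10):
            # One ACID transaction touching ops_per_txn random keys: each operation
            # is a read with probability read_fraction, otherwise a blind write.
            for _ in range(ops_per_txn):
                if random.random() < read_fraction:
                    tr[random_key()]                     # read (the binding returns a future)
                else:
                    tr[random_key()] = random_value()    # write

        mixed_ops(db)                       # Random 90/10
        mixed_ops(db, read_fraction=0.5)    # Random 50/50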

  • Random 50/50

    50% read/50% write operations on 10 random cross-node keys per transaction

    This test shows the performance over time of a saturating workload of random operations on the database. It is designed to stress the overall performance of the storage, networking, and transaction processing subsystems. Steady-state performance is 235,000 operations per second, while initial burst performance is roughly 380,000 operations per second.

  • Cached Reads

    Random reads from a working set of keys that fits in RAM

    This test shows the performance over time of a saturating workload of random reads on a set of commonly accessed keys. It emulates a system with a database too large to fit in memory, but whose working set can be effectively cached. Performance is about half that of a pure memcached cluster, at 3,750,000 reads per second, but provides absolute consistency with the database. (It takes about 10 seconds to warm the cache.)
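
    A minimal sketch of this access pattern, again using the Python bindings: reads are confined to an assumed "hot" subset of the keyspace, so once the cache is warm they are served from RAM yet still run through the ordinary transaction path. The hot-set size and key encoding are illustrative assumptions, not the benchmark's exact parameters.

        import random
        import fdb

        fdb.api_version(300)
        db = fdb.open()

        HOT_SET = 50 * 10**6      # assumed size of the commonly accessed working set

        @fdb.transactional
        def cached_read(tr):
            # A single read confined to the hot subset of keys; once the cache is warm
            # these are served from memory but remain consistent with concurrent writes.
            return tr[b'k%015d' % random.randrange(HOT_SET)]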

  • Range Reads

    Reads of key ranges of various sizes

    This test shows the read throughput achieved by random reads of various-sized ranges of key-value pairs. This is designed to test the performance of FoundationDB when used for streaming or analytical applications, as well as for retrieving a single logical element that may be encoded as a contiguous range of sub-keys. At a range size of only 63 keys (approximately 4 KB of data returned), FoundationDB is already largely limited by the gigabit networking subsystem to reading 26 million keys per second (about 1,800 MB/s).
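
    A single range read of this kind might look like the following sketch. The starting key and the end-of-keyspace bound are illustrative choices; the 63-key limit matches the range size quoted above.

        import random
        import fdb

        fdb.api_version(300)
        db = fdb.open()

        @fdb.transactional
        def read_block(tr, begin_key, limit=63):
            # Fetch up to `limit` consecutive key-value pairs starting at begin_key.
            # One request returns a contiguous chunk of the sorted keyspace.
            return [(kv.key, kv.value) for kv in tr.get_range(begin_key, b'\xff', limit=limit)]

        pairs = read_block(db, b'k%015d' % random.randrange(2 * 10**9))
        print(sum(len(k) + len(v) for k, v in pairs), "bytes returned")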

  • Adjacent Writes

    Random writes of blocks of 140 adjacent keys

    This test is designed to show the write performance of FoundationDB when manipulating adjacent keys. In real workloads, updates are often made to keys that are close to each other (in sorted order). FoundationDB can coalesce these operations to improve efficiency, achieving 1,500,000 writes per second.
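
    A sketch of one such write block, under an assumed key layout: 140 keys sharing a randomly chosen prefix are set in a single transaction, so they land next to each other in the sorted keyspace.

        import os
        import random
        import fdb

        fdb.api_version(300)
        db = fdb.open()

        @fdb.transactional
        def write_adjacent_block(tr, block=140):
            # Write `block` keys that are adjacent in the sorted keyspace.
            prefix = b'blk%012d/' % random.randrange(10**7)
            for i in range(block):
                tr[prefix + b'%03d' % i] = os.urandom(random.randint(8, 100))

        write_adjacent_block(db)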

  • Bulk Load

    Loading small key-values into an empty database

    This test shows the speed over time of loading an empty database with 100 million small key-values. It is designed to exercise the data distribution algorithms and write bandwidth. Note that this bulk load is done using standard API transactions with strong ACID properties, not a separate optimized path. For large datasets, FoundationDB sustains about 85 MB/s, loading almost 5 billion key-values per hour. It is limited by the performance of the SSDs.
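
    A bulk load of this kind can be sketched as ordinary batched transactions. The batch size and key encoding are assumptions, and a real loader would also run many client processes in parallel; the point is simply that every commit goes through the normal transaction path.

        import os
        import random
        import fdb

        fdb.api_version(300)
        db = fdb.open()

        @fdb.transactional
        def load_batch(tr, start, count):
            # Each batch is a normal ACID transaction; nothing bypasses the commit path.
            for i in range(start, start + count):
                tr[b'k%015d' % i] = os.urandom(random.randint(8, 100))

        def bulk_load(db, total=100 * 10**6, batch=5000):
            # Keep each batch well under FoundationDB's per-transaction size limit.
            for start in range(0, total, batch):
                load_batch(db, start, min(batch, total - start))

        bulk_load(db)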

  • Scalability

    Performance of different-sized clusters

    This test shows the relative performance of a typical 90/10 read/write workload over a variety of cluster size configurations. The dataset is sized proportionally to the cluster. The results for the 24-machine cluster are faster than those above because the dataset used for this test is smaller per-machine than the canonical dataset. Near-linear scaling is exhibited.

  • Read Latency

    Mean latency to read a single key

    FoundationDB works hard to achieve low latencies even at moderate or high utilization. This test measures individual key read latencies (in ms) as a function of percent system load. Sub-millisecond read latencies are achieved over a broad range of system loads.

  • Commit Latency

    Mean latency to commit a cross-node transaction with 5 reads and 5 writes

    FoundationDB works hard to achieve low latencies even at moderate or high utilization. This test measures the latency to durably commit a cross-node transaction (in ms) as a function of percent system load. A commit latency of about 10 milliseconds is achieved over a broad range of system loads, with complete durability.

  • Transaction Latency

    Mean latency of a cross-node mixed 90% read/10% write workload

    FoundationDB works hard to achieve low latencies even at moderate or high utilization. This test measures the mean latency (in ms) of a blend of two types of cross-node transactions: 20% doing 5 reads and 5 writes, and 80% doing 10 reads. The latency is shown as a function of percent system load. A mean cross-node transaction latency of about 10 milliseconds is achieved over a broad range of system loads, with complete durability.

  • Approach

    We took performance seriously right from the start of FoundationDB's design process. Some distributed systems are "scalable" to many nodes, but each node is slowed by the weight of the distributed programming techniques. We've developed new data structures, memory management tools, communications models, and even our own programming language in the service of not just scalability, but also raw speed per node. When you combine our raw speed with scalability, we deliver awesome performance with modest hardware.

    We measure 50+ different key performance metrics every night during a 10-hour test. Above are performance details for a few of the most important metrics.

  • Disclaimer

    This is a tricky subject; we all know how easy it is to sway performance numbers one way or another. High numbers can be achieved by testing with small datasets that fit entirely in RAM, workloads with few writes, reduced replication factors, durability guarantees disabled, lots of consecutive changes to single keys, freshly loaded datasets, or, for ACID systems, transactions that conveniently never involve data elements stored on different nodes. Likewise, low numbers usually come from tests that keep too few requests outstanding to achieve the parallelism needed to get the system really humming, or from test clients that cannot generate enough load.

    We don't use any of these tricks in the numbers posted above.

  • Hardware

    24-machine Ethernet cluster

    E3-1240v1 CPU, 16GB RAM, 2x 200GB SATA SSD

    ~2.0kW peak power consumption in 9U of rack space

  • Dataset*

    2 billion key-value pairs with 16-byte keys and random 8-100 byte values. The dataset on disk is 3.5 times as large as the page cache. Data is stored with triple replication.

    *Unless otherwise noted

  • Configuration

    All operations are fully ACID transactional at the highest possible level of isolation (serializable) and durability (commit means flushed to disk in three places).

  • Preparation

    The database has been "aged" by running many hours of random mutations to avoid unrealistically clean data structures.

Want more tech info?

White Papers →