Tuesday, April 12, 2016

MongoDB Benchmark

There are no official MongoDB benchmarks because the developers don’t believe they accurately represent real world usage, and they have a point: you only really get an idea of performance when you test your own queries on your own hardware. Raw figures can seem impressive but they’re not representative of how your own application is likely to perform. Benchmarks are useful for indicating how different hardware specs might perform, but they’re only really worthwhile when run with real world queries.
For Server Density v2 I have been benchmarking MongoDB with different tweaks so we can get maximum performance for our high throughput clusters, but make cost savings for our less important systems. A lot has been said about various choices of write concern, deploying to SSDs and replication lag but there aren’t really any numbers to base your decision on.
This set of MongoDB benchmarks is not about the absolute numbers but is designed to give you an idea of how each of the different options affects performance. Your own queries will differ but the idea is to prove general assumptions and principles about the relative differences between each of the write options.
These MongoDB benchmarks test various options for configuring and querying MongoDB. I wrote a simple Python script to issue 200 queries and record the execution time for each. It was run with Python 2.7.3 and Pymongo 2.5 against MongoDB 2.4.1 on an Ubuntu Linux 12.04 Intel Xeon-SandyBridge E3-1270-Quadcore 3.4GHz dedicated server with 32GB RAM, Western Digital WD Caviar RE4 500GB spinning disk and Smart XceedIOPS 200GB SSD.
The script was run twice, taking the results from the second execution. This avoids the slowdown caused by initially allocating files, collections, etc. – MongoDB only creates databases when they’re first written to, which adds a bit of time to the first call but isn’t really relevant in real world usage.

Test methodology
 
import time
import pymongo

m = pymongo.MongoClient()

# Dummy document - not trying to simulate a real application
doc = {'a': 1, 'b': 'hat'}

i = 0
while i < 200:
    start = time.time()
    # manipulate=False skips client-side _id handling; w=1 waits for the
    # server to acknowledge the write (the default write concern)
    m.tests.insertTest.insert(doc, manipulate=False, w=1)
    end = time.time()

    executionTime = (end - start) * 1000  # Convert to ms
    print executionTime

    i = i + 1
 
 
This is a dummy document because I’m not trying to simulate a real application here. Document size, number/size of indexes and the type of operation will all play a part in the actual numbers. This is only testing inserts but there are other optimisations you can make with updates, particularly ensuring documents don’t grow. However, this is sufficient for what I’m trying to show in these tests – the relative difference between the write options.
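As an aside, the document growth point can be illustrated with a pre-padding trick. This is a minimal sketch only – the padding field and size are made up – and it matters most for the mmap-based storage of this era, where a document that outgrows its allocated space gets moved on disk:

import pymongo

m = pymongo.MongoClient()
coll = m.tests.insertTest

# Insert with throwaway padding so the document is allocated at
# roughly its eventual size (1KB here is an arbitrary illustration)
coll.insert({'a': 1, 'b': 'hat', 'padding': 'x' * 1024}, manipulate=False, w=1)

# Later updates swap real data in for the padding, so the document
# never grows beyond its original allocation and doesn't get moved
coll.update({'a': 1}, {'$set': {'b': 'cap'}, '$unset': {'padding': 1}})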
Write concern
The write concern lets you trade write performance against knowing the status of the write. If you’re doing high throughput logging but aren’t concerned about possibly losing some writes (e.g. if the mongod crashes or there is a network error) then you can set the write concern low. Your write calls will return quickly but you won’t know whether they were successful. Dialling the write concern up to the default level means the write will be acknowledged (although not necessarily safe on disk).
It’s important to know that an acknowledgement is not the same as a successful write – it simply gives you a receipt that the server accepted the write to process. If you need to know that writes were actually successful, one option is to require confirmation that the write has hit the journal. This is essentially a safe write to a single node, with the option to go further and request acknowledgement from replica slaves. It’s much slower to do this but it guarantees your data is replicated.
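For reference, here is how each of the options tested below is expressed in pymongo 2.x – a minimal sketch; the keyword arguments map directly onto the getLastError options:

import pymongo

m = pymongo.MongoClient()
doc = {'a': 1, 'b': 'hat'}

m.tests.insertTest.insert(doc, manipulate=False, w=0)     # unacknowledged: fastest, fire and forget
m.tests.insertTest.insert(doc, manipulate=False, w=1)     # acknowledged: the default
m.tests.insertTest.insert(doc, manipulate=False, j=True)  # journaled: waits for the journal flush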
MongoDB insert() Performance (w flag)
  • w=0 is the fastest way to issue writes, with an average execution time of 0.07ms, max of 0.11ms and min of 0.06ms. This setting disables basic acknowledgement of write operations, but still returns information about socket exceptions and networking errors to the application.
  • w=1 takes double the time to return, with an average execution time of 0.13ms, max of 0.32ms and min of 0.11ms. This guarantees that the write has been acknowledged but doesn’t guarantee that it has reached disk (the journal), so there is still potential for the write to be lost – there’s a 100ms window where the journal might not be flushed to disk. Setting j=1 protects against this.
  • j=1 (spinning disk) is several orders of magnitude slower than even w=1, with an average execution time of 34.19ms, max of 34.28ms and min of 34.10ms. The mongod will confirm the write operation only after it has written the operation to the journal. This confirms that the write operation can survive a mongod shutdown and ensures that the write operation is durable.
  • j=1 (SSD) is around 3x faster than a spinning disk, with an average execution time of 11.18ms, max of 11.24ms and min of 11.11ms.
  • There is an interesting ramp up for the initial few queries every time the script is run. This is likely to do with connection pooling and opening the initial connection to the database, whereas subsequent queries can use the already open connection.
  • Some spikes appear during the script execution. This could be the connection closing and being recreated.
This means that you can reasonably use the default w=1 as a safe starting point, but if you need to be sure data has reached a single node, j=1 is the option you need. And for high throughput you can halve query times by going down to w=0.
SSD vs Spinning Disk
It’s a safe assumption that SSDs will always be faster than spinning disks, but the question is by how much – and whether that’s worth paying for. The more data you store, the more expensive the SSD will be – higher capacity SSDs are available but they are fairly cost prohibitive. However, MongoDB supports storing databases in directories which can be mounted on their own devices, giving you the option of putting certain databases on SSDs.
Putting your journal on an SSD and then using the j=1 flag is a good optimisation. You need the --directoryperdb config flag and you can then mount the databases on their own disks. The journal is always in its own directory so you can mount it separately without any changes if you wish.
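A rough sketch of that layout – the paths and the SSD mount point here are illustrative, not from the test setup, and mongod should be stopped before moving the journal:

# Start mongod with one directory per database
mongod --dbpath /data/mongodb --directoryperdb

# The journal always lives in <dbpath>/journal, so it can be moved
# onto an SSD mount and symlinked back into place
mv /data/mongodb/journal /ssd/journal
ln -s /ssd/journal /data/mongodb/journal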
MongoDB insert() Performance (j flag)

Replication
If you specify a number greater than 1 for the w flag then that number of replica set members must acknowledge the write before the query completes. I tested this in a 4-node replica set with the primary and a slave in the same data centre (San Jose, USA) as the execution script, and the remaining 2 nodes in a different data centre (Washington DC, USA).
The average round trip time between the nodes in the same data centre is 0.864ms and between different data centres is 71.187ms.
MongoDB insert() Performance (w > 1 flag)
  • w=2 required acknowledgement from the primary and one of the 3 slaves. Average execution time was 14.26ms, max of 867.29ms and min of 1.65ms.
  • w=3 required acknowledgement from the primary plus 2 slaves. Average execution time was 311ms, max of 1,329ms and min of 97ms. The killer here is the range in response times, which is affected by network latency and congestion, the communication overhead between 3 nodes, and having to wait for each one.
Using an integer for the w flag lets MongoDB decide which nodes must acknowledge. My replica set has 4 nodes and I specified 2 and 3 but I didn’t get to choose which ones were part of the acknowledgement. This could be local slaves but could also be remote, which is probably responsible for the range in response times where a remote slave happened to return faster than the local one. More control is possible using tags.
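As a sketch of what this looks like in pymongo 2.x – the hostnames, replica set name and the 'sameDC' tag mode are made-up examples, with the mode assumed to be configured via getLastErrorModes in the replica set settings:

import pymongo

m = pymongo.MongoReplicaSetClient('node1:27017,node2:27017', replicaSet='rs0')
doc = {'a': 1, 'b': 'hat'}

# Wait for the primary plus one slave; wtimeout (ms) stops a slow or
# unreachable node blocking the insert indefinitely
m.tests.insertTest.insert(doc, manipulate=False, w=2, wtimeout=5000)

# With tags, w names a getLastErrorModes entry instead of a count,
# e.g. acknowledgement from slaves in the local data centre only
m.tests.insertTest.insert(doc, manipulate=False, w='sameDC', wtimeout=5000)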
Conclusion
It’s fairly clear that these MongoDB benchmark results validate the general assumptions: SSDs are faster, and there is fairly variable latency involved in replicating over a network, particularly over long distances. What this experiment shows is the difference between the write concern options, so you can make the right tradeoff between durability and performance. It also highlights that if you need journal-based durability, you can significantly improve performance by adding SSDs.
MongoDB benchmarks raw results
          w=0       w=1       j=1 (spinning)   j=1 (SSD)   w=2 (same DC)   w=3 (multi-DC)
Average   0.07ms    0.13ms    34.19ms          11.18ms     14.26ms         311ms
Min       0.06ms    0.11ms    34.10ms          11.11ms     1.65ms          97ms
Max       0.11ms    0.32ms    34.28ms          11.24ms     867.29ms        1,329ms
