Tuesday, April 12, 2016

MongoDB performance bottlenecks, optimization Strategies for MongoDB

I will try to describe here all the potential performance bottlenecks, along with possible solutions and tips for performance optimization. But first of all, you should make sure that MongoDB was the right choice for your project. You should clearly understand that MongoDB is a completely non-relational database (I mean, no joins) and that it is a document-oriented database (not a graph-oriented one). It is really important to be sure that you have made the right choice of database.
0. Map-Reduce 
Before the MongoDB 2.4 update (the main point here is the switch to the V8 engine), MongoDB used SpiderMonkey as its JavaScript engine, and the problem was that it was single-threaded (which was pretty awkward when Map-Reduce was running on only 1 core out of, e.g., 48). After the 2.4 update performance improved, but there are still many pitfalls, so you’d better read this http://docs.mongodb.org/manual/core/map-reduce/ and this http://docs.mongodb.org/manual/core/aggregation-pipeline/.
N.B. Bear in mind that the performance of Map-Reduce depends on “the state of the data”. I mean that the difference between Map-Reduce on the data “as-is” and on sorted data is huge (on sorted data Map-Reduce can be something like 10-100x faster than without sorting). So, to improve the performance of Map-Reduce you need to:
  • Find out the key that you will use for the Map-Reduce job (usually it’s the same as the emit key) and ENSURE that you have added an index on this key (you can try running your query filter to check)
  • Add an input sort on that key (the emit key) to the Map-Reduce job
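As a rough sketch of the two steps above, the snippet below builds a mapReduce command document whose sort matches the emit key. The collection, field and output names are made-up placeholders, the map/reduce functions simply count documents per key, and the helper name is hypothetical; it only builds the document and does not talk to a server.

```python
def build_map_reduce_command(collection, key_field, out_collection, query=None):
    """Build a mapReduce command whose input is sorted by the emit key."""
    # Illustrative JavaScript: count documents per value of key_field.
    map_js = "function () { emit(this.%s, 1); }" % key_field
    reduce_js = "function (key, values) { return Array.sum(values); }"
    command = {
        "mapReduce": collection,   # the command name has to be the first key
        "map": map_js,
        "reduce": reduce_js,
        "out": out_collection,
        "sort": {key_field: 1},    # input sort on the emit key (needs an index)
    }
    if query:
        command["query"] = query   # optional filter applied before the map step
    return command


cmd = build_map_reduce_command(
    "events", "user_id", "events_per_user", query={"type": "click"}
)
print(cmd["sort"])
```

Assuming pymongo and an existing index on the key, something like db.command(cmd) would then run the job; treat that call as an assumption and check it against your driver version.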
1. Sharding
It is hella cool to have sharding out of the box, but along with sharding you also get one of the performance pitfalls: the choice of the shard key.
Shard keys should satisfy the following:
  • “Distributable”: the worst case for a shard key is an auto-incremented value (this leads to “hot shard” behavior, when all writes are routed to a single shard, and that shard becomes the bottleneck). An ideal shard key should have as much “randomness” as possible.
  • Ideal shard key should be the primary field used for your queries.
  • An easily divisible shard key makes it easy for MongoDB to distribute content among the shards. Shard keys that have a limited number of possible values can result in chunks that are “unsplittable.”
  • Unique fields in your collection should be part of the shard key.
See the MongoDB documentation about shard keys for details.
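To see why an auto-incremented key produces a “hot shard”, here is a small simulation in plain Python. It is a toy model, not how MongoDB places chunks internally: the 4 shards, the chunk boundaries and the SHA-256 bucketing are all assumptions made for illustration.

```python
import hashlib
from collections import Counter

# Toy cluster: 4 shards, existing keys 0..999 split into 4 equal ranges.
CHUNK_UPPER_BOUNDS = [250, 500, 750]  # the last shard owns everything >= 750


def ranged_shard(key):
    """Range-based placement: first chunk whose upper bound exceeds the key."""
    for shard, upper in enumerate(CHUNK_UPPER_BOUNDS):
        if key < upper:
            return shard
    return len(CHUNK_UPPER_BOUNDS)


def hashed_shard(key, num_shards=4):
    """Simplified hashed placement: hash the key, then pick a bucket."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# 1000 new inserts with an auto-incremented key that continues after 999.
new_keys = range(1000, 2000)
ranged_writes = Counter(ranged_shard(k) for k in new_keys)
hashed_writes = Counter(hashed_shard(k) for k in new_keys)

print("ranged:", dict(ranged_writes))  # every write lands on the last shard
print("hashed:", dict(hashed_writes))  # writes are spread over all shards
```

With the ranged placement, every new write goes to the shard that owns the top of the key range, which is exactly the bottleneck described above.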
2. Balancing
You should bear in mind that moving chunks from one shard to another is a very expensive operation (adding new shards may significantly slow down performance while the data is being rebalanced).
As a helpful option, you could stop the balancer during “prime time”.
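One way to keep the balancer out of “prime time” is to give it an active window in the config database. The sketch below assumes that client is a pymongo MongoClient connected to a mongos; the helper name and the example times are made up.

```python
from datetime import datetime


def set_balancer_window(client, start, stop):
    """Allow chunk migrations only between start and stop (both "HH:MM")."""
    for value in (start, stop):
        # Raises ValueError if the time is malformed.
        datetime.strptime(value, "%H:%M")
    settings = client["config"]["settings"]
    return settings.update_one(
        {"_id": "balancer"},
        {"$set": {"activeWindow": {"start": start, "stop": stop}}},
        upsert=True,
    )
```

For example, set_balancer_window(client, "23:00", "06:00") would let the balancer work only at night. Check the balancer documentation for your MongoDB version before relying on this.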
3. Disk Input Output operations
You should understand that in most cases the hardware bottleneck will be the HDD (not CPU or RAM), especially if you have several shards.
As the data grows, the number of I/O operations will increase rapidly, so fast disks matter even more if you are using sharding. Also keep monitoring free disk space.
4. Locks
MongoDB uses a readers-writer lock that allows concurrent read access to a database but gives exclusive access to a single write operation.
When a read lock exists, many read operations may use this lock. However, when a write lock exists, a single write operation holds the lock exclusively, and no other read or write operations may share the lock.
Locks are “writer greedy,” which means writes have preference over reads. When both a read and write are waiting for a lock, MongoDB grants the lock to the write.
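And the very sad point: MongoDB implements locks on a per-database basis for most read and write operations (before the 2.2 update there was a global lock, one per instance for all databases). Note that this describes versions 2.2 through 2.6; starting with 3.0 locking became more granular (collection-level in MMAPv1, document-level concurrency in WiredTiger).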
This is a very important point: if you have too many write requests, this will be the bottleneck, and there is no good solution (maybe it is possible to hack around it with several databases, but better forget about this).
If your application has too many write operations, it makes sense to think about migrating to something like Cassandra (in Cassandra, a write is atomic at the row level, meaning that inserting or updating columns for a given row key is treated as one write operation).
Please take a look at the concurrency docs to make sure that you understand MongoDB concurrency.
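To make the “writer greedy” behavior a bit more tangible, here is a toy readers-writer lock in Python. It only illustrates the semantics described in this section (shared reads, exclusive writes, waiting writers go first); it is not MongoDB's actual implementation, and the class and method names are made up.

```python
import threading


class WriterGreedyRWLock:
    """Many concurrent readers, one exclusive writer, writers are preferred."""

    def __init__(self):
        self._cond = threading.Condition()
        self._active_readers = 0
        self._writer_active = False
        self._waiting_writers = 0

    def acquire_read(self):
        with self._cond:
            # New readers wait while a writer holds the lock or is queued.
            while self._writer_active or self._waiting_writers > 0:
                self._cond.wait()
            self._active_readers += 1

    def release_read(self):
        with self._cond:
            self._active_readers -= 1
            if self._active_readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            self._waiting_writers += 1
            # A writer needs the lock exclusively: no readers, no other writer.
            while self._writer_active or self._active_readers > 0:
                self._cond.wait()
            self._waiting_writers -= 1
            self._writer_active = True

    def release_write(self):
        with self._cond:
            self._writer_active = False
            self._cond.notify_all()
```

The important detail is the check in acquire_read: as soon as one writer is waiting, new readers queue up behind it, which is why a steady stream of writes can starve reads on the same lock.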
5. Fast Writes
Use Capped Collections for Fast Writes
Capped Collections are circular, fixed-size collections that keep documents well-ordered, even without the use of an index. This means that capped collections can receive very high-speed writes and sequential reads.
These collections are particularly useful for keeping log files but are not limited to that purpose. Use capped collections where appropriate.
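A minimal sketch of creating such a collection from Python, assuming db is a pymongo Database; the collection name and the size limits are arbitrary example values.

```python
def create_log_collection(db, name="app_log", size_bytes=64 * 1024 * 1024,
                          max_docs=None):
    """Create a capped (fixed-size, circular) collection for log-like data."""
    options = {"capped": True, "size": size_bytes}  # size is a limit in bytes
    if max_docs is not None:
        options["max"] = max_docs  # optional limit on the number of documents
    return db.create_collection(name, **options)
```

Once the size limit is reached, the oldest documents are overwritten, so pick the size according to how much history you actually need.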
6. Fast Reads
Use Natural Order for Fast Reads. To return documents in the order they exist on disk, return sorted operations using the $natural operator. On a capped collection, this also returns the documents in the order in which they were written.
Natural order does not use indexes but can be fast for operations when you want to select the first or last items on disk.
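For example, reading the most recently written documents of a capped collection could look like the sketch below. It assumes that collection is a pymongo Collection; the helper name is made up.

```python
def last_documents(collection, n=10):
    """Return the n most recent documents in reverse natural order."""
    # -1 walks the natural order backwards, i.e. newest first on a capped
    # collection; use 1 to read from the oldest document instead.
    cursor = collection.find().sort("$natural", -1).limit(n)
    return list(cursor)
```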
7. Query Performance
Read up on query performance, and please pay special attention to Indexes and Compound Indexes.
8. Remove Expired Data
It seems to be good practice to enable TTL (time to live) on your collections: add an expireAfterSeconds value and use the “Expire Data from Collections by Setting TTL” technique. This approach allows you to get rid of “unnecessary data” automatically.
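A small sketch of the TTL technique, assuming collection is a pymongo Collection. The field name createdAt, the one-hour lifetime and the helper names are example choices, not anything prescribed by MongoDB.

```python
from datetime import datetime, timezone


def enable_ttl(collection, field="createdAt", seconds=3600):
    """Create a TTL index: documents expire `seconds` after the date in `field`."""
    return collection.create_index(field, expireAfterSeconds=seconds)


def new_session_document(user_id):
    """TTL only works on date fields, so store a real datetime, not a string."""
    return {"user_id": user_id, "createdAt": datetime.now(timezone.utc)}
```

Expired documents are removed by a background task, so they may survive a little longer than the configured number of seconds.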
9. The size of Database
As you might know, MongoDB will store, e.g., this document
{ UserFirstAndLastName: "Mikita Manko",
 LinkToUsersFacebookPage: "https://www.facebook.com/mikita.manko"
}
“as-is”. I mean that the names of these fields, “UserFirstAndLastName” and “LinkToUsersFacebookPage”, are stored in every single document and eat up space.
By using the “name shortening” technique you can minimize memory usage (you can get rid of something like 30-40% of unnecessary data):
{ FL: "Mikita Manko",
 BFL: "https://www.facebook.com/mikita.manko"
}
Obviously, this requires a “mapper” in your code (you have to map the shortened, unreadable names from the database to the long ones, so that you can still use readable fields in your code).
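Such a mapper can be very small. Here is a minimal sketch in Python that uses the two field names from the example above; the function names are made up, and nested documents are deliberately not handled.

```python
# Long, readable names used in the code -> short names stored in MongoDB.
FIELD_MAP = {
    "UserFirstAndLastName": "FL",
    "LinkToUsersFacebookPage": "BFL",
}
REVERSE_FIELD_MAP = {short: full for full, short in FIELD_MAP.items()}


def to_storage(doc):
    """Shorten known field names before writing; unknown fields pass through."""
    return {FIELD_MAP.get(key, key): value for key, value in doc.items()}


def from_storage(doc):
    """Restore readable field names after reading a document back."""
    return {REVERSE_FIELD_MAP.get(key, key): value for key, value in doc.items()}


user = {
    "UserFirstAndLastName": "Mikita Manko",
    "LinkToUsersFacebookPage": "https://www.facebook.com/mikita.manko",
}
stored = to_storage(user)
print(stored)
```

Keeping the mapping in one place means the rest of the code never has to see the short names.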
A. Application Design
Take a look at these notes and bear them in mind while designing your architecture and solutions.
B. Profiling and Investigations
You should be familiar with such features as:
C. Updates
The most obvious point is to stay on the cutting edge of the technology: investigate and install the latest updates.
P.S.
As I mentioned before, use MongoDB not just for fun, but because your project is a good fit for a document-oriented database. That is the most important point.
