Tuesday, March 29, 2016

Nginx Performance Tuning

Well, I’ve got some bad news for you: you can’t really optimize nginx very much. There are no magic settings that will reduce your load by half or make PHP run twice as fast. Thankfully, the good news is that nginx doesn’t require any tuning because it is already optimized out of the box. The biggest optimization happened when you decided to use nginx and ran that apt-get install, yum install or make install. (Please note that distribution repositories are often out of date; the wiki install page usually points to a more up-to-date repository.)

That said, there are a lot of options in nginx that affect its behaviour, and not all of their default values are optimized for high-traffic situations. We also need to consider the platform nginx runs on and optimize the OS, as there are limitations in place there as well.

So in short: while we cannot optimize the load time of individual connections, we can ensure that nginx has an environment optimized for handling high-traffic situations. Of course, by high traffic I mean several hundred requests per second; the vast majority of people don’t need to mess around with this, but if you are curious or want to be prepared then read on.

First of all we need to consider the platform, as nginx is available on Linux, MacOS, FreeBSD, Solaris and Windows, as well as some more esoteric systems. They all implement high-performance event-based polling methods; sadly, nginx only supports four of them. I tend to favour FreeBSD out of the four, but you should not see huge differences, and it’s more important that you are comfortable with your OS of choice than that you pick the absolutely most optimized one.

In case you hadn’t guessed it already, the odd one out is Windows. Nginx on Windows is really not an option for anything you’re going to put into production. Windows has a different way of handling event polling and the nginx author has chosen not to support it; as such nginx falls back to using select(), which isn’t overly efficient, and your performance will suffer quite quickly as a result.


The second biggest limitation that most people run into is also related to your OS. Open up a shell, su to the user nginx runs as and run the command `ulimit -a`. Those values are all limitations nginx cannot exceed. On many systems the default open files value is rather low; on a system I just checked it was set to 1024. If nginx hits this limit it will log the error (24: Too many open files) and return an error to the client. Naturally nginx can handle a lot more than 1024 file descriptors, and chances are your OS can as well. You can safely increase this value.

To do this you can either set the limit with ulimit or you can use worker_rlimit_nofile to define your desired open file descriptor limit. (This requires that nginx starts as root before dropping its privileges.)
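For example, a minimal sketch of the nginx.conf approach; 65535 is purely an illustrative value, pick one your OS actually allows:

```
# Raise the per-worker open file descriptor limit.
# 65535 is an illustrative value; your OS must permit it.
worker_rlimit_nofile 65535;
```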

Nginx Limitations
With the OS taken care of it’s time to dive into nginx itself and have a look at some of the directives and methods we can use to tune things.

Worker Processes
The worker process is the backbone of nginx. Once the master process has bound to the required IPs/ports it will spawn workers as the specified user, and they’ll then handle all the work. Workers are not multi-threaded, so a single worker does not spread its connection handling across CPU cores. Thus it makes sense for us to run multiple workers, usually one worker per CPU core. For most workloads anything above 2-4 workers is overkill, as nginx will hit other bottlenecks before the CPU becomes an issue and usually you’ll just have idle processes. If your nginx instances are CPU bound after 4 workers then hopefully you don’t need me to tell you.

An argument for more worker processes can be made when you’re dealing with a lot of blocking disk IO. You will need to test your specific setup to check the waiting time on static files, and if it’s high then try increasing the worker count.
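As a concrete starting point, here’s a hedged sketch; 4 is an example value for a quad-core box, and on nginx 1.2.5+/1.3.8+ you can use "auto" to match the core count:

```
# One worker per CPU core; 4 is an illustrative value.
worker_processes 4;
```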

Worker Connections
Worker connections effectively limit how many connections each worker can maintain at a time. This directive is most likely designed to prevent run-away processes, and to help in case your OS is configured to allow more connections than your hardware can handle. As nginx developer Valentin points out on the nginx mailing list, nginx can close keep-alive connections if it hits the limit, so we don’t have to worry about our keep-alive value here. Instead we’re concerned with the number of currently active connections that nginx is handling. The formula for the maximum number of connections we can handle then becomes:

worker_processes * worker_connections * (K / average $request_time)

Where K is the number of currently active connections. Additionally, for the value K, we also have to consider that reverse proxying will open up an additional connection to your backend.

In the default configuration file the worker_connections directive is set to 1024. If we consider that browsers normally open up 2 connections for pipelining site assets, that leaves us with a maximum of 512 users handled simultaneously. With proxying this is even lower, though your backend hopefully responds quickly enough to free up the connection.

All things considered, it should be fairly clear that as you grow in traffic you’ll eventually want to increase the number of connections each worker can handle. 2048 should do for most people but honestly, if you have this kind of traffic you should not have any doubt about how high you need this number to be.
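For illustration, here’s what bumping the limit to the 2048 mentioned above looks like:

```
events {
    # The shipped config defaults to 1024; raise as traffic grows.
    worker_connections 2048;
}
```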

CPU Affinity
Setting CPU affinity basically means telling each worker which CPU core to use; it will then use only that core. I’m not going to cover this too much except to say that you should be really careful doing it. Chances are your OS CPU scheduler is far, far better at handling load balancing than you are. If you think you have issues with CPU load balancing then optimize at the scheduler level, or potentially find an alternative scheduler, but unless you know what you’re doing, don’t touch this.
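If you do decide to experiment anyway, the directive takes one CPU bitmask per worker; this sketch binds two workers to two cores:

```
# Worker 1 runs on CPU 0, worker 2 on CPU 1. Use with caution.
worker_processes 2;
worker_cpu_affinity 01 10;
```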

Keep Alive
Keep alive is an HTTP feature which allows user agents to keep the connection to your server open for a number of requests, or until the specified timeout is reached. This won’t actually change the performance of our nginx server very much, as it handles idle connections very well. The author of nginx claims that 10,000 idle connections will use only 2.5 MB of memory, and from what I’ve seen this seems to be correct.

The reason I cover this in a performance guide is pretty simple: keep alive has a huge effect on the perceived load time for the end user, and perceived load time is the most important measurement you can ever optimize. If your website seems to load fast, your users are happy. Studies done by Amazon and other large online retailers show a direct correlation between perceived load time and sales completed.

It should be somewhat obvious why keep alive connections have such a huge impact: you avoid the whole HTTP connection setup, which is not insignificant. You probably don’t need a keep alive timeout value of 65, but 10-20 seconds is definitely recommended, and as previously stated, nginx can easily handle the idle connections.
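Concretely, an example value within the range suggested above:

```
# 10-20 seconds is plenty; idle connections are cheap for nginx.
keepalive_timeout 15;
```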

tcp_nodelay and tcp_nopush
These two directives are probably some of the most difficult to understand, as they affect nginx at a very low networking level. The very short and superficial explanation is that they determine how the OS handles the network buffers and when to flush them to the end user. I can only recommend that if you do not already know about these, you shouldn’t mess with them. They won’t significantly improve or change anything, so it’s best to just leave them at their default values.
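For reference only, these are the defaults, so you can verify nobody has changed them; note that tcp_nopush only has an effect when sendfile is enabled:

```
tcp_nodelay on;   # the default
tcp_nopush  off;  # the default; only meaningful together with sendfile
```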

Hardware Limitations
Since we’ve now dealt with the possible limitations imposed by nginx, it’s time to figure out how to push the most out of our server. To do this we need to look at the hardware level, as this is the most likely place to find our bottleneck.

With servers we have primarily 3 potential bottleneck areas: the CPU, the memory and the IO layer. nginx is very efficient with its CPU usage, so I can tell you straight up that this is not going to be your bottleneck, ever. Likewise, it’s very efficient with its memory usage, so that is very unlikely to be your bottleneck either. This leaves IO as the primary culprit.

If you’re used to dealing with servers then you’ve probably experienced this before: hard drives are really, really slow. Reading from the hard drive is one of the most expensive operations a server can do, and therefore the natural conclusion is that to avoid an IO bottleneck we need to reduce the amount of hard drive reading and writing nginx does.

To do this we can modify the behaviour of nginx to minimize disk writes as well as make sure the memory constraints imposed on nginx allows it to avoid disk access.

Access Logs
By default nginx writes every request to a file on disk for logging purposes. You can use this for statistics, security checks and such, but it comes at the cost of IO usage. If you don’t use the access logs for anything you can simply turn them off and avoid the disk writes. However, if you do require access logs then consider saving them to a memory partition instead; this is much faster than writing to disk and reduces IO usage significantly.
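Both options in sketch form; the memory-backed path is a hypothetical tmpfs mount point, not a standard location:

```
# Option 1: no access logging at all.
access_log off;

# Option 2: log to a memory-backed filesystem (hypothetical tmpfs mount).
# access_log /var/log/nginx-ram/access.log;
```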

If you only use access logs for statistics then consider whether you can use something like Google Analytics instead, or whether you can log only a subset of requests instead of all of them.

Error Logs
I sort of debated internally whether I should even cover this directive, as you really don’t want to disable error logging, especially considering how low volume the error log actually is. That said, there is one gotcha here: the log level parameter you can supply. If set too low it will log 404 errors and possibly even debug info. Setting it to the warn level in production environments should be more than sufficient and keep the IO low.
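In config form, assuming the standard log location:

```
# warn keeps production error logs quiet without losing real problems.
error_log /var/log/nginx/error.log warn;
```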

Open File Cache
Part of reading from the file system consists of opening and closing files. Considering that this is a blocking operation, it is a not insignificant part. Thus it makes good sense for us to cache open file descriptors, and this is where the open file cache comes in. The nginx wiki has a pretty decent explanation of how to enable and configure it, so I suggest you go read that.
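As a rough sketch of what such a configuration might look like; every value here is an assumption to tune against your own file set:

```
# Cache up to 10,000 open file descriptors; drop entries idle for 30s.
open_file_cache          max=10000 inactive=30s;
open_file_cache_valid    60s;  # re-validate cached entries every 60s
open_file_cache_min_uses 2;    # only cache files requested at least twice
open_file_cache_errors   on;   # also cache file-not-found errors
```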

Buffers
One of the most important things to tune is the buffer sizes you allow nginx to use. If the buffers are set too low, nginx will have to store upstream responses in a temporary file, which causes both write and read IO; the more traffic you get, the more of a problem this becomes.

client_body_buffer_size is the directive which controls the client request buffer size, meaning the incoming request body. This is used to handle POST data: form submissions, file uploads and so on. You’ll want to make sure the buffer is large enough if you handle a lot of large POST submissions.

fastcgi_buffers and proxy_buffers are the directives which deal with the response from your upstream, meaning PHP, Apache or whatever you use. The concept is exactly the same as above: if the buffers aren’t large enough, the data will be saved to disk before being served to the user. Note that there is an upper limit for what nginx will buffer, even on disk, before it transfers the response synchronously to the client; this limit is governed by fastcgi_max_temp_file_size and proxy_max_temp_file_size. You can also turn buffering off entirely for proxy connections by setting proxy_buffering to off. (Usually not a good idea!)
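To make this concrete, a hedged sketch; every size here is an example, not a recommendation:

```
client_body_buffer_size 16k;       # incoming request bodies (POST data)
fastcgi_buffers         8 16k;     # 8 buffers of 16k each, per connection
proxy_buffers           8 16k;
fastcgi_max_temp_file_size 1024m;  # cap on on-disk buffering
```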

Removing Disk IO Entirely
The best way to remove disk IO is of course not to use the disks at all. If you only have a small amount of data, chances are you can fit it all in memory and thus remove the disk IO limitation entirely. Your OS will also cache frequently accessed disk sectors by default, so the more memory you have, the less IO you will do. What this means is that you can buy your way out of this limitation by just adding more memory. The more data you have, the more memory you’ll need, of course.

Network IO
For the sake of fun we will assume that you’ve managed to get enough memory to fit your entire data set in there. This means you can theoretically do around 3-6 Gbps of read IO. Chances are, though, that you do not have that fast a network pipe. Sadly, there’s a limit to how much we can optimize network IO, as we need to transfer the data somehow. The only real options are to minimize the amount of data in the first place or to compress it.

Thankfully nginx has a gzip module which allows us to compress the data before it’s sent to the client; this can drastically reduce the size of the data. Generally, diminishing returns set in around a gzip_comp_level of 4-5; there’s no point in increasing it further, as you will just waste CPU cycles.
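A hedged example; the MIME types listed are an assumption about what your site serves (text/html is always compressed when gzip is on):

```
gzip            on;
gzip_comp_level 4;  # diminishing returns above 4-5
gzip_types      text/css application/javascript application/json;
```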

You can also minimize the data by using various JavaScript and CSS minifiers. This is not really nginx related, so I will trust that you can find enough information on this using Google.

Phew
And with that we’ve reached the end of this subject. If you still require additional optimization then it’s time to consider using extra servers to scale your service instead of wasting time micro-optimizing nginx further, but that’s a topic for another time, as I’ve been running on for quite a while now. In case you’re curious, the final word count was just above 2400, so best to take a small break before you go explore my blog further!
