Saturday, April 2, 2016

Improving cluster performance by tuning Apache

In my last article about cluster performance, I found that a cluster performs better than a single Pi, but there was still a lot of room for improvement. I've made changes to Apache's configuration files, and I've modified the way page caching works in my CMS.

Moving the page cache

The CMS that I've written can generate pages dynamically, and it can cache pages so that they can be served instantly without having to be assembled. Only pages that consist of static HTML can be cached. Pages that contain dynamic content generated by executable scripts aren't cached.
The page cache used to be in /usr/share/cms/cache, which meant the Python interpreter had to be loaded even to serve cached pages. Now the root directory of the page cache is /var/www, so Apache can serve cached pages without invoking Python to run the CMS script.
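Broadly speaking, the arrangement looks something like the virtual host sketch below. The rewrite rules and the CMS script path are illustrative, not my exact configuration:
<VirtualHost *:80>
    DocumentRoot /var/www
    RewriteEngine On
    # Requests for files that exist in the cache are served directly by Apache.
    # Anything else is handed to the CMS script (the script path is illustrative).
    RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
    RewriteRule ^ /cgi-bin/cms.py [PT,L]
</VirtualHost>
Note that this pattern needs mod_rewrite enabled, so it's one more module to weigh up before disabling things later on.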
One downside is that the CMS can no longer track traffic. When the CMS ran for every page request, a function was called to increment a counter in a data file. That doesn't work now that pages can be served without the CMS executing.

Unload unused modules

One of the best ways to improve Apache's performance is by unloading modules that aren't needed. First you need to list all the modules that are currently loaded using this command:
apache2ctl -M
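The output is a list of the loaded modules and whether each one is compiled in (static) or loaded as a shared object. The exact list depends on your installation, but it looks along these lines:
Loaded Modules:
 core_module (static)
 log_config_module (static)
 mpm_worker_module (static)
 authz_host_module (shared)
 dir_module (shared)
 mime_module (shared)
 rewrite_module (shared)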
It can be difficult to determine which modules are in use. It really depends on which directives are used in .htaccess files and virtual host files. For example, if you disable authz_host_module, then Allow, Order and Deny directives won't work. Each time you disable a module, restart Apache with these commands:
$ sudo service apache2 stop
$ sudo service apache2 start
You can use 'restart' instead of 'stop' and 'start', but some settings are only picked up when Apache is fully stopped and started again. It's a good idea to thoroughly test your site after each change before disabling any more modules. I disabled these modules:
$ sudo a2dismod autoindex
$ sudo a2dismod auth_basic
$ sudo a2dismod status
$ sudo a2dismod deflate
$ sudo a2dismod ssl
$ sudo a2dismod authz_default
If you find that a module is required, you can re-enable it with the a2enmod command like this:
$ sudo a2enmod authz_host

Lower the timeout

I set the Timeout directive in /etc/apache2/apache2.conf to 30 seconds. This prevents slow or stalled requests from tying up processes and memory for long periods, which reduces overall memory usage.
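The relevant line in apache2.conf now reads:
# Maximum time in seconds that Apache waits for a request to complete
# before giving up on it (the Debian default is 300)
Timeout 30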

Tune Apache processes

Apache has several different multi-processing modules (MPMs). Each one uses a combination of server processes and child threads to handle HTTP requests. MPM prefork used to be the standard, but MPM worker is more modern and gives better performance and lower memory usage. Use this command to check which MPM Apache is using:
$ sudo apache2 -V
Part of the output will be something like this:
Server MPM:     Worker
  threaded:     yes (fixed thread count)
    forked:     yes (variable process count)
These are the default settings for MPM Worker:
<IfModule mpm_worker_module>
    StartServers          5
    MinSpareThreads      25
    MaxSpareThreads      75
    ThreadLimit          64
    ThreadsPerChild      25
    MaxClients          150
    MaxRequestsPerChild   0
</IfModule>
The size of the Apache processes varies depending on the content being served and any scripts that might be running. This command shows the Apache processes and their size:
ps aux | grep 'apache2'
The sixth column (RSS) shows the amount of physical memory used by each process, in kilobytes. Dividing the amount of spare memory by the size of an average Apache process gives a rough indication of the maximum number of server processes you can run. Each Pi in my cluster has about 280MB of free RAM, and the average Apache process is about 7MB, so 280 divided by 7 gives 40.
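A quick way to get the average, assuming RSS is the sixth field of the ps aux output:
ps aux | grep '[a]pache2' | awk '{ sum += $6; n++ } END { if (n) print sum / n / 1024, "MB average over", n, "processes" }'
The '[a]pache2' pattern stops grep from matching its own process, and awk converts the kilobyte figures to megabytes.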
StartServers is the number of server processes that Apache creates when it starts up. Creating new server processes can be time consuming, so I want Apache to start a lot of processes up front. That way it won't have to spend time creating more while it's busy handling a lot of traffic. I've set StartServers to 40.
I don't want Apache to be able to create too many processes, as my Pi might run out of memory, so I've set the ServerLimit to 40.
Each server process can have a varying number of threads, and it's the threads that actually process requests. I've set ThreadsPerChild to 8. I didn't calculate this; I just tried a lot of different values and ran tests with siege until I found the optimum, roughly as sketched below.
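This is roughly what the trial-and-error loop looked like. It's a sketch that assumes the worker settings live in /etc/apache2/apache2.conf and that siege is installed; back up the config file before letting sed edit it:
# Try a range of ThreadsPerChild values and benchmark each one
for threads in 4 8 16 25 32
do
    sudo sed -i "s/^\( *ThreadsPerChild *\).*/\1$threads/" /etc/apache2/apache2.conf
    sudo service apache2 stop
    sudo service apache2 start
    echo "ThreadsPerChild = $threads"
    siege -d1 -c200 -t1m http://192.168.0.4/specs.html
done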
The total number of threads is the number of server processes multiplied by ThreadsPerChild, which is 320 with my settings. I set MaxClients to 320 to prevent Apache from creating extra threads.
These settings will cause Apache to create a lot of processes and threads that don't get used immediately. In order to prevent Apache from deleting them, I set MaxSpareThreads to 320.
MaxRequestsPerChild is the number of requests a process handles before it is killed and a replacement process is started. This is done to stop memory leaks from gradually consuming large amounts of memory. A reasonable rule of thumb is to base it on the number of hits the server gets in a day, so that processes are recycled roughly once a day.
The MPM worker settings are now:
<IfModule mpm_worker_module>
    StartServers           40
    ServerLimit            40
    MinSpareThreads        25
    MaxSpareThreads       320
    ThreadLimit            64
    ThreadsPerChild         8
    MaxClients            320
    MaxRequestsPerChild  2000
</IfModule>
Before I made changes to the caching system, a siege test with only 25 concurrent users yielded these results:
Lifting the server siege...      done.
Transactions:                     30 hits
Availability:                 100.00 %
Elapsed time:                  59.79 secs
Data transferred:               0.08 MB
Response time:                 26.07 secs
Transaction rate:               0.50 trans/sec
Throughput:                     0.00 MB/sec
Concurrency:                   13.08
Successful transactions:          30
Failed transactions:               0
Longest transaction:           29.32
Shortest transaction:          14.03
After I improved the caching system I tested a single node with 200 concurrent users using this command:
$ siege -d1 -c200 -t1m http://192.168.0.4/specs.html
The results were:
Lifting the server siege...      done.
Transactions:                   6492 hits
Availability:                 100.00 %
Elapsed time:                  59.28 secs
Data transferred:              38.86 MB
Response time:                  1.29 secs
Transaction rate:             109.51 trans/sec
Throughput:                     0.66 MB/sec
Concurrency:                  141.10
Successful transactions:        6492
Failed transactions:               0
Longest transaction:           11.23
Shortest transaction:           0.32
After restarting Apache with the configuration changes and running the same test again, I got these results:
Lifting the server siege...      done.
Transactions:                   6449 hits
Availability:                 100.00 %
Elapsed time:                  59.53 secs
Data transferred:              38.60 MB
Response time:                  1.31 secs
Transaction rate:             108.33 trans/sec
Throughput:                     0.65 MB/sec
Concurrency:                  142.32
Successful transactions:        6449
Failed transactions:               0
Longest transaction:            4.16
Shortest transaction:           0.01
After optimizing page caching, removing unused modules from Apache and tuning the server processes, the number of transactions per second for a single node has gone from 0.5 to over 100. The number of concurrent requests that can be handled has increased by a factor of 8.
Tuning Apache processes has resulted in a very small decrease in the number of transactions per second, but the longest transaction time has decreased considerably.

Testing the whole cluster

Once I was happy with my new settings, I flushed them through to the entire cluster.
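A rough sketch of how the configuration could be pushed to every node, assuming illustrative node addresses and passwordless SSH and sudo between the nodes:
# Node addresses are illustrative
for node in 192.168.0.4 192.168.0.5 192.168.0.6 192.168.0.7
do
    scp /etc/apache2/apache2.conf pi@$node:/tmp/apache2.conf
    ssh pi@$node "sudo mv /tmp/apache2.conf /etc/apache2/apache2.conf && \
                  sudo service apache2 stop && sudo service apache2 start"
done
With the new settings on every node, I ran more tests with siege, first with 200 concurrent users: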
Lifting the server siege...      done.
Transactions:                  23218 hits
Availability:                 100.00 %
Elapsed time:                  59.26 secs
Data transferred:              48.44 MB
Response time:                  0.01 secs
Transaction rate:             391.80 trans/sec
Throughput:                     0.82 MB/sec
Concurrency:                    3.90
Successful transactions:       23218
Failed transactions:               0
Longest transaction:            0.66
Shortest transaction:           0.00
...and then with 800 concurrent users:
Lifting the server siege...      done.
Transactions:                  56899 hits
Availability:                 100.00 %
Elapsed time:                  60.05 secs
Data transferred:             118.59 MB
Response time:                  0.34 secs
Transaction rate:             947.53 trans/sec
Throughput:                     1.97 MB/sec
Concurrency:                  317.44
Successful transactions:       56899
Failed transactions:               0
Longest transaction:            9.71
Shortest transaction:           0.00
Before tuning Apache, the cluster could handle 350 concurrent requests, and the maximum transaction rate was 460 transactions per second. Now the cluster handles 800 concurrent users with a 100% success rate, and the maximum transaction rate is 947 transactions per second.
I will carefully watch the amount of spare memory over the next few days. If it starts to get too low, I'll reduce some of these settings.
Using siege isn't completely realistic. A request from a browser puts a much larger load on a server than a request from siege, and the timing of siege tests is different from the ebb and flow of real traffic. Siege tests don't predict the number of requests a server can handle, but they do give a basis for comparing different server configurations. I don't think my cluster can handle 947 actual visitors per second, but I'm confident that my server's performance is better than it was.
