User:Terrye/I O Performance Impacts on phpBB

Some of the technical background
A few years ago I wrote up an overview paper of the technical issues which underpin this discussion for my senior engineers and put a précis of it in the public domain (see Performance Collapse in Systems). I had to sanitise some of the quoted examples for confidentiality reasons. Nonetheless it is still worth a read if you want to know why complex computer systems sometimes 'die on their feet'. A summary is that a computer system is made of a complex hierarchy of intercommunicating sequential processes. Each component process services a set of requests arriving with some distribution with a mean arrival rate (μ), and is capable of serving them with some service distribution with a mean service rate (λ). If it is unable to serve a request immediately then that request must be queued. The ratio μ/λ is called the traffic density (ρ), and this ratio is key to the queue size that forms and therefore to the total processing delay through the process. If ρ is less than 0.5, say, then the queues into the process will be almost always empty and requests will be turned around immediately.
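For the simplest textbook case, an M/M/1 queue (Poisson arrivals, exponential service times, a single server), the expected number of requests in the system is given by the standard result

 L = \frac{\rho}{1 - \rho}, \qquad \rho = \frac{\mu}{\lambda}

so at ρ = 0.5 only one request is in the system on average, but the denominator makes L blow up as ρ approaches unity.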

For any given system you can plot the average queue length against ρ. The average length will increase as ρ increases towards unity: steadily at first; but at some point there is a knee in the curve and the queue size will explode towards infinity. When this happens (and it can be hidden deep inside the system), the upstream processes all start to back up, input queues start overflowing, requests time out, and any retry logic that programmers add to 'improve' resilience can actually make things worse, causing gridlock (as anyone stuck for hours in a traffic jam after a lorry shed its load can testify). All queuing systems have this same macroscopic curve with a knee bend, though the shape and location of the knee can be accentuated by bursty distributions on either the arrival or service side, and can be softened by having multiple queues, priority schemes, etc. It is very easy to get drawn into games of tweaking system and application parameters to move the knee slightly, but the reality is that the behaviour of any real system is chaotic and incredibly sensitive to small changes of ρ around the knee, so tweaking is usually a waste of time. If any subsystem of your system is near its knee then the only real answer is to address the arrival distribution and reduce μ, or to increase the system resources and so improve the service-side λ.
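A minimal sketch (purely illustrative, using the M/M/1 formula above) shows just how sharply the queue grows near the knee:

 <?php
 // Average M/M/1 queue length L = rho / (1 - rho), with rho = mu / lambda
 // as defined above.  Purely illustrative of the shape of the curve.
 foreach (array(0.5, 0.8, 0.9, 0.95, 0.99) as $rho) {
     printf("rho = %.2f  ->  average queue length = %.0f\n", $rho, $rho / (1 - $rho));
 }
 // Prints 1, 4, 9, 19 and 99: the knee is unmistakable.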

So why is this relevant to phpBB performance? Well, there are many possible subprocesses / systems where this queue blow-up can occur, but in single-server applications such as this it almost invariably occurs where the application hits a hard constraint, such as needing more RAM than is available, or where physics gets in the way. Moore's Law-like trends have dominated most areas of IT over the last four decades. The older among us will still remember the first 4.77MHz IBM PC XT with 128KB RAM and a 10MB HDD. Memory capacity, disk capacity and processor speed have all increased by over 5 orders of magnitude in the last thirty-five years, but the time to read a dozen random blocks from the disk has decreased by less than one order.  Put simply, whereas the CPU could once execute a few million instructions whilst waiting for the disk to read a file, it can now execute billions:
 * Each time your application goes to the disk, it is like going back two decades in performance terms.
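To put a number on this (illustrative figures: a 3GHz core and roughly 8ms for a random read on a 7200RPM drive):

 3 \times 10^{9}\ \text{cycles/s} \times 8 \times 10^{-3}\ \text{s} \approx 2.4 \times 10^{7}\ \text{cycles}

that is, a single read which has to go to the platter costs the CPU the opportunity to execute tens of millions of instructions.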

Caching Technologies
Caching technologies are the main mechanism used to mitigate, and sometimes effectively eliminate, the consequences of these disparate performance characteristics. For this page, I want to focus on the system and application caches which mitigate the effect of stepping down from the speed of light to 7200RPM; that is, of avoiding disk I/O. Caches can be located in many layers: in the phpBB application itself, in the database, in the file system and even in the controller of the physical disk itself, and the bandwidths and latencies can vary quite dramatically across these. In the case where an application needs to read or to write a piece of data that is primarily located on disk, there are three main cases that can occur (a minimal code sketch follows the list):


 * Cache Hit. The data has been previously read into memory and a copy still exists there.  The data can be copied directly from memory without reference to the disk.
 * Cache Miss. No copy of the data exists in memory, so it must be read from disk.  This is intrinsically a synchronous activity.  A disk read is scheduled and the process which has requested the data is put into an I/O wait-state pending its retrieval.  In scheduling terms, the process's compute time-slice is terminated and rescheduling occurs to allow other processes to take control of the processing resources.
 * Cache Write-through. Unlike reads, it is entirely practical to adopt an asynchronous strategy for writing data back to disk, so the process doesn't have to wait at all in this case.  However, to maintain consistency, the data is written through the cache to ensure that subsequent reads access the correct data.
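The following minimal sketch shows these three paths at application level (purely illustrative; the class and file path are hypothetical, and this is not phpBB code):

 <?php
 // Minimal read-through / write-through cache sketch (illustrative only).
 class SimpleCache
 {
     private $memory = array();          // in-memory copies, keyed by name
 
     public function read($key)
     {
         if (isset($this->memory[$key])) {
             return $this->memory[$key]; // cache hit: served from memory, no disk I/O
         }
         // Cache miss: a synchronous read; the process blocks in an I/O wait-state.
         $data = file_get_contents("/var/data/$key");   // hypothetical master copy
         $this->memory[$key] = $data;
         return $data;
     }
 
     public function write($key, $data)
     {
         $this->memory[$key] = $data;    // update the cached copy...
         // ...and write through to the master copy.  The OS page cache normally
         // makes the physical write asynchronous, so the process need not wait.
         file_put_contents("/var/data/$key", $data);
     }
 }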

Various strategies can be adopted to minimise the probability of a cache miss, such as write-through as discussed, and block read-ahead: here the application might only request a few bytes, but the OS (and the disk controller) do a minimum read-ahead of a significantly larger data block, because the incremental cost is low and typical applications read data serially from the HDD. File systems also include supporting structures where metadata such as directories and [[Wikipedia:Inode|inodes]] are known to be 'hot' and are therefore prioritised. The HDD itself is the main latent bottleneck, and therefore file systems apply such tricks as batching up reads and writes using elevator algorithms to minimise the average seek distance and thereby increase throughput, albeit at a potentially increased delay for individual I/O operations.  As well as the data itself, the system has to write and read metadata such as inodes and journal records, which add to the physical I/O required; for this reason creating a new small file is expensive compared to writing the same data into an existing file.  Write operations have a critical impact on disk queues because no matter how effective the file system cache is, dirty blocks must ultimately be written back to disk, and once the traffic density goes past the knee, the queues build up and applications will block on all writes.
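As a toy illustration of the elevator idea (not real kernel code): pending requests are serviced in block order, sweeping the head in one direction and then back, rather than in arrival order:

 <?php
 // Toy elevator (SCAN) scheduler: service pending block requests in ascending
 // order from the current head position, then sweep back down.  Illustrative only.
 function elevator_order(array $pending, $head)
 {
     sort($pending);
     $up   = array_filter($pending, function ($b) use ($head) { return $b >= $head; });
     $down = array_filter($pending, function ($b) use ($head) { return $b <  $head; });
     return array_merge(array_values($up), array_reverse(array_values($down)));
 }
 
 print_r(elevator_order(array(10, 95, 52, 180, 40), 50));
 // -> 52, 95, 180 (sweeping up), then 40, 10 (sweeping back down)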

One of the major design issues is that of cache coherence, that is, ensuring that when there are multiple paths to the master data, the copy in the cache doesn't become stale and fall out of sync with the master. The designers of file systems, databases, etc. go to great lengths to ensure such coherency. It is just as important to do this at the application level if data caching is being carried out in the application, as in the sketch below.
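A minimal sketch of the simplest such discipline, invalidate-on-write (the helper names here are hypothetical):

 <?php
 // Invalidate-on-write sketch (hypothetical helpers, illustrative only): every
 // write path to the master data must also drop the cached copies derived from
 // it, otherwise subsequent readers may be served stale data.
 function save_user($user_id, array $fields)
 {
     write_user_to_database($user_id, $fields);  // update the master copy
     cache_delete("user_$user_id");              // drop the stale cached row
     cache_delete('active_user_list');           // ...and any derived result set
 }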

Types of Cache
There are four main levels of cache involved in your phpBB application:
 * File System Cache
 * As previously mentioned, modern file systems (both *nix and Windows NTFS) maintain large directory and buffer caches in memory.  These enable very efficient access to frequently used files, and for many logical read operations they obviate the need for disk I/O altogether.  However, any file updates do have to be flushed back to disk in order to maintain file system on-disk integrity.  Under normal circumstances such writes are asynchronous to the application.  However, the overall seek / transfer rate onto the physical devices represents a hard physical upper constraint, and the application can reach a tipping point where the demanded flush rate exceeds this physical capacity, with the application becoming I/O bound at that point.  The main Linux reporting tool is iostat -x, and the excellent Windows Perfmon can provide very useful performance graphs of file system and physical HDD indicators. If the physical disk queue lengths are averaging more than about 0.5 then this is an indication that the system is becoming I/O bound, and is getting close to that knee.
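For example (in older sysstat releases the device queue-length column is avgqu-sz; newer ones call it aqu-sz):

 $ iostat -x 5     # extended device statistics every 5 seconds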


 * MySQL Cache
 * The MySQL database engine maintains three main caches to accelerate performance. As SQL data is structured, large metadata structures known as key indexes are used to navigate and access the data.  When practical, sufficient key buffer space should be allocated to hold all key indexes in memory; when this is achieved, and if the tables are properly indexed (as is the case with phpBB), the MySQL engine can use the in-memory indices when executing a query to work out precisely which table rows need to be read.  Any data records read can also be buffered in data buffers, and updates benefit from smart placement, both further reducing the need to go to the physical disk.  Lastly, the result sets for queries which have fixed embedded parameters (as is the case for all phpBB queries) are also themselves cached.  The coherency algorithm is simple: any result set which depends on table X is dropped whenever X is updated.  Nonetheless, this strategy is surprisingly effective, especially for low-volatility tables.  This algorithm is similar to the one I discuss below for phpBB SQL caching.  On my systems the indexes are fully cached and the Query cache hit rate is over 50%.  The following commands indicate the effectiveness of these strategies (the global modifier reports on the server as a whole and not just the current session).

 mysql> show [global] status like 'Key%';  # Show Key cache performance indicators
 mysql> show [global] status like 'Qc%';   # Show Query cache performance indicators
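The usual rule-of-thumb interpretation of these counters is:

 Key cache miss ratio  =  Key_reads / Key_read_requests            (physical reads per logical request)
 Query cache hit rate  =  Qcache_hits / (Qcache_hits + Com_select)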


 * Note that the database layer is discussed here in terms of MySQL, but these general principles apply to all mainstream databases.


 * PHP Cache
 * This has two facets. The PHP architecture splits the execution of PHP code into two stages: the compilation of source into bytecode and its subsequent execution.  This facilitates caching within the Apache PHP engine, so that the relatively expensive compile operations can be carried out on a just-in-time basis and the results stored in a cache.  Since complex applications such as phpBB, Drupal and MediaWiki have a large source base, this compilation would normally take perhaps 50-75% of CPU cycles; an effective caching strategy can therefore result in a 2-4x decrease in CPU loading.  More importantly, when the cached bytecode is available in a PHP accelerator, the PHP engine doesn't need to load the source files and compile them at all.  There are a number of PHP accelerators available, but phpBB is tested against the main two: APC and XCache.  (See Figure 1 for the benefits of PHP code caching.)  APC and XCache both have APIs and toolsets to enable you to monitor cache performance.  APC, XCache and Memcache can also store and retrieve application data by key, and this can be used by the phpBB application, as in the sketch below.
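For example, a minimal sketch of key-based data caching with the APC user-data API (the cache key, TTL and loader function here are illustrative):

 <?php
 // Cache an expensive-to-build value under a key using APC (illustrative).
 $config = apc_fetch('board_config', $hit);
 if (!$hit) {
     $config = load_config_from_database();    // hypothetical expensive loader
     apc_store('board_config', $config, 300);  // keep a copy for up to 5 minutes
 }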


 * phpBB Application Cache
 * Individual applications can also maintain their own caching strategies. In the case of phpBB, the primary repository for all data is the database; this includes session context, which is maintained in a couple of session tables.  (The PHP run-time system's session cache is not used by phpBB.)  However, a lot of the data needed to prepare the phpBB webpages is pretty constant and rarely changes from page to page.  The application caches this relatively static context locally in its ACM cache, along the lines sketched below.
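A minimal sketch of the sort of get/put usage involved, along the lines of phpBB3's cache class (treat the key, TTL and loader function as illustrative):

 <?php
 // Illustrative ACM-style usage: try the local cache first, fall back to the
 // database on a miss, and prime the cache for subsequent page builds.
 global $cache;
 
 $ranks = $cache->get('_ranks');
 if ($ranks === false) {
     $ranks = load_ranks_from_database();    // hypothetical loader hitting the DB
     $cache->put('_ranks', $ranks, 3600);    // cache locally for an hour
 }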


 * See also: User:Terrye/ACM Stategies