User talk:Terrye/ACM Stategies


file_get_contents() and file_put_contents()

From what I gather from your tests, a simple file_get_contents() would be a) atomic and b) not noticeably slower on the size of file we are operating on. It would be fairly easy to alter the algorithm to work on an array of lines read via file_get_contents(), but we cannot use file_put_contents() as that requires PHP 5.0.0. --Toonarmy 09:03, 19 June 2009 (UTC)
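For illustration, a minimal sketch of that read path, with an assumed $cache_file path: file_get_contents() pulls the file in with a single call and explode() yields the array of lines the existing algorithm expects:
    $contents = @file_get_contents($cache_file);   // one atomic read
    if ($contents !== false)
    {
        $lines = explode("\n", $contents);         // array of lines
    }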

It's a bummer about the 4.3.3 minimum baseline for PHP. Surely you must be thinking of doing the same as MediaWiki et al. and raising the minimum to 5.0.0; then you wouldn't have to worry about compatibility with the phpBB object syntax, etc.
The reason for the atomicity is that the I/O for the entire contents of the file is handled by a single system call, and these calls are atomic with respect to the filesystem. If you look at the benchmarks that do true large random access, so that all of the file reads are cache misses, there is little difference, because the physical I/O delays totally dominate. Once the file is cached, we are talking about the second-order effects of API overheads, and for our application these will be in the noise. If you think about the current code (simplified):
    $handle = @fopen($file, 'wb');
    fwrite($handle, '<' . '?php exit; ?' . '>');   // PHP header line
    fwrite($handle, "\n" . $expires . "\n");       // expiry timestamp
    $data = serialize($data);
    fwrite($handle, strlen($data) . "\n");         // length of the payload
    fwrite($handle, $data);                        // the payload itself
    fclose($handle);
This generates four separate filesystem write calls, which is why you need the flock()s. I think that this should be something like:
   $data="<?php exit; ?>\n$expires\n".serialize($data);
   for ($i = 0; $i<3; $i++)
   {
       if ($handle = fopen($file, 'wb'))
       {
           $n_bytes = fwrite($handle, $data);
           @fclose($handle);
           if ($n_bytes) 
           {
               #  bizarre phpbb_chmod stuff goes here
               return true;
           }
       }
       usleep(rand(1,2000)*1000);
   }
   return false;
This only uses a single filesystem API write call. It also handles the file collision case with up to 2 retries, each after a 0-2 sec delay. Let's say we are hammering the FS at 1 page per sec, i.e. roughly 86,000 writes a day. At a collision probability of 1:1,000,000 we'll still get a collision every couple of weeks; at 1:1,000 we'll get dozens per day. (The app where we found we really needed this collision-detect approach was a PHP + W2K3/IIS one where it was being hammered like this.) The number of retries will be small, but now there are no logical collisions.
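A quick back-of-envelope check of those figures (the rate and probabilities are the assumptions above, not measurements):
    $writes_per_day = 1 * 86400;            // 1 write per second
    echo $writes_per_day / 1000000, "\n";   // ~0.09 collisions/day, i.e. one every ~12 days
    echo $writes_per_day / 1000, "\n";      // ~86 collisions/day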
On the inbound just do a file_get_contents() followed by a:
    $temp = explode("\n", $data, 3);
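Putting the inbound path together, a sketch under the three-field format proposed above (header, expiry, serialized payload); the early-return structure is illustrative:
    $contents = @file_get_contents($file);   // single atomic read
    if ($contents === false)
    {
        return false;                        // no cache file
    }
    $temp = explode("\n", $contents, 3);
    if (count($temp) < 3 || (int) $temp[1] < time())
    {
        return false;                        // corrupt or expired entry
    }
    $data = unserialize($temp[2]);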
PS are you also a JavaScript programmer by chance? ;-) — Terrye 12:48, 19 June 2009 (UTC)
Primarily a PHP and Java developer; why do you ask? We are considering dropping guaranteed compatibility, but we committed to 4.3.3 a long time before it was even considered deprecated. I think I will add a file_put_contents() compatibility function in includes/functions.php and use that, wrapped in a small loop, as you suggest. --Toonarmy 19:40, 19 June 2009 (UTC)
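A minimal sketch of such a shim, assuming only the two-argument form is needed (the PHP 5 built-in also takes optional flags, which this ignores):
    if (!function_exists('file_put_contents'))
    {
        // Compatibility wrapper for PHP < 5.0.0: write $data to $filename
        // in a single fwrite() call; returns bytes written or false.
        function file_put_contents($filename, $data)
        {
            if (!($handle = @fopen($filename, 'wb')))
            {
                return false;
            }
            $n_bytes = fwrite($handle, $data);
            fclose($handle);
            return $n_bytes;
        }
    }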