Tuesday, July 21, 2009

WSF/PHP apparent hang: PHP script exceeded 30 seconds

The problem

So I am using the WSF/PHP web services for a project. All was fine until usage of the web services was stepped up. After a while web service calls started timing out PHP (3o seconds exceeded). Regular web pages were fine, so it had something to do with Web services. I was stumped, but at some point I noticed that the log files produced by the wsf_c extension were just a tiny bit larger than 32MB and my suspicion was raised.

I recreated the situation on a development system by creating a dummy log file just under 32M and then doing some requests. I used top to observe the httpd process being busy for 30 seconds and while it was, attached gdb to it. Some stepping and tracing revealed the cause.

The wsf_c extension starts up early when apache starts and loads the worker processes. It loads mod_pdp, which then causes PHP to load the extension. At this point Apache still runs as root. The extension attempts to write a few log lines, causing the log file to be created, owned by "root" and with permissions "rw-r--r--". Next Apache switches away from root (for security reasons), in my case to user "apache". The running httpd worker processes inherit the open file descriptor and now, effectively, have write permission to this file, so all is fine and logging can continue as needed. I can separately argue the merits of this strange write permission, but it is not relevant here.

Logging continues until the log file exceeds 32MB, which is the built in size limit for the underlying Apache Axis code. The code now copies the log file to "logfile.old" and then closes the log file and attempts to reopen it using "fopen(file, "w+")". This would normally truncate the large log file, as intended. However, it fails to reopen because of a permission issue. The process is "apache" but the file is owned by "root". Logging silently fails and things continue.

So what is the problem? Well, the attempt to log one line of output caused 32MB of data to be copied. The next attempt at a line of logging will do the same because the file was never truncated. If you are logging enough lines per request, the time involved in all that (useless) copying will cause your script to exceed 30 seconds, and bingo: it gets cut off. Of course no logging whatsoever happens.

The solution: part I

The obvious solution is for the WSF/PHP folks to fix the problem (perhaps deferring to the Axis folks). Either delay the log file creation until Apache is running as effective uid "apache", or change ownership of the created log file. If history is any indication, this may take a while to get done.

The solution: part II

The work around I have installed is to run an hourly cron script that inspects the file size. When it gets uncomfortably close to 32MB it restarts the server. The theory is that on restart the rotation attempt happens as well, but now it is done as root and everything should be fine. Not soooo. Unless you force the restart right when there is not enough room to do the initial startup logging, the logging will not cause a rotate and nothing will have changed.

The solution: part III

So finally, I modified my script to not restart the server, but to change ownership of the file. I do this as soon as the file appears to exist, regardless of size. Running this once an hour, or so, will not allow the logfile to reach 32MB before the ownership is changed.

No comments: