Monday, July 27, 2009

ZendStudio 6.x.x performance issue

I have occasionally been plagued by less than stellar performance while using ZendStudio 6.x.x series. It happened again the other day and I finally made some progress in tracking it down, so let me share.

What happened

Recently I had to work on a fairly large PHP project on my laptop for a while. My normal environment is a 8 core MacPro, the laptop is a Core2Due MacBookPro. In both cases I run VMWare Fusion hosting a Fedora Core 10 guest as my development system. While I always thought things were somewhat slow on the desktop system, but manageable. Not so on the laptop. So I decided to look into it as I was wasting a lot of time. In particular one observed behavior was the right after starting up ZS cpu usages would go to 100% and stay there for quite a while, even though I was not "touching" anything. In fact it could last as long as 35 minutes.

Investigation first led me down the wrong path. My laptop is limited to 2.5GB memory and the VM I was using only had about 1.2GB. There was, however, no paging going on. I finally discovered that a serious amount of disk IO was happening. Then I discovered that a giant file existed in the php plugin's cache directory (/workspaces/DefaultWorkspace/.metadata/.plugins/org.eclipse.php.core/.cache). It was 74M. There was an accompanying file named "includes.xml" which made me suspicious. I peeked inside the giant file and found references to, and the content of, many files that had nothing to do with the project.

The cause

I wondered why ZS was even looking at these files, and then "the light went on." I use a "lib" directory that contains 3rd party libraries that I use, such as the Zend framework. Here is what that directory looked like:
[dolf@fc10 lib]$ ls -l
total 4
lrwxrwxrwx 1 dolf dolf 29 2009-07-27 15:01 jpgraph -> ../libreleases/jpgraph-2.3.3/
lrwxrwxrwx 1 dolf dolf 31 2009-07-27 15:00 krumo -> ../libreleases/krumo-0.2.1a-ds/
lrwxrwxrwx 1 dolf dolf 29 2009-07-27 14:59 Smarty -> ../libreleases/smarty-2.6.19/
drwxrwxr-x 8 dolf dolf 4096 2009-05-06 10:11 wso2
lrwxrwxrwx 1 dolf dolf 56 2009-07-27 14:58 Zend -> ../libreleases/ZendFramework-1.8.3-minimal/library/Zend/
Well, not quite! This is what it looked like after I solved the problem. Before the directories linked to where not in the "libreleases" directory, but in the "lib" directory. I figured that was fine because I included the "lib" directory on the include path and files would be referred to as "Zend/...".

Sure enough the files would be found, but somehow or other, ZS insists on inspecting every single file that is reachable via the include path. Since I had some 4-5 old versions of Zend Framework hanging around there, as well as some older version of the other stuff, the total amount of files inspected was many times more than necessary. After I "solved" this problem the aforementioned include file went from 74M to about 10MB. The 100% CPU period became much, much shorter.

The overall impact of this was even larger than you might think because a laptop's disk throughput is lower and because I had much less memory than on the desktop (8GB physical, 2GB for the VM), there was less use of a disk cache.

Startup settings

During my search I also encountered some ideas about optimizing the startup settings of Eclipse to make sure it had enough heap memory. I did apply those settings and they seemed to improve things as well. Here is the contents of my current ZendStudio.ini file:
--launcher.XXMaxPermSize 300M
-startup
plugins/org.zend.php.startup_1.0.0.jar
-vmargs
-Dorg.eclipse.swt.browser.XULRunnerPath=/home/dolf/Zend/ZendStudioForEclipse-6.1.1/plugins/org.mozilla.xulrunner.gtk.linux.x86_1.8.1.3-20070404/xulrunner
-Xms375M
-Xmx600M
The ones that are of particular interest here are:
  • XXMaxPermSize: The maximum size of the "permanent" generation on the heap. You may need to "play" with this for your situation.
  • Xms: This is the minimum size of the "regular" heap. Initially set it high and observe Ecelipse's "Heap Status" widget "Preferences>General>Show Heap Status". Wait until you have a stable situation typical of your normal workload. You should set the minimum size to something close to what you see here.
  • Xmx: This is the maximum size of the "regular" heap. Set it somewhat higher than the minimum as determined above.
There are potentially many other Java VM settings you can play with, but I'll leave that to you (I didn't touch them).

Saturday, July 25, 2009

My laptop battery bulged and deformed!

Yesterday everything was fine, tonight I noticed my 17" MacBook Pro was wobbling like a table on even ground. I looked under it and the battery appeared to have come loose. I shut everything down and flipped it over. Well… it was a little worse than that.

With difficulty I removed the battery and noticed something inside had swelled up tremendously. Not good. I remembered a battery recall notice from a while ago, so I went looking. I did check at the time but since my battery was not actually exhbiting any symptoms, the Apple web site told everybody to run Battery Update, which I did.

Coming back to the site now I find that while my computer was covered in the plan, it is now too late. The program closed in May 2009. Browsing around in the forums I find plenty of accounts from users that are in the same boat. Bought their laptop in similar time frame and only now does the problem happen. The original article admits a design/manufacturing issue, but yet no replacement.

I decided I wanted to call Apple to see if I could get a replacement battery anyway. Normally I would have gone to the Palo Alto Apple store, but as it so happens I am in West Yellowstone Montana at the moment. The nearest Apple Store is in Salt Lake City, over 6 hours away! No go, at Apple Care (the only phone number really that one can find to call), is closed. I decided to call my "home" store and see what they can do.

The person in the Palo Alto store keeps insisting that batteries do not normally last that long: Tough, buy a new one. He insists I should go to a Store. The message that the nearest store is 100's of miles away falls on deaf ears the first several times. After insisting about the Apple program about these batteries etc. and that this is a safety related issue, he tries to get a "Genius" to help me. They're all busy. I get told to call Apple Care tomorrow and get case number. With that in hand I should call back to see if, maybe they can ship me a battery.

I am somewhat hopeful about getting a replacement as the messages in some boards seem to indicate that it does happen. In fact, some even got them shipped overnight. Of course that is what I am hoping as I am using this machine every day and waiting for the battery for a week would not be good. More later.

UPDATE: 07/27/09
Talked to AppleCare yesterday (I do not have a contract) and after some hemming and hawing that this is "normal" I managed to convince them that I know it is not. Basically what they said is that it is normal, but if the battery deforms there is a small potential for it to damage other parts of the laptop and told me to bring it in to have it looked at. If, and only if, there is damage would Apple then repair the damage and issue a new battery. I explained that I am nowhere near an authorized repair center (turns out the nearest one is 2hours away in Bozeman, MT) and that I had discovered the problem right after it happened (it seemed to be a sudden event), and could find no visible damage. With that in mind, I asked them to see if they could spare me the trip and send a battery.
It took 10 minutes on hold and then it was approved! Called back today and got a tracking #. It tracks and the battery is supposed to be here tomorrow. Great that Apple does work with customers this way, but too bad that if you don't know how to push the buttons, how to call when it seems they won't listen (because you don't have Apple Care), or if you live near a store you are likely to not get the same service.

UPDATE: 07/28/09
Battery arrived, opened the box, put the old battery in, ripped of the label to reveal a return shipping label, and gave the box back to the Fedex guy who was friendly enough to wait a few minutes for me to do that. Done!

UPDATE: 07/01/11
Same thing happened to my wife's slightly older 13" MacBook. Called them again and had to go through supervisor to end up with customer relations (I spoke with Charles). Again they insist this is normal, and every case is handled without regard for previous cases. He saw the case this post is about in the system, but refused replacement in this case. In both cases the system was out of warranty. He refused to tell me why this case was different from the prior case and refused to send a new battery. It appears that if the battery bulged to the point of doing damage inside the laptop, they will take responsibility. Perhaps the advise to customers should be to leave the battery in until damage has been done! It did not seem to matter that I have been a Macintosh customer since 1984, currently have 4 actively used Macintoshes (2 desktop, 2 laptop) in my house, as well as 2 iPhones and three iPods. They are willing to really tick of a customer over this. From now on my message to every potential laptop buying friend will be that after 1-3 years they will have to spend $130 on a battery because Apple does not stand behind its product in a consistent manner. The replacement battery, consequently will not be bought from Apple.

Tuesday, July 21, 2009

WSF/PHP apparent hang: PHP script exceeded 30 seconds

The problem

So I am using the WSF/PHP web services for a project. All was fine until usage of the web services was stepped up. After a while web service calls started timing out PHP (3o seconds exceeded). Regular web pages were fine, so it had something to do with Web services. I was stumped, but at some point I noticed that the log files produced by the wsf_c extension were just a tiny bit larger than 32MB and my suspicion was raised.

I recreated the situation on a development system by creating a dummy log file just under 32M and then doing some requests. I used top to observe the httpd process being busy for 30 seconds and while it was, attached gdb to it. Some stepping and tracing revealed the cause.

The wsf_c extension starts up early when apache starts and loads the worker processes. It loads mod_pdp, which then causes PHP to load the extension. At this point Apache still runs as root. The extension attempts to write a few log lines, causing the log file to be created, owned by "root" and with permissions "rw-r--r--". Next Apache switches away from root (for security reasons), in my case to user "apache". The running httpd worker processes inherit the open file descriptor and now, effectively, have write permission to this file, so all is fine and logging can continue as needed. I can separately argue the merits of this strange write permission, but it is not relevant here.

Logging continues until the log file exceeds 32MB, which is the built in size limit for the underlying Apache Axis code. The code now copies the log file to "logfile.old" and then closes the log file and attempts to reopen it using "fopen(file, "w+")". This would normally truncate the large log file, as intended. However, it fails to reopen because of a permission issue. The process is "apache" but the file is owned by "root". Logging silently fails and things continue.

So what is the problem? Well, the attempt to log one line of output caused 32MB of data to be copied. The next attempt at a line of logging will do the same because the file was never truncated. If you are logging enough lines per request, the time involved in all that (useless) copying will cause your script to exceed 30 seconds, and bingo: it gets cut off. Of course no logging whatsoever happens.

The solution: part I

The obvious solution is for the WSF/PHP folks to fix the problem (perhaps deferring to the Axis folks). Either delay the log file creation until Apache is running as effective uid "apache", or change ownership of the created log file. If history is any indication, this may take a while to get done.

The solution: part II

The work around I have installed is to run an hourly cron script that inspects the file size. When it gets uncomfortably close to 32MB it restarts the server. The theory is that on restart the rotation attempt happens as well, but now it is done as root and everything should be fine. Not soooo. Unless you force the restart right when there is not enough room to do the initial startup logging, the logging will not cause a rotate and nothing will have changed.

The solution: part III

So finally, I modified my script to not restart the server, but to change ownership of the file. I do this as soon as the file appears to exist, regardless of size. Running this once an hour, or so, will not allow the logfile to reach 32MB before the ownership is changed.