[2011-05-27]
We have three OpenSolaris/OpenIndiana fileservers at work. One of them is an Intel Core 2 Duo with 8 GB of DDR2 RAM (it runs 64-bit OpenSolaris and works like a champ). The second is a really old Dell Pentium 4 with 2.5 GB of memory, and the third is an old HP 4U server (I'll note down the model number and a pic later). After multiple benchmarks and a lot of digging, I've come to one conclusion:
Never use 32-bit Solaris for an ESXi NFS target!!!!
Why, you ask? It seems the 32-bit kernel has been put out to pasture in terms of feature updates (and rightly so) and will scarcely use any of the server's memory for the ARC (the ZFS memory cache). Write performance doesn't seem to take a hit (that I've noticed), but read performance is awful. Even just doing a
echo ::memstat | mdb -k
you can't even see the ZFS file cache memory allocation on a 32-bit installation. Doing kstat monitoring, you see about 64 MB to 128 MB (pitiful) of memory being used short-term for the ARC and then cleared (yeah, the data doesn't even stay in memory).
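For reference, this is the sort of kstat monitoring I mean; a minimal sketch (the reported values are in bytes, and the trailing 5 is just an example refresh interval):
# current ARC size and ARC target size, in bytes
kstat -p zfs:0:arcstats:size zfs:0:arcstats:c
# or watch the ARC size refresh every 5 seconds
kstat -p zfs:0:arcstats:size 5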
so… if you are like us and severely constrained by budget, resist the temptation to use that old 32-bit machine (even if it was an expensive 4U server at the time) as an NFS target for ESXi; your VM's will thank you.
Incidentally, we use the Pentium 4 with a RAIDZ array of 1TB drives connected via USB to store database backups to disk. You heard me right, a USB RAIDZ array. What kind of write performance do you get with such a beast? A whopping 4 megabytes/sec. Yes, 4. But that's enough for our purposes to keep transaction/full dumps of our Sybase database for an entire year (done in addition to tape backups).
So… 32-bit Solaris/OpenSolaris/OpenIndiana should be relegated to disk backups, more or less replacing a tape drive. Just my 2 cents.
(The 4U HP server has 9 GB of RAM, of which a whopping 900 megabytes or so actually get used by the OS. What a waste.)
[2011-07-16]
I read somewhere online that Oracle has chosen to drop 32-bit support from subsequent Solaris versions (11+). I'm guessing that the OpenIndiana folks are keeping it because they're trying to appeal to desktop users in addition to the server crowd.
Thursday, July 7, 2011
ESXi performance issues
[2011-05-23]
Lately at work I've been putting some thought into how to make ESXi in our environment perform better. Don't get me wrong, it's doing what it's designed and advertised to do. It's just that I would love to get closer to replacing ALL our servers with VM's, except for hypervisors and fileservers.
For lots of VM's, CPU speed and disk read/write speed are not an issue. For example, we have a Windows Server 2003 VM hosting a website that reads and writes data to/from the production database server, and it's fine as a VM.
However, we do compiles on a (different) Server 2003 machine, and what takes about 1 hour 5 minutes on the physical machine takes almost 2 hours in an exact copy (VMware Converter) VM. The CPU is rarely maxed out, so the only conclusion is that it's the continual (small) reads and writes that get bogged down going against the NFS share on a Solaris RAIDZ array. RAIDZ is kind of awful for small, frequent reads and writes (performance goes way up on big file transfers), and our "fileserver" is really a desktop machine with 8GB of memory, so… you get what you pay for?
Another issue may be too many hands in the same cookie jar. Obviously, when you're using a single RAIDZ array for 5+ VM's and they're all doing disk I/O at roughly the same time, your performance is going to go to crap. If we had the resources, I'd like to see how 5+ separate mirrored zpools would do in comparison (I would expect a significant increase, but at the moment there's no way to test it). Another option might be a single large RAIDZ for VM's not needing the performance, plus 2 or 3 mirrored zpools of SSD's. The only problem is the SSD's are definitely out of the budget unless they come down in price a bit first.
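To be concrete about what I'd like to test: instead of one big RAIDZ vdev shared by everything, several small mirrored pools so the busy VM's aren't fighting over the same spindles. A rough sketch, with made-up device names:
# one big RAIDZ pool shared by every VM and CIFS share (roughly what we have now)
zpool create tank raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0
# versus a handful of small mirrored pools, each dedicated to a few I/O-heavy VM's
zpool create vmpool1 mirror c3t0d0 c3t1d0
zpool create vmpool2 mirror c3t2d0 c3t3d0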
Not to mention we also use the same RAIDZ array for CIFS shares for code, Excel templates, documents, backups, all kinds of things.
So… what steps can be taken to get that 2 hour compile time reasonably closer to the 1 hour and 5 minutes of a physical (1U rack) server?
(more later and as additional optimization attempts are done)
[2011-07-07]
I just re-read this blog and realized the obvious solution I didn't mention: adding a SATA SSD directly to the hypervisor. This would be a great solution except for the fact that the hypervisor in question is a consumer HP desktop PC with no open drive bays, and the $500-ish for the SSD is not in the budget.
cwrsync and openindiana -> windows server 2008 R2
[2011-05-15]
Today I ran into an issue with cwrsync going from OpenIndiana to Windows Server 2008 R2. If you're not familiar with it, cwrsync is a port of rsync to Windows by the folks at
http://www.itefix.no/i2/cwrsync
I'm sure there are other ports, but cwrsync is free and seems to support all the functionality (including SSH transport and rsync daemon connections) that the Linux/Unix versions do, so I've decided to go with it.
If you're not familiar with rsync, it's well worth your effort to look into it if you do any kind of file or directory synchronization across servers. (Yes, there are a number of uses even locally, but that's not my focus.) Windows has a roughly analogous tool, robocopy, but robocopy does not play as well with Linux and Unix. (You can use robocopy to/from Linux, but it requires a Samba share.)
Anyway, my issue was that I was really stressing the server by trying to do three massive directory synchronizations from three source OpenIndiana servers (hosting the ESXi VM's) to a Windows Server 2008 R2 machine, with rsync as the sender and the cwrsync daemon as the receiver.
I was getting a network error at seemingly random places on one or more of the OI (OpenIndiana) boxes:
rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32)
rsync: read error: Connection reset by peer (131)
rsync error: error in rsync protocol data stream (code 12) at io.c(759) [sender=3.0.6]
I did some research, and all I could come up with was setting timeout= in the rsyncd.conf file on the Windows Server 2008 R2 machine.
rsyncd.conf
-------------------
use chroot = false
strict modes = false
hosts allow = *
log file = rsyncd.log
uid = 0
gid = 0
timeout = 3000
contimeout = 3000
[ydrive]
path = /cygdrive/y/esxi
read only = false
transfer logging = no
timeout = 3000
contimeout = 3000
[zdrive]
path = /cygdrive/z/esxi
read only = false
transfer logging = no
timeout = 3000
contimeout = 3000
I tried a value of 30 (which is supposed to mean 30 seconds), which didn't work, then 300 (which didn't work), and then finally (keeping my fingers crossed) 3000 seconds. Don't ask me why, if the value is in seconds, you have to set timeout= so high to get it to work, but we'll see if that was indeed the case. (The file transfer in question takes 20+ hours to complete over gigabit.)
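(For completeness, the sending side has a matching knob too; this is the kind of thing I was also experimenting with, value purely illustrative:)
# tell the rsync sender to give up only after 3000 seconds of no I/O
rsync --timeout=3000 ... XXX.XXX.XXX.XXX::ydrive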
Why on earth are we transferring all that data off OI onto 2008 R2, you ask? (Well, I would be.) We have a Neo tape drive, and Backup Exec only installs on Windows now (not referring to the remote agent, I mean the machine that drives the physical tape drive). So, from each of the three OpenIndiana boxes that host our VM's (dedicated fileservers), we need to:
create a snapshot
clone the snapshot
rsync --progress --times --update --recursive --delete -z --compress-level=1 --inplace /zpool1/nfs/vm/esxibackup/* XXX.XXX.XXX.XXX::ydrive
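A rough sketch of those three steps as a shell script (the dataset and snapshot names here are made up for illustration; ours differ slightly):
#!/usr/bin/bash
# take a point-in-time snapshot of the dataset holding the VM files
zfs snapshot zpool1/nfs/vm@weeklybackup
# clone the snapshot so rsync reads from a stable, mounted copy
zfs clone zpool1/nfs/vm@weeklybackup zpool1/nfs/vm/esxibackup
# push the clone to the cwrsync daemon on the Backup Exec machine
rsync --progress --times --update --recursive --delete -z --compress-level=1 \
      --inplace /zpool1/nfs/vm/esxibackup/* XXX.XXX.XXX.XXX::ydrive
# clean up the clone and the snapshot once the transfer is done
zfs destroy zpool1/nfs/vm/esxibackup
zfs destroy zpool1/nfs/vm@weeklybackup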
This is the only way I'm aware of to (easily) back up all your VM's while they're still running, without manually copying VMX files, then creating snapshots, then copying the VMDK files minus deltas, then deleting the snapshot of every single VM. I realize some people have scripts to do this, but for me, trying to get that to work flawlessly on a weekly basis for 40+ VM's is not a good solution. The snapshot -> clone -> rsync approach at least guarantees that we get an "exact moment in time" copy. There might be issues with a VM running a *non*-journaling filesystem, but we don't have any of those, so it works for us. (Note: we don't use VM's for production databases or mail servers.)
Anyway, I hope the timeout=3000 idea helps if you run into a similar situation.
I only experienced it when the server was getting really hammered from three other machines rsync’ing to it simultaneously, but your mileage may vary.
(I'm not at work right now; I'll log on and post the rsyncd.conf bits and the results of the timeout=3000 change tomorrow.)
[2011-05-17] followup:
The timeout= didn't help. Two of the rsync's still crashed later that night with similar errors:
rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32)
rsync: write failed on "FHQ/FHQ_1-flat.vmdk" (in zdrive): Permission denied (13)
rsync error: error in file IO (code 11) at receiver.c(322) [receiver=3.0.8]
rsync: read error: Connection reset by peer (131)
rsync error: error in rsync protocol data stream (code 12) at io.c(759) [sender=3.0.6]
I am trying various options. It almost seems as if I am either overloading the gigabit switch we're using or the target server. I am experimenting now with --compress-level=9 and --sparse (--sparse and --inplace are mutually exclusive) to see if that helps, and will update this blog tomorrow. (I believe using --sparse instead of --inplace is what would actually make the difference; there might be an issue with cwrsync trying to do in-place updates to existing files. We'll see.)
[2011-05-18]
Still getting similar error messages even after changing to --sparse and (thus) not doing --inplace anymore. I thought that maybe it had to do with enabling compression on the target Windows directory (Windows built-in compression) causing too much stress on the server (with three simultaneous rsync's going), but that doesn't really seem to be the case either (after re-testing with compression off). So I'm left with some problem inherent to cwrsync, specific (?) to Server 2008 R2 and multiple inbound rsyncs at the same time. My solution for now is to stagger the backups from the three NFS servers during the week, so only one is running at a time and we can still get a weekly backup to tape on Sundays. I'm sticking a fork in it, because I'm done messing with cwrsync trying to get it to work. A shame, really, because it would be a much more elegant solution to schedule all three servers to back up during overlapping time windows. C'est la vie.
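Staggering just means each OI box gets its own night in crontab; a minimal sketch, assuming a hypothetical /root/scripts/vmbackup.sh wrapper around the snapshot/clone/rsync steps above:
# on OI box 1: run the backup Thursday at 22:00
0 22 * * 4 /root/scripts/vmbackup.sh
# on OI box 2: Friday at 22:00
0 22 * * 5 /root/scripts/vmbackup.sh
# on OI box 3: Saturday at 22:00
0 22 * * 6 /root/scripts/vmbackup.sh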
[2011-05-23]
No problems since staggering the backups so that only one is running at a time.
Works out to be a fairly hands-free and (hopefully) trouble-free backup solution for all the VM's, to both disk and tape.
[2011-05-27]
It appears --whole-file (essentially turning off rsync's built-in delta-transfer algorithm) works much better with cwrsync than trying to let it update only the parts of a file that changed.
[2011-07-07]
Just a quick update: since switching to --whole-file, the backups are working like a charm. They run from bash scripts set up in crontab. I also installed "mutt" so I can email myself and another sysadmin when the backup finishes, with a confirmation message and the rsync log file as an attachment. (Cuts down on logging in just to check on backup status.)
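For anyone curious, the notification step at the end of the backup script looks more or less like this (the addresses, log path, and wording are placeholders, not our real ones; it assumes this line comes right after the rsync command so $? is rsync's exit status):
# mail the rsync log to the admins once the run finishes
echo "VM backup on $(hostname) finished, rsync exit code $?" | \
    mutt -s "VM backup finished" -a /var/log/vmbackup-rsync.log -- admin@example.com otheradmin@example.com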