Archive for September, 2009
vSphere 4.0 Fault Tolerance (Architecture Diagram, Video and Use Cases)
This is a response to the new vSphere Blogging contest that was announced in the middle of this month. I truly think that it's a cool idea, and I believe that regardless of winning or losing, the excitement and fun a blogger would have during his/her participation is something awesome by itself.
The rules say that my post needs to be compact and straight to the point, so I won't be able to cover every aspect of something as huge as FT. If by any chance I fail to keep this post short, then here are some tips to avoid wasting your time:
1 – If you are one of those people who wear a tie at work, you should jump straight to the "Use Cases" section.
2 – If you are one of those people who use the words 10GbE, VMkernel and %RDY, then you probably don't have time for this, but take my advice and have a look at the next two sections.
Fault Tolerance Architecture Diagram:
From the newly launched vSphere Blog on VMTN I quote this part: "[..] they do say a picture is worth a thousand words [..]". Well, I believe I've also said that once before in a previous post, and in fact this is the whole concept my blog is built on. So here is a blueprint for FT, with my own tweaks, to save you (and me) a thousand words describing how FT works and how it's architected. (BTW, this is the first of the vSphere blueprints to come):
Fault Tolerance Video Demonstration:
Thank god the contest rules mentioned the possibility of republishing old content; otherwise I would have been re-recording and editing this video from scratch, which is a kind of nightmare. I published this video back in April this year when vSphere was just announced by VMware (the bits were not even available for download at that time), which means it's one of the very first videos ever published about this cool new feature.
Before you hit the play button, let me tell you why this video is different from many of the other ones that I've seen later on:
1 – In my scenario, I have three ESX hosts in a cluster rather than two, as you may see in most FT demonstrations. What is so special about that? Well, it clearly shows the true concept of "continuous availability": in case of a complete ESX host failure, FT will not just fail over to another host, but will also automatically use a third host in the cluster to protect the VM again in case of another host failure, until the SysAdmin attends to the incident.
2 – In this video I'm running a continuous file copy to the protected VM throughout the host failure. This is to show you a "real-life" scenario where your VM is busy doing something critical (a backup, for example). You really don't play movies in your mission-critical VMs (I think Microsoft is the one who invented this idea in their Hyper-V live-migration demonstration, kind of weird!)
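To make the three-host point concrete, here is a minimal toy model in Python of the behaviour described above: the secondary takes over when the primary's host dies, and a new secondary is placed on the remaining host. The class, the function and the host names (`FtVm`, `fail_host`, `esx1` to `esx3`) are all made up for illustration. This is not how VMware implements FT and it calls no VMware API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FtVm:
    """Toy model of an FT-protected VM: the hosts running its two copies."""
    primary: Optional[str]
    secondary: Optional[str]


def fail_host(vm: FtVm, hosts: List[str], failed: str) -> List[str]:
    """Simulate a complete host failure and return the surviving hosts.

    Hypothetical logic for illustration only; it does not call any VMware API.
    """
    survivors = [h for h in hosts if h != failed]
    if vm.primary == failed:
        # The secondary takes over, so the guest keeps running.
        vm.primary, vm.secondary = vm.secondary, None
    elif vm.secondary == failed:
        vm.secondary = None
    if vm.primary is not None and vm.secondary is None:
        # Re-protect the VM on a surviving host other than the current primary.
        spare = [h for h in survivors if h != vm.primary]
        vm.secondary = spare[0] if spare else None
    return survivors


hosts = ["esx1", "esx2", "esx3"]
vm = FtVm(primary="esx1", secondary="esx2")
hosts = fail_host(vm, hosts, "esx1")
print(vm)  # primary is now esx2, and esx3 hosts the new secondary
```

With only two hosts in the list, the same call leaves `secondary` as `None`: the VM survives the failure but stays unprotected until someone fixes the host, which is exactly the gap the third host closes.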
My Real-life FT use cases:
I'm now taking off my "VMware Evangelist" hat and putting on the "VMware Customer" hat. What you'll read here are my real-life use cases for FT: no marketing talk, no political debates. This "is" the real deal:
1 – Blackberry Enterprise Server & RoveIT Mobile Admin:
BES is one of our most business-critical applications because it's used by our higher management in their day-to-day communications. Initially we depended on HA, since we didn't think our luck would be so bad as to have an ESX host failure while one of the executives was sending an email.
This continued to be the case until we deployed RoveIT Mobile Admin & vCenter Mobile Access (with BES/MDS in the backend). We basically wanted our SysAdmins to have 24/7 access to our entire IT environment (including VMware) while they are on the go, using their Blackberry smartphones (given by the corp for this specific purpose). This was mainly to improve our response time in emergency situations, and of course this service makes no sense unless it can tolerate the most severe hardware failures. Enabling FT on both the BES and the Mobile Admin VMs allows us, on the one hand, to ensure that our executives will never complain that they can't use their Blackberry whenever they need it, and that "IT sucks". On the other hand, we, the IT suckers..er..I mean SysAdmins & consultants, can have peace of mind that we will always be able to get to our backend systems whenever there is a problem that requires immediate attention.
2 – ManageEngine Application Manager:
We depend heavily on the ManageEngine Application Manager in our environment, where we get real-time email and SMS notifications for any issues happening either in the OS layer (e.g. disk usage, service status, etc.) or in the applications (e.g. a high Exchange local queue, MS SQL DB issues, etc.). In order to maintain this level of real-time notification, we had to make this application highly available. Although the application comes with optional cluster capabilities, VMware HA was really doing the trick without us paying extra money. In both cases (the cluster option or ESX HA), if an ESX host fails, we have to wait approximately 10 min for the application to be powered on and operational on another host in the cluster. This is not realistic for an application that is supposed to tell us that the ESX host has failed in the first place. With FT we are able to keep the application up & running all the time with no interruption whatsoever, so it keeps sending us notifications of any host/OS/application issues no matter what happens across the underlying infrastructure.
3 – Custom Application – Online payment gateway:
We have an online customer payment service consisting of a custom-written application integrated with IBM WebSphere MQ and a backend Oracle DB. Everything is highly available, as you would expect, except for the custom application! I must also add that it is so poorly written that it needs human intervention to bring it up again every time the VM is rebooted. That being said, HA is not even an option in case of host failures. Unfortunately the application developer does not know how to address these issues in his application, and we are stuck with that fact since he's working with the same backend payment gateway provider. We came up with two solutions for that:
a) The long run: gradually migrate the online service to a new system with a new backend payment gateway. We are around 30% of the way onto this new service now.
b) The short run: put the custom application on an FT-enabled VM, so we don't have to suffer any unplanned downtime associated with the VM and/or the host.
The conclusion:
FT is a "must have", not a "nice to have", feature in any environment. I don't really understand the big debate around it from the so-called "experts" who have been flooding us on Twitter and the blogosphere with reasons why it's "not enterprise ready yet". Most of these debates really come from people who have not seen enough of the enterprise environments they are talking about, or the challenges we have every day with scenarios like the ones I've listed above. Sure enough, FT has quite a long list of limitations that you can find in any of these blog posts (or on the VMware website itself), but you should also know that VMware is working on most of these limitations for future releases. The number one limitation that you will always hear about is the 1 vCPU restriction for FT-enabled VMs. Well, let me tell you two things about that to finish up my article:
1 – The vast majority of the applications running in any datacenter do not need, or even make use of, SMP. My three use cases above are examples of that.
2 – VMware recently published this blog post showing how a 1 vCPU VM based on the revolutionary Intel Nehalem processor can perform better than 2 vCPUs on older processor generations.
P.S. This is probably one of my largest blog posts. I'm disqualified from the contest.
How do you download your VMworld 2009 Sessions?
A few days back I "tweeted" about my VMworld 2009 sessions download progress, when I had around 10GB worth of FLV & PDF files. I then got quite a few "DMs" (if you are not using Twitter, pls ignore my weird abbreviations) asking how I download these sessions, since there is no obvious way to download them or automate this.
I don't have this "automation" per se, but at least I managed to find an easy way (well, kind of) to download these sessions fast enough. That said, I'm sharing my method here, and if you have a better/faster way, please share it with us, the community, in the comments below:
- I use Firefox as my browser, with a couple of add-ons that you need to have: the first is FlashGot and the second is ReloadEvery.
- I first open the VMworld 2009 sessions main page, then right-click, choose the Reload page option from the context menu and set it to 1 min. (I'll clarify the reason in a bit.)
- If you don't have a download manager at this point, you need one. I have been using Getright for years now, because I simply love it. It's also compatible with the FlashGot add-on mentioned above.
- Now to the hardest part: I need to go to each session page and press Ctrl+F7 to grab the FLV file from the page, which is then passed to Getright for queuing and download.
- Once you have all your FLV/PDF files listed in Getright you are nearly done. Just start the auto download and let GR do its thing; you don't have to follow up on anything else.
- Now, since you'll probably be away from your PC, your web session will expire and this will cause your Getright download connections to time out. This is why you need to keep the main download page open with the reload option set as described in the second point above.
- There is a nice feature in GR called segmentation, where you can split a file download across more than one connection and consequently get an insane speed for each FLV file.
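For the curious, the segmentation idea in the last point can be sketched in a few lines of Python. This is only my guess at the general technique (HTTP Range requests), not a description of how Getright works internally. The function names, the URL and the cookie value are placeholders you would have to supply yourself, and the whole thing only works if the server honours Range requests.

```python
import urllib.request
from typing import List, Tuple


def split_ranges(size: int, segments: int) -> List[Tuple[int, int]]:
    """Split `size` bytes into up to `segments` inclusive (start, end) byte ranges."""
    if size <= 0 or segments <= 0:
        return []
    segments = min(segments, size)  # never create empty segments
    base, extra = divmod(size, segments)
    ranges, start = [], 0
    for i in range(segments):
        # The first `extra` segments carry one additional byte each.
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def fetch_segment(url: str, start: int, end: int, cookie: str) -> bytes:
    """Fetch one byte range of a file (assumes the server supports Range)."""
    request = urllib.request.Request(url, headers={
        "Range": f"bytes={start}-{end}",
        # Placeholder for the logged-in web session that the page reload keeps alive.
        "Cookie": cookie,
    })
    with urllib.request.urlopen(request) as response:
        return response.read()


print(split_ranges(10, 3))  # [(0, 3), (4, 6), (7, 9)]
```

Each range would be fetched over its own connection with `fetch_segment` and the pieces written back in order. The cookie header is also why the expired web session in the previous point kills the downloads: every connection depends on it.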
If you are wondering why on earth you would go through all that when you can just go online and view a session whenever you want, then maybe you need to read Duncan Epping's post here. I initially thought that I was the only one doing that, but apparently not. The PDF indexing part is indispensable for me, and I "always" start by searching the VMworld sessions for any subject that I need more information on, then I start exploring other materials like documentation and whitepapers.
I hope some of you have an easier approach for doing that; if not, then at least this can get you started. Maybe XtraVirt will write us another incredible tool, like the one they have for the VMware documentation, to auto-download these sessions on the fly, who knows?
A new beautiful Self-Paced eLearning offering from VMware Education
I don't think many of us know that VMware Education has a great new offering, which is the self-paced eLearning. This All-Access eLearning subscription offering provides an individual unlimited access for one year to the following Self-Paced online courses, as well as others that may be added over the course of the subscription term:
- VMware ESXi: Administration
- VMware Workstation 6: Mastery
- VMware Site Recovery Manager (SRM): Fundamentals
- VMware Stage Manager: Fundamentals
- VMware Lifecycle Manager: Fundamentals
- VMware Infrastructure 3: What's New in VI3.5
- VMware Server 1: Quick Start Guide
- VMware ACE 2: Quick Start Guide
- VMware vCenter AppSpeed Fundamentals
- VMware vCenter Chargeback Fundamentals
- VMware vSphere Fundamentals
- VMware vSphere: What's New in vSphere 4
When I first learned about these courses I said to myself: well, they are pretty basic, so they won't make much sense for me as a VMware professional, but they might be a good resource for my colleagues at my current employer. I went to the management and told them that we had spare training credits that I wanted to use for purchasing this subscription. My argument was that everything is turning virtual in our environment, and that we need to increase virtualization awareness across the team, even for the members who are not directly involved in the virtual infrastructure.
After the subscription was complete, I accessed these courses out of curiosity, and to my surprise I found them really useful even for myself! I haven't gone through everything yet, but for example, in the "What's New in vSphere 4" course I found a great deal of high-quality information! Even though I've been reading about, studying and evaluating vSphere since it was in early Beta, I still found lots of new stuff that I didn't know about. Not to mention the wonderful layout of the training courses and how easy it is to follow along across the different subjects they present. I also loved the architecture illustrations of the different technologies in these courses: very advanced and easy to understand at the same time.
I'll include some screenshots here to give you a high-level idea of what these courses look like, but still, these screenshots won't reflect the real look and feel of these courses with all the rich animation and audio content. I'm not 100% sure whether publishing these screenshots is fine by VMware or not, so if there are any issues with that please drop a comment or send me an email, and I'll take them offline.









