vSphere 4.0 vNetwork Distributed Switch (vDS) – Video Demonstration + Architecture Diagram
A Boring Introduction:
It's been a crazy week! A lot is happening right now at work, in my personal life, and in my career. For example, I'm building our much-awaited "Private Cloud" at work, using both the ultimate vSphere Cloud OS and the rock-solid IBM hardware that was finally delivered this week. But wait, this is not 'the' exciting thing that happened to me this week; that is most definitely the news I received last Thursday about winning the first round of the vSphere blogging contest. I will not thank John Troyer & Mike Adams for their great idea and their incredible efforts in organizing this contest, and I will not thank Deepak Narain, the man behind this blog's existence, who kept pushing me to launch it a year back (more on this soon). I will, instead, thank everyone for their kind words and encouragement (including the names I've just mentioned). I was literally thrilled by the emails, blog posts and "tweets" that were thrown at me since the news came out. THANK YOU!
Now, enough of this boring talk about me, myself and I; let's get started with this new round of the blogging contest. A heads up first: I was not supposed to participate this week since I've been so busy, as you can see, but I ended up with 24 free hours after canceling some plans at the last minute. That said, what you see here is not quite final. I believe I need to work more on the diagram, especially the IO plane layout in the hidden vSS, and probably add a couple more configurations to the video to show some cool stuff like the consistent network stats of a mobile VM jumping from one ESX host to another. I'll hopefully be updating all of this during next week.
The Configuration Video:
The Architecture Diagram:
Another vSphere diagram! I told you, you are going to see these blueprints more than any time before. Quick notes:
- This is an A3 scale diagram in case you want to print it.
- The diagram reflects the exact configuration in the video. I've done this intentionally to make it easier and faster for anyone new to the vDS to understand the concept and the various configuration aspects.
- As I mentioned above, due to the very short period of time that I had, I will most probably modify small parts in the diagram to achieve better results. You can come back and check the version number of the diagram to download the latest updates.
MASTER IT!
I love this part at the end of any book/chapter published by SYBEX. It gets down and dirty with all the theory covered, and guides you through a practical path to try what you've learned. This is what I want to do here as well. The vDS is quite confusing the first time, both as a concept and as a configuration, and I personally didn't get it until I started getting my hands on it and playing around with the configurations. The challenge here is that you probably won't have the required lab to do this, especially since you need a large number of NICs to test all the configurations. If you are one of my regular blog readers, you've probably guessed what I'm getting at. It's the "vSphere in a box"!
Around three months back, I published a series of posts about building vSphere configurations using ESX inside itself. Instead of rewriting the whole story again, here are the links for your consideration. One last thing to note here: the entire lab you've seen in the video was built using Lab Manager 4.0, as you will read in the following posts.
- vSphere in A Box: A "Virtual Private Cloud" Blueprint
- vSphere In A Box: Part (2): Putting the pieces all together
- vSphere In A Box: Part (3): The Lab Manager 4.0 Automation
Special Thanks:
I'd like to thank Duncan Epping for reviewing part of the content here. I had some doubts about a few points and, due to the time constraint, I couldn't research them further. I asked for Duncan's help and he was very kind to give it.
Additional Resources:
These are the best resources that I've found so far for the vDS:
- Whitepapers: VMware vNetwork Distributed Switch: Migration and Configuration
- VMworld 2009 Sessions: TA2525, TA2105
- Blog Posts: Eric Sloof, Barry Combs, Luc Dekens, ICT-Freak
vSphere 4.0 Fault Tolerance (Architecture Diagram, Video and Use Cases)
This is a response to the new vSphere Blogging contest that was announced in the middle of this month. I truly think that it's a cool idea, and I believe that regardless of winning or losing, the excitement and fun a blogger would have during his/her participation is something awesome by itself.
The rules say that my post needs to be compact and straight to the point, so I won't be able to cover every aspect of something as huge as FT. If by any chance I fail to write such a short post, here are some tips to avoid wasting your time:
1 – If you are one of those people who wear a tie at work, you should jump straight to the "Use Cases" section.
2 – If you are one of those people who use the words 10GbE, VMkernel and %RDY, then you probably don't have time for this, but take my advice and have a look at the next two sections.
Fault Tolerance Architecture Diagram:
From the newly launched vSphere Blog on VMTN I quote this part: "[..] they do say a picture is worth a thousand words [..]". Well, I believe I've also said that once before in a previous post, and in fact this is the whole concept my blog is built on. So here is a blueprint for FT, with my own tweaks, to save you (and me) a thousand words describing how FT works and how it's architected. (BTW, this is the first of the vSphere blueprints to come):
Fault Tolerance Video Demonstration:
Thank god the contest rules mentioned the possibility of republishing old content; otherwise I would have been rerecording and editing this video from scratch, which is kind of a nightmare. I published this video back in April this year when vSphere was just announced by VMware (the bits were not even available for download at that time), which means it's one of the very first videos ever published about this cool new feature.
Before you hit the play button, let me tell you why this video is different from many of the other ones that I've seen later on:
1 – In my scenario, I have three ESX hosts in a cluster rather than two as you may see in most FT demonstrations. What is so special about that? Well, it clearly shows the true concept of "continuous availability": in case of a complete ESX host failure, FT will not just fail over to another host, but will also automatically assign a third host in the cluster to protect the VM against another host failure until the SysAdmin attends to the incident.
2 – In this video I'm running a continuous file copy to the protected VM throughout the host failure process. This is to show you a "real-life" scenario where your VM is busy doing something critical (a backup, for example). You really don't play movies in your mission-critical VMs (I think Microsoft is the one who invented this idea in their Hyper-V live-migration demonstration, kind of weird!)
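To make the three-host point concrete, here is a toy Python model of the behaviour described above. It is only a sketch: the `FTPair` class, the `fail_host` function and the host names are made up for illustration, and the placement rule (take the first surviving host) is an assumption, not how vSphere actually picks a host.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FTPair:
    """Toy model of an FT-protected VM: where its two copies currently run."""
    primary: Optional[str]
    secondary: Optional[str]


def fail_host(pair: FTPair, hosts: List[str], failed: str) -> FTPair:
    """Return the new placement after `failed` goes down (illustrative logic only)."""
    alive = [h for h in hosts if h != failed]
    primary, secondary = pair.primary, pair.secondary
    if failed == primary:
        # The secondary takes over and becomes the new primary.
        primary, secondary = secondary, None
    elif failed == secondary:
        secondary = None
    if primary is None:
        # Both copies are gone; nothing left to protect in this toy model.
        return FTPair(None, None)
    if secondary is None:
        # Re-protect: pick a surviving host that is not running the primary.
        spare = [h for h in alive if h != primary]
        secondary = spare[0] if spare else None
    return FTPair(primary, secondary)


three_hosts = fail_host(FTPair("esx1", "esx2"), ["esx1", "esx2", "esx3"], "esx1")
two_hosts = fail_host(FTPair("esx1", "esx2"), ["esx1", "esx2"], "esx1")
print(three_hosts)  # protected again: a new secondary lands on esx3
print(two_hosts)    # still running, but with no secondary left
```

With only two hosts the VM survives the failure but stays unprotected until someone fixes the dead host, which is exactly why the demo uses three.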
My Real-life FT use cases:
I'm now taking off my "VMware Evangelist" hat, and putting on the "VMware Customer" hat. What you'll read here are my real-life use cases for FT, no marketing talk, no political debates. This "is" the real deal:
1 – Blackberry Enterprise Server & RoveIT Mobile Admin:
BES is one of our most business-critical applications because it's used by our higher management in their day-to-day communications. Initially we were depending on HA, since we didn't think our luck would be bad enough to have an ESX host fail while one of the executives was sending an email.
This continued to be the case until we deployed the RoveIT Mobile Admin & vCenter Mobile Access (with BES/MDS in the backend). We basically wanted our SysAdmins to have 24/7 access to our entire IT environment (including VMware) while they are on the go, using their Blackberry smart-phones (given by the corp for this specific purpose). This was mainly to improve our response time in emergency situations, and of course this service makes no sense unless it can tolerate the most severe hardware failures. Enabling FT on both the BES and the Mobile Admin VMs allows us, on the one hand, to ensure that our executives will never complain that they can't use their Blackberry whenever they need to, and that "IT Sucks". On the other hand, we, the IT suckers..er..I mean SysAdmins & consultants, can have peace of mind that we will always be able to get to our backend systems whenever there is a problem that requires immediate attention.
2 – ManageEngine Application Manager:
We heavily depend on the ManageEngine Application Manager in our environment, where we get real-time email and SMS notifications for any issues happening either in the OS layer (e.g. disk usage, service status, etc.) or in the applications (e.g. Exchange high local queue, MS SQL DB issues, etc.). In order to maintain this level of real-time notification, we had to make this application highly available. Although the application comes with optional cluster capabilities, VMware HA was really doing the trick without paying extra money. In both cases (the cluster option or ESX HA), if an ESX host fails, we have to wait approximately 10 minutes for the application to be powered on and operational on another host in the cluster. This is not realistic for an application that is supposed to tell us that the ESX host has failed in the first place. With FT we are able to have the application up & running all the time with no interruption whatsoever, so it keeps sending us notifications of any Host/OS/Application issues no matter what happens across the underlying infrastructure.
3 – Custom Application – Online payment gateway:
We have an online customer payment service consisting of a custom-written application integrated with IBM WebSphere MQ and a backend Oracle DB. Everything is highly available as you would expect, except for the custom application! I must also add that it is so poorly written that it needs human intervention to bring it up again every time the VM is rebooted. That being said, HA is not even an option in case of host failures. Unfortunately the application developer does not know how to address these issues in his application, and we are stuck with that fact since he's working with the same backend payment gateway provider. We came up with two solutions for that:
a) The Long run: gradually migrate the online service to a new system with a new backend payment gateway. We are around 30% now on this new service.
b) The short run: put the custom application on an FT-enabled VM, so we don't have to suffer from any unplanned downtime associated with the VM and/or the host.
The conclusion:
FT is a "must have" not a "nice to have" feature in any environment. I don't really understand the big debate around it from the so-called "experts" who have been flooding us on twitter and the blogosphere with reasons why it's "not enterprise ready yet". Most of these debates really come from people who have not seen enough of the enterprise environments they are talking about, or the challenges we face every day with scenarios like the ones I've listed above. Sure enough, FT has quite a long list of limitations that you can find in any of these blog posts (or on the VMware website itself), but you should also know that VMware is working on most of these limitations for future releases. The number one limitation that you will always hear about is the 1 vCPU restriction for an FT-enabled VM. Well, let me tell you two things about that to finish up my article:
1 – The vast majority of the applications running in any datacenter do not need, or even make use of SMP. My three use cases above are examples for that.
2 – VMware recently published this blog post showing how a 1 vCPU VM based on the revolutionary Intel Nehalem processor can perform better than a 2 vCPU VM on older generations.
P.S. This is probably one of my largest blog posts. I'm disqualified from the contest.
vSphere In A Box: Part(2): Putting the pieces all together
The response to my previous post has been unreal! The number of tweets, pingbacks, hits, links and emails was quite amazing. I have to admit, I didn't expect to see that much interest in the subject, so thanks to everyone who helped promote this idea on twitter and the blogosphere.
In the second part of this series we will spice things up a bit and use the following video to explore many aspects of the idea we've talked about. After that come some important screenshots and some considerations you should be aware of before and after implementing this in your lab.
What you'll see in this video:
1- Deploy a thin-provisioned vESX(i) VM from a template.
2- Check the required configuration parameter on the vESX to run nested VMs.
3- Customize the new vESX server (assign a password, set a static IP, enter the DNS config, etc.)
4- Add the vESX to an existing HA Cluster.
5- Test the vMotion within the same cluster and across different clusters in the datacenter.
6- Add the required configuration parameter on the nested VM for enabling the FT.
7- Enable the FT and test the failover across different vESX servers.
A Quick note on the hardware used:
Technically speaking, this whole setup needs only one server. What is different in my setup (as you saw in the diagram) is that I use an external iSCSI array (the CLARiiON AX4) for hosting the vESX VMs. I just needed the flexibility of having them on external storage so I can share them later on with other servers, but it is not a requirement. You can simply use the internal storage of the pESX server to host your vESX VMs and you will still have everything you see in the video. As far as the shared storage for the vESX servers is concerned, you can use the Celerra VSA. In my case I use the Celerra only for the SRM labs to do the replication trick. Other than that, I use OpenFiler as the shared storage for the nested VMs.
Two things to be set on the pESX hosts:
1 – Increase the number of ports on your pESX vSwitch to accommodate the increased number of connections required by your vESX VMs.

2 – Enable the "Promiscuous Mode" on the vSwitch

The configuration parameters on the Virtual Machines are as follows:
1 – The virtual ESX (vESX) host: monitor_control.restrict_backdoor = TRUE

2 – The nested VM: replay.allowBTOnly = TRUE
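If you prefer to see the two parameters as plain .vmx entries, the following Python sketch sets a key in the text of a .vmx file, which uses the `key = "value"` line format. The `set_vmx_option` helper and the sample `displayName` values are hypothetical, and editing the file directly (rather than adding the parameters through the client, as in the video) is an assumption of this example.

```python
def set_vmx_option(vmx_text: str, key: str, value: str) -> str:
    """Return vmx_text with `key = "value"` set, replacing an existing entry if present."""
    new_line = f'{key} = "{value}"'
    out = []
    replaced = False
    for line in vmx_text.splitlines():
        # The option name is everything before the first "=" on the line.
        name = line.split("=", 1)[0].strip()
        if name == key:
            if not replaced:
                out.append(new_line)
                replaced = True
            # Any further duplicate of the same key is dropped.
        else:
            out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


# The vESX host VM (runs on the physical ESX host).
vesx_vmx = set_vmx_option('displayName = "vESX01"\n',
                          "monitor_control.restrict_backdoor", "TRUE")

# The nested VM (runs inside the vESX host) that will be FT-protected.
nested_vmx = set_vmx_option('displayName = "nested-vm"\n',
                            "replay.allowBTOnly", "TRUE")
print(vesx_vmx)
print(nested_vmx)
```

Note which parameter goes where: the first one belongs to the vESX VM itself, the second one to the VM running inside it.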

The iSCSI vSwitch on the pESX host bound to vmnic1 (physical NIC 2) and connected to the EMC CLARiiON AX4 iSCSI array

The vMotion internal vSwitch on the pESX host

The Fault Tolerance internal vSwitch on the pESX host

The thin-provisioned ESX(i) size on disk (475MB) + the memory swap:

The thin-provisioned ESX(i) size on disk(3.5GB) + the memory swap:

Other considerations and GOTCHAs:
1 – When enabling FT, make sure your VMs are powered off, even if they have eager-zeroed disks. I used to get errors when the VMs were powered on while enabling FT.
2 – Sometimes the network card order and numbering can be confusing. For example, in the vESX VM settings the NICs are numbered from 1 to 10, but in the actual vESX network configuration tab you will find the NICs starting from 0 and counting up to 9. This can trip you up when mapping your vNICs to the pESX host vSwitches, like the VMotion internal switch, the FT internal switch and so forth. Just make sure you count the NIC order accurately.
3 – Deploying vESX from templates may be cool and fast, but it can sometimes make network-related issues a bit challenging to troubleshoot. The reason is that all the network cards will have the same MAC address or, worse, Port Groups like VMotion can have the same MAC address even if you completely remove the vNICs and create brand new ones. The workaround is to create the vESX template and then remove all the NICs from it. When you deploy a new vESX, you can just add new NICs as you like, and that way you'll get new MAC addresses. Besides that, you may need to add a new VMkernel network for VMotion and then remove the old one. Of course you may be thinking that deploying a brand new vESX would be easier. You are right, but with scripting everything can be automated. I will try to write a PS script to automate these network changes/settings and post it here later on.
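Until that script exists, GOTCHAs 2 and 3 can at least be sanity-checked with a few lines of Python. This is only a sketch: the function names, host names and MAC addresses are made up for illustration, and it assumes you have already collected the MAC addresses of each vESX by hand (it does not talk to vCenter or ESX at all).

```python
from collections import defaultdict
from typing import Dict, List


def vmnic_name(adapter_number: int) -> str:
    """Map adapter N in the VM settings (1-based) to the 0-based vmnic name inside the vESX."""
    if adapter_number < 1:
        raise ValueError("adapter numbers start at 1")
    return f"vmnic{adapter_number - 1}"


def duplicate_macs(inventory: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return {mac: [owners]} for every MAC address claimed by more than one owner."""
    owners = defaultdict(list)
    for owner, macs in inventory.items():
        for mac in macs:
            # Normalise case and separator so the same MAC always compares equal.
            owners[mac.strip().lower().replace("-", ":")].append(owner)
    return {mac: sorted(names) for mac, names in owners.items() if len(names) > 1}


# Hypothetical inventory: two vESX hosts deployed from the same template.
inventory = {
    "vesx01": ["00:50:56:AA:BB:01", "00:50:56:aa:bb:02"],
    "vesx02": ["00-50-56-aa-bb-01", "00:50:56:aa:bb:03"],
}
clashes = duplicate_macs(inventory)
print(vmnic_name(2))  # adapter 2 in the VM settings is vmnic1 inside the vESX
print(clashes)
```

Any MAC that shows up under more than one vESX is a candidate for the remove-and-re-add workaround described above.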
Video: vSphere4 Unleashed: 06 – Storage vMotion
A new, thin and compact video! Here we'll demo the new GUI-based Storage vMotion feature built into vSphere. Yes, GUI-based and built in, no more command lines and hassle. It's fast, it's easy and it's very reliable, especially when doing thin/thick conversions while you're migrating the VMs.
Happy viewing!
Video: vSphere4 Unleashed: 05 – Hot-Add and Thin-Provisioning
In this video I'll be demonstrating the following in order:
- Enable the Hot-Add/Plug in the virtual machines.
- Hot-Add memory for a VM and increase it from 512MB to 1GB.
- See from the VM's Task Manager the new amount of memory available for the GOS.
- Hot-Add a new NIC, and configure it on the fly.
- Thin-Provision a new 10GB hard drive for a VM.
- Hot-Expand the new hard drive from 10GB to 20GB.
- Extend the volume live within the GOS using the "diskpart" tool.
I have the Storage vMotion & vCenter linked-mode recorded and will be publishing them this week. Stay tuned!



