VMware vSphere on IBM BladeCenter H – (Part 1 of 2)

Important: In case you haven’t done that already, please take a moment to read the first post of this series.

Due to the insane number of expansion modules/options available in the IBM BladeCenter H, I had to split this post into two parts. In fact, I was initially planning to have around 12 different designs for vSphere on BladeCenter H (yes, twelve), but then I started to shrink and skip some designs to fit as many scenarios as possible into a reasonable two-part article. With that said, the following is by no means a list of all the possible design scenarios you can achieve with this hardware platform. If you start the “mix and match” game, you may literally end up with countless possibilities!

The Diagram

Here are some important notes before using the diagram:

  • You will see different configurations in this post, and the architecture of each configuration in the diagram. This is done through PDF layers, which means that you should *not* activate more than one layer at the same time.
  • By default, “Configuration 1” is the active layer when you open the PDF file. You can show/hide the other layers by simply clicking on them. Again, you should only show one layer/configuration at a time.
  • You will always see two boxes on the right side of the diagram: the upper one shows the current vSphere configuration, and the lower one shows the relevant hardware configuration. You should typically start by looking at those two boxes before scanning through the diagram, to understand the “ingredients” of the design.
  • At the time of writing this post, you will see only four configurations in this diagram. When I publish the second part, I will add further configurations to the existing ones, so keep in mind that the diagram will be updated later on.

The common design and configurations

Most of the configurations share a common design, unless I explicitly state otherwise. Here it is in detail:

The Clusters:

You will see two types of clusters:

  • Management Cluster: typically a two-node cluster running the management and infrastructure services. For example, if you want to virtualize vCenter Server, the VM should run on this cluster rather than on the production clusters. The same holds true for other vCenter products like AppSpeed, CapacityIQ, SRM and so forth. There are two reasons for doing that. First, we don’t want to run into situations where vCenter Server is not accessible (there are some examples published in the community, but my favorites are Jason Boche’s Catch22s!). Second, we don’t want our management virtual appliances to affect our workloads’ performance, or vice versa.
  • Production Clusters: you can see two production clusters here (Cluster A and Cluster B). The takeaways are the following:
    • You don’t have to stick with that number of hosts per cluster; it depends on what you want to achieve, and also on configuration maximums that may or may not limit you.
    • The nodes have to be spread across the two chassis, as numbered and illustrated in Config 1. There are two reasons for that. Firstly, you don’t want your whole cluster to fail in the unlucky event that a whole chassis fails. Secondly, keep in mind that VMware HA promotes the first five hosts that join the cluster to “primary” nodes; if all of them fail, your HA cluster fails.
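
As a rough way to sanity-check a host layout against that primary-node concern, here is a minimal Python sketch. It assumes the vSphere 4.x HA behavior described above (at most five primaries per cluster, and primaries can be re-elected over time, for example after maintenance mode or a reboot), so in the worst case any chassis holding five or more of the cluster's hosts could end up with all of them. Host and chassis names are hypothetical.

```python
from collections import Counter

# Assumption: vSphere 4.x HA keeps at most five primary nodes per cluster.
MAX_HA_PRIMARIES = 5


def chassis_can_hold_all_primaries(placement: dict) -> bool:
    """Return True if one chassis holds enough of the cluster's hosts that,
    in the worst case, every HA primary could end up on it.

    placement maps a (hypothetical) host name to the chassis it sits in.
    """
    primaries = min(MAX_HA_PRIMARIES, len(placement))
    hosts_per_chassis = Counter(placement.values())
    return any(count >= primaries for count in hosts_per_chassis.values())


# Eight hosts split 4 + 4 across two chassis: no chassis can hold all five primaries.
eight_nodes = {f"esx{i:02d}": ("chassis1" if i % 2 else "chassis2") for i in range(1, 9)}
print(chassis_can_hold_all_primaries(eight_nodes))
```

The check simply counts hosts per chassis; with two chassis it amounts to keeping no more than four hosts of a given cluster in each chassis.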

The Blades:

You will see the same two blades throughout the first four configurations: the HS22 and the HS22V. Both blade servers share the same IO expansion capabilities; however, there are some differences between them. For example, the HS22V has no hot-swappable HDDs, but it is superior in memory capacity (144GB compared to 96GB in the HS22). In part two of this article, I’ll talk in detail about the new HX5 and what it can bring to the table in terms of scalability.

The Expansion cards:

Every HSxx blade comes with two onboard 1Gbps Ethernet ports for basic networking. They will always show in vSphere as vmnic0 and vmnic1, and they are mapped to Bay 1 and Bay 2 in the chassis. Of course, no one recommends implementing vSphere on 2 x 1GbE ports in an enterprise environment (although it will technically work), so we will use what we call expansion cards. There are two expansion card slots in any HS22/V blade: the first is called CIOv (for vertical expansion cards) and the second CFFh (for horizontal, high-speed IO cards). The CIOv is usually used for FC HBAs (although we will see later how we use it for iSCSI connectivity), and its ports are mapped to Bay 3 and Bay 4 in the chassis. The CFFh, on the other hand, is mapped to the four high-speed bays (7, 8, 9 and 10). I call it fast because this is the only card that can leverage 10GbE connectivity (or InfiniBand, but that is not relevant to this series). Depending on the configuration, you will see how we use different cards to support our designs; however, the 2 x 1GbE onboard ports are common to all of them and always there.
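
To keep track of which chassis bays need an IO module for a given combination of cards, the mapping above can be written down as a small lookup table. This is a minimal Python sketch: the adapter labels are hypothetical shorthand, the two-port CFFh entry follows Config 3 (Bay 7 and Bay 9) and the four-port entry follows Config 4 (Bays 7 to 10).

```python
# Bay mapping as described in this post; the adapter labels are hypothetical shorthand.
ADAPTER_BAYS = {
    "onboard": (1, 2),            # vmnic0 / vmnic1
    "CIOv": (3, 4),               # FC HBA or 2 x 1GbE card
    "CFFh-2port": (7, 9),         # 2 x 10GbE card as used in Config 3
    "CFFh-4port": (7, 8, 9, 10),  # 4 x 10GbE card as used in Config 4
}


def required_bays(adapters):
    """Return the sorted chassis bays that need an IO module for the given adapters."""
    bays = set()
    for adapter in adapters:
        if adapter not in ADAPTER_BAYS:
            raise ValueError(f"unknown adapter: {adapter}")
        bays.update(ADAPTER_BAYS[adapter])
    return sorted(bays)


# Config 3: onboard ports (lab iSCSI), CIOv FC HBA, 2 x 10GbE CFFh
print(required_bays(["onboard", "CIOv", "CFFh-2port"]))  # [1, 2, 3, 4, 7, 9]
```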

Now that we’ve talked about the common stuff, let’s start talking about the unique configurations. Oh yes, we were just warming up!

CONFIGURATION (1):

In this configuration we have 6 x 1GbE pNICs per blade to support our MGMT, VMkernel and Virtual Machine networks. Three pNICs are teamed in a vNetwork Standard Switch (vSS) to serve the SC, vMotion and FT. The other three pNICs are teamed in a vNetwork Distributed Switch (vDS) to serve the VM networks. Let’s dig a little deeper into how this is done.

As mentioned earlier, we have three types of IO ports on the blades: the onboard ports, the CIOv, and the CFFh. To achieve maximum availability, we teamed one onboard port with a couple of ports from the CFFh card. This way, if either the onboard controller or the expansion card fails, we are able to tolerate that failure.

The second consideration here is to distribute the load and bandwidth for our networks. For example, the SC network will be active on vmnic0 and standby on vmnic1. The vMotion will be active on vmnic1 and standby on vmnic0 and so forth.
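
The two goals above (survive an adapter failure, spread the active links) can be checked with a few lines of code. The following Python sketch is only an illustration: the vmnic numbering and the exact active/standby order are hypothetical and stand in for the one-onboard-plus-two-CFFh team described above, so check the diagram for the real assignment. It reports any port group that would be left with no uplink if one whole adapter failed.

```python
# Hypothetical vSS uplinks for Config 1: one onboard port teamed with two CFFh ports.
UPLINK_ADAPTER = {"vmnic0": "onboard", "vmnic2": "CFFh", "vmnic3": "CFFh"}

# Hypothetical active/standby order per port group.
TEAMING = {
    "SC": {"active": ["vmnic0"], "standby": ["vmnic2"]},
    "vMotion": {"active": ["vmnic2"], "standby": ["vmnic0"]},
    "FT": {"active": ["vmnic3"], "standby": ["vmnic0"]},
}


def adapter_failure_gaps(teaming, uplink_adapter):
    """Return (port group, failed adapter) pairs that would be left with no uplink."""
    gaps = []
    for adapter in sorted(set(uplink_adapter.values())):
        for port_group, policy in teaming.items():
            # Uplinks (active or standby) that do not sit on the failed adapter.
            survivors = [
                nic for nic in policy["active"] + policy["standby"]
                if uplink_adapter[nic] != adapter
            ]
            if not survivors:
                gaps.append((port_group, adapter))
    return gaps


print(adapter_failure_gaps(TEAMING, UPLINK_ADAPTER))  # []
```

A port group teamed only on CFFh ports (for example SC active on vmnic2, standby on vmnic3) would show up as a gap for the "CFFh" failure, which is exactly the single point of failure the mixed teaming avoids.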

You may also have noticed that we grouped the SC + VMkernel networks on a vSS, while we grouped the VM networks on a vDS. The reason behind that is to ensure that you can still manage your SC network even if your vCenter fails. For the VM networks, you still leverage the great enhancements and features of the vDS. This is *not* a best practice from VMware, and as far as I know there is no documentation recommending it. It is up to you whether you go with that setup or simply put everything on a single vDS.

CONFIGURATION (2):

This is a nearly identical configuration except for the storage: in Config 1 we were running on a Fibre Channel SAN, while in this configuration we have an iSCSI SAN. The thing to note here is that you will need to install your Ethernet expansion modules in Bay 3 & 4. We also swap the CIOv card from an FC HBA to a traditional 2 x 1GbE card. In this case you will use the vSphere software iSCSI initiator for your storage networking. This is fine in most cases, except when you actually need to boot your ESX server from SAN.
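
If you use software iSCSI port binding for multipathing over the two CIOv ports, vSphere expects each bound VMkernel port to have exactly one active uplink, with the other uplinks set to unused rather than standby. The Python sketch below models that rule; the vmk and vmnic names are hypothetical, and flagging two VMkernel ports on the same pNIC is a design check of this sketch (no path redundancy), not a documented vSphere error.

```python
# Hypothetical names: vmk1/vmk2 are iSCSI VMkernel ports; vmnic6/vmnic7 stand in
# for the two CIOv 1GbE ports wired to the switch modules in Bay 3 and Bay 4.
ISCSI_VMKS = {
    "vmk1": {"active": ["vmnic6"], "standby": []},
    "vmk2": {"active": ["vmnic7"], "standby": []},
}


def port_binding_errors(vmks):
    """List violations of the one-active-uplink-per-VMkernel-port binding rule."""
    errors = []
    owner = {}  # uplink -> first VMkernel port bound to it
    for vmk, policy in sorted(vmks.items()):
        if len(policy["active"]) != 1:
            errors.append(f"{vmk}: needs exactly one active uplink")
        if policy["standby"]:
            errors.append(f"{vmk}: standby uplinks must be set to unused")
        for nic in policy["active"]:
            if nic in owner:
                # Design check: both iSCSI paths would share one pNIC.
                errors.append(f"{vmk}: shares {nic} with {owner[nic]} (no path redundancy)")
            else:
                owner[nic] = vmk
    return errors


print(port_binding_errors(ISCSI_VMKS))  # []
```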

Please also note that you can use NFS with the same layout. The 2 x 1GbE CIOv ports and the two expansion modules (Bay 3 & 4) will serve your NFS requirements in a highly available design.

CONFIGURATION (3):

What you will see in this configuration is a bit different. Here we use 2 x 10GbE ports through the CFFh expansion card to serve “all” our networks. This card is mapped to two 10GbE expansion modules sitting in Bay 7 and Bay 9.

The trick here is this: how can you have proper network segmentation if you are using only two pNICs? The answer, of course, is VLANs. As you can see in the diagram, we have two production networks and one lab network. All these networks are tagged with a VLAN ID, so their traffic flows through the vmnics (pNICs) all the way to your enterprise/core switches. Of course, the ports on your core switches need to be in trunk mode.
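
To show what “tagged” means on the wire: with a VLAN ID set on a port group, the virtual switch inserts a 4-byte IEEE 802.1Q tag into each outgoing frame and strips it on the way in. The tag is the 0x8100 TPID followed by a TCI holding a 3-bit priority, a 1-bit DEI and the 12-bit VLAN ID. The Python sketch below builds that tag; the VLAN IDs assigned to the port groups are hypothetical.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q-tagged frame

# Hypothetical VLAN IDs for the port groups sharing the two 10GbE links in Config 3.
PORT_GROUP_VLANS = {"SC": 10, "vMotion": 20, "FT": 30, "Prod1": 100, "Prod2": 200, "Lab": 300}


def dot1q_tag(vlan_id: int, priority: int = 0) -> bytes:
    """Build the 4-byte 802.1Q tag (TPID + TCI) inserted after the source MAC."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("usable VLAN IDs are 1-4094")
    if not 0 <= priority <= 7:
        raise ValueError("the priority (PCP) field is 3 bits")
    tci = (priority << 13) | vlan_id  # PCP (3 bits) | DEI (1 bit, left at 0) | VID (12 bits)
    return struct.pack("!HH", TPID_8021Q, tci)


for name, vid in PORT_GROUP_VLANS.items():
    print(name, dot1q_tag(vid).hex())
```

Each network needs its own VLAN ID, and the core switch ports must carry (trunk) all of them; otherwise the tagged frames are dropped upstream.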

Now, the second question would be: how can you ensure that no single network saturates the whole link and affects the performance of the others? The solution is vSphere traffic shaping. You can simply cap the bandwidth of each port group according to your requirements. For example, the SC normally doesn’t need more than 1Gbps, while vMotion and FT will definitely need more bandwidth. To keep things simple, I illustrated in the diagram how the segmentation and bandwidth allocation can be distributed across the two links in an Active/Standby approach.
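
Before you type the numbers into the shaping policy, it helps to check them against the link size, both in normal operation and after a link failure. vSphere takes average and peak bandwidth in Kbit/s (and burst size in KB). On a vSS, shaping applies to outbound traffic only, while a vDS can also shape inbound traffic. Shaping values are caps, not guarantees. The Python sketch below uses hypothetical caps and active-link assignments for the Active/Standby layout; they are not the values in the diagram.

```python
LINK_GBPS = 10  # each CFFh port in Config 3 is 10GbE

# Hypothetical values: port group -> (average-bandwidth cap in Gbps, active uplink).
# The other 10GbE port is the standby uplink for each port group.
POLICY = {
    "SC": (1, "uplink1"),
    "FT": (3, "uplink1"),
    "Lab": (1, "uplink1"),
    "vMotion": (3, "uplink2"),
    "Prod1": (2, "uplink2"),
    "Prod2": (2, "uplink2"),
}


def gbps_to_kbps(gbps: float) -> int:
    """vSphere shaping fields are in Kbit/s (1 Gbps = 1,000,000 Kbit/s)."""
    return int(gbps * 1_000_000)


def link_load(policy, failed=None):
    """Sum the caps per uplink; port groups on a failed uplink move to the survivor."""
    uplinks = sorted({uplink for _, uplink in policy.values()})
    survivors = [u for u in uplinks if u != failed]
    load = {u: 0 for u in survivors}
    for cap, uplink in policy.values():
        # In a two-port Active/Standby team, traffic fails over to the remaining port.
        target = uplink if uplink != failed else survivors[0]
        load[target] += cap
    return load


for failed in (None, "uplink1"):
    load = link_load(POLICY, failed)
    status = "oversubscribed" if max(load.values()) > LINK_GBPS else "ok"
    print(failed, load, status)

print(gbps_to_kbps(POLICY["Prod1"][0]))  # 2000000
```

With these made-up caps each link is fine in normal operation (5 and 7 Gbps), but after a link failure the surviving 10GbE port would carry 12 Gbps worth of caps. The failover case is therefore the one to size against.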

You will also notice that we use the two onboard Ethernet ports for an additional iSCSI SAN (for the Lab environment, for example) alongside the FC SAN for your production workloads.

CONFIGURATION (4):

In the previous configuration we saw how we leveraged VLANs for network segmentation, and how easy and flexible that was. But what if the customer has a policy against using VLANs to consolidate networks (for security reasons, for example)? Easy, we can still comply with that. We simply swap the 2 x 10GbE CFFh card for a 4 x 10GbE card and, of course, add two more 10GbE expansion modules in Bay 8 and Bay 10.

Now, what did we achieve by doing that? Two things:

1 – We are compliant with the customer requirement to have a physical segmentation between the Management/FT/vMotion networks and the production networks.
2 – We are using the vSS for our management network while leveraging the vDS for our Virtual Machine networks.
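
To verify the physical segmentation requirement, list which uplinks each port group may use (active or standby) and make sure no pNIC ends up carrying both management-class and production traffic. The Python sketch below does that check. The vmnic names and the split of the four CFFh ports are hypothetical. The second usage case shows a trap: if you later add the onboard ports as standby uplinks, the same onboard port must not back both sides.

```python
# Port groups that must stay physically separated from production traffic.
MANAGEMENT_CLASS = frozenset({"SC", "vMotion", "FT"})

# Hypothetical Config 4 layout: two CFFh ports for the vSS, two for the vDS.
PORT_GROUP_UPLINKS = {
    "SC": {"vmnic2", "vmnic3"},
    "vMotion": {"vmnic2", "vmnic3"},
    "FT": {"vmnic2", "vmnic3"},
    "Prod1": {"vmnic4", "vmnic5"},
    "Prod2": {"vmnic4", "vmnic5"},
}


def shared_uplinks(pg_uplinks, mgmt=MANAGEMENT_CLASS):
    """Return uplinks that would carry both management-class and production traffic."""
    mgmt_nics = set().union(*(nics for pg, nics in pg_uplinks.items() if pg in mgmt))
    prod_nics = set().union(*(nics for pg, nics in pg_uplinks.items() if pg not in mgmt))
    return sorted(mgmt_nics & prod_nics)


print(shared_uplinks(PORT_GROUP_UPLINKS))  # []

# Adding the same onboard port as standby on both sides breaks the separation.
mixed = {**PORT_GROUP_UPLINKS,
         "SC": {"vmnic2", "vmnic3", "vmnic0"},
         "Prod1": {"vmnic4", "vmnic5", "vmnic0"}}
print(shared_uplinks(mixed))  # ['vmnic0']
```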

You also have two other options here that are not included in the diagram. You can use the two onboard ports for an additional iSCSI SAN, as we did in the previous configuration, or you can use them as standby ports for your Management/VM networks in case of a CFFh card failure. Do you see now what I meant above by the “mix and match game”?

Coming Soon – Part 2:

I’ll talk about the new HX5 and how you can have a lot more memory or extended IO to support special workloads or strict design requirements. I will talk about FCoE and CNAs. I will also talk about the new and promising Virtual Fabric from IBM, and how you can basically slice your pNICs into almost any protocol or speed you want.

Stay tuned!

  • http://twitter.com/dpironet Didier Pironet

    Hi Hany,
    Depending on your customer's requirements, all 4 configurations would be OK.
    Sure, you can enhance each one a bit here, a bit there. That mostly depends on how confident the designer is with the different technologies, the customer's requirements, the infrastructure limitations, etc.

    Here are my thoughts:
    1- Do not bond (team) different NIC models/vendors like in conf#1 and #2.
    2- As long as you have a maximum of 5 HA primaries, do not install more than 4 blades per chassis within a cluster.
    3- I'm in favour of fewer cables, thus config#3 is my favourite in this case.
    4- Use VLANs as much as you can, unless it is mandatory to have traffic physically separated.
    5- Config#3 – I would have gone for 2*10GbE for the iSCSI SAN.
    6- Config#3 – You shape traffic to 2Gb for Prod1. But what's the size of the pipe at the perimeter of your DC? This is often overlooked: server clients are often remote and the WAN pipe is a T3 (~45Mbps), sometimes an OC-3 (155Mbps). I would say shape Prod1 bandwidth according to the WAN pipe.
    7- I would have set active/active NICs in all 4 configs. That will give the network team a bit of work :)

    My network design would be 2*10GbE active/active doing everything with VLAN tagging and traffic shaping.
    Too bad that the onboard NICs can't do 10GbE. We would have even less cables (and switches).

    Looks like you're heading for VCDX, aren't you ;)

    Cheers,
    Didier

  • http://www.hypervizor.com Hany Michael

    Hi Didier,

    Many thanks for your detailed reply! I love it when my visitors do that ;)

    Although your reply focuses more on your specific preferences, my configurations show a different set of options, and it all depends on customer requirements and decisions.

    Here is my detailed reply anyways:
    1 – Why wouldn’t you team two different NICs? In Config 1 & Config 2 you need that to survive a card failure (e.g. the LOM & the CFFh).
    2 – That’s what we have here. The nodes of the HA cluster are distributed and spanned across the two chassis (the numbering and order are illustrated in Config1)
    3 – Again, that’s your preference, while some customers’ policies will strictly require physical segmentation.
    4 – Same as point 3.
    5 – It depends. In the Config3 we already have a FC SAN for production workload, while the iSCSI is a secondary SAN for supporting the Lab workloads.
    6 – Why are you assuming here that the end-users are remote? Why not local?
    7 – I don’t agree with that for certain networks like the SC. I would always prefer to have the SC on Active/Standby so I know exactly which uplink it is using when troubleshooting, for example. Besides, having Active/Active links for everything can saturate the bandwidth of some networks. FT, for example, should typically run without anything that can saturate its link and/or increase its latency.

    Thanks again for the reply :)

  • http://twitter.com/dpironet Didier Pironet

    Hi Hany,

    > you need that to survive a card failure (e.g. the LOM & the CFFh)
    Following that logic, you would need two HBAs in two different PCI slots, but you don't. Do you trust Emulex/QLogic hardware more than NICs? :)

    >The nodes of the HA cluster are distributed and spanned across the two chassis..
    Yes I noticed that but HA primaries are promoted and demoted dynamically, put a primary in MM or just reboot it, and another primary is promoted, same goes for the 'active' primary. At the end all your primaries could be located on one chassis!

    Point#3, 4, and 5, indeed it depends on customers requirements, policies, hardware restrictions, etc…

    >Why are you assuming here that the end-users are remote? Why not local?
    That wasn't specified in the diagram, if I'm not wrong. I used to work for customers whose datacenters were physically separated from their end users, who were spread worldwide, so the WAN is an important aspect of a design in this kind of situation.

    Point#7, I don't follow you with the BW thing, if a gigabit card can handle the traffic in an A/P mode, surely it can survive in an A/A mode BW wise!?

    Regarding the SC, best in A/P mode, is it a VMware best practice thing?

    Thx again for your time on this; I just love sharing ideas and discussing VMware designs :)
    Didier

  • http://www.hypervizor.com Hany Michael

    Hi Didier,

    Do you know that you are simulating a VCDX defense panel for me now LOL :) – I’m the one who should be thanking you for your time and discussion, as I told you earlier: I love that on my blog ;)

    Here we go again :

    >Following that logic you would need two HBAs on two different PCI slots but you don't. Do you trust more Emulex/QLogic hardware vendors than NICs :)

    Actually, you do need two HBA cards to ensure maximum availability! This is common on any rack server (I hardly ever find anyone using a single dual-port card). In our case, the blades, you may have noticed that we have one CIOv expansion card, which is in turn mapped directly to Bay 3 & Bay 4. Now, how would you have two cards in that case? The answer is to use the Storage & IO Expansion Unit (SIO), which of course is specific to IBM here. In that case you will have two HBA cards. In the new HX5 (as you will see in part two), you can also have two HBA CIOv cards on a dual-unit blade. I personally do not trust any hardware vendor; in fact, I’ve seen a mission-critical system at a large Telco go down because of a “processor failure”, not just an HBA card! It all boils down to what the customer is doing and what needs to be achieved. I don’t see why a large bank or a telco would want this single point of failure in their system (whether it’s a NIC or an HBA).

    >At the end all your primaries could be located on one chassis!

    You have a valid point here, and the solution for that is to have 6 nodes per cluster (three per chassis). In that case at most three primaries can sit on one chassis, so you will always have at least two on the second. Check again what I mentioned in my article regarding the number of hosts per cluster and how that can be determined by many factors.

    >Point#7, I don't follow you with the BW thing, if a gigabit card can handle the traffic in an A/P mode, surely it can survive in an A/A mode BW wise!?

    If you have two A/A NICs for SC, FT and vMotion while your FT’ed VMs are generating lots of logging traffic, and at the same time you are doing lots of vMotions across the wire and copying files over your SC links, what would you expect in this situation? The FT network will be saturated, or at least see increased latency. This is why we have the 3 teamed NICs designed in A/P to serve the SC, vMotion and FT in such a way that none of them affects the others.

    As for the SC on A/P NICs, this is my preference as mentioned earlier (just like the vCenter VM on a separate cluster); I don’t think there is a written best practice from VMware on this.

    Best,
    Hany

  • Juwelpatil

    Hi Hany,

    This is a legend! Thanks for sharing this. I am trying to build a similar setup, ESX 4.1 with HS22 blades in 2 different chassis. There's one vDS for production traffic, and I have used standard switches for SC and vMotion. The communication from one chassis to the other is really slow, although traffic inside a chassis is lightning fast. Where should I be looking? Can I have some guidance here?


My name is Hany Michael, Consulting Architect at VMware. I blog about various topics ranging from the core vSphere technologies all the way to the vCloud based products.
Disclaimer
Any views or opinions expressed on this blog are strictly my own and not the opinions and views of my employer.