Reference Architecture: Building your Hybrid Cloud with vCAC, vCHS and NSX

Introduction:

In this blog post, I will try to provide a practical approach to building your first hybrid cloud using various VMware solutions and services. In a nutshell, we will use vSphere for the private/on-prem cloud and the vCloud Hybrid Service (vCHS) for the public/off-prem cloud. We will then bridge both clouds with a VPN tunnel using NSX (or vCNS, whichever is applicable to you). Last but definitely not least, we will stand up vCloud Automation Center (vCAC) as an abstraction layer above these two clouds for automation and policy-based provisioning of workloads in a true hybrid cloud model.

This is not going to be a detailed how-to guide, but rather an architecture discussion with some good practices that I have seen in my real-world experience in the field. It is also important to note that everything you will read in this post has been tested and proven to work.

The Architecture:


 

As you see in the architecture diagram above, we have four main areas to look at. Starting from the bottom up, we have NSX for vSphere running a site-to-site VPN tunnel with an Edge Gateway on vCHS. This is to bridge the communication between the two clouds. I will explain why in just a bit. Next, we have on the left the traditional vSphere infrastructure to which NSX is connected. On the right side we have our Virtual Private Cloud (VPC) on vCHS. The latter could also be a Dedicated Cloud on vCHS, depending on your business requirements. Lastly, at the top of the architecture, you can find vCAC as a single point of entry to this hybrid cloud, with policy-based configurations mapped to your business groups. Now let's examine all that in detail.

The Business Driver:

Before we take a technical deep dive into all these components, let us stop for a moment to understand the business driver behind this architecture. It is important to note here that I did not base this solution on a fictitious scenario. This is a real-world requirement from one of my large customers in the financial sector. Their main challenge is that they cannot predict the timing and the size of new projects for which IT has to provide compute resources. Another challenge is that these requirements are so dynamic in nature that their Operations team cannot handle them using traditional VM provisioning on vSphere. These projects run through three main phases: Test/Dev, UAT and Production. The second and third phases could be handled internally, since the Ops team will have the time and the exact requirements to plan for them. It is the first phase that is unpredictable and requires the greatest amount of agility to respond to. These Test/Dev VMs are not an isolated type of workload that can just be spun up on any public cloud. The customer needs communication back and forth with internal infrastructure services (like AD, DNS, DB, Antivirus, etc.) or even corp apps/databases that cannot be hosted off-prem. Lastly, the templates used to spin up these workloads in phase one must be based on internally maintained corp images, not on those that are available on the public cloud.

The Networking Infrastructure:

Now to the interesting part. We will start here with the underlying network infrastructure required for cross-site communications. As I have mentioned in the previous section, the workloads that are provisioned on the cloud (be it internal or external) must be able to communicate with the on-prem infrastructure services. To achieve that, we are setting up a site-to-site VPN tunnel between two gateways. Those gateways, in our case here, are NSX 6.0 for vSphere sitting on-prem and, on the other end, an Edge Gateway running on vCHS. You can find many tutorials on the internet that describe this part in detail. Please note that you can utilize vCNS (part of the vCloud Suite) or any hardware-based VPN gateway instead of NSX. The choice is really yours. With NSX (or vCNS), you will need to have at least two interfaces, one internal to the corp network fabric and the other one external to the public Internet. The latter needs to be set with a public IP or NAT'ed in a DMZ, provided that it can receive the VPN connection requests from the internet (in our case here, from the Edge Gateway in vCHS). I wouldn't personally hesitate to leverage NSX here due to the incredible flexibility that it has when it comes to design and deployment. If you haven't already done so, please take the time to review my blog posts on NSX as an SDN gateway on the internet: http://www.hypervizor.com/nsx/

On the other end, we will need to configure the VPN information on the vCHS Edge Gateway. At the time of this writing, these configuration parameters are not exposed in the vCHS UI itself, so you will have to jump into your vCloud Director UI and manage the Edge Gateway from there. Nevertheless, it is quite straightforward and almost identical to what you configured on your NSX end.
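To make the "almost identical" point concrete, here is a small illustrative sketch in Python. It is not an NSX or vCloud Director API. It only models the idea that each gateway holds a mirror image of the other side's tunnel settings. The class and field names, the public IPs (taken from documentation-only address ranges) and the pre-shared key are all assumptions made up for this example. The two subnets are the ones shown on the architecture diagram.

```python
from dataclasses import dataclass, replace
from ipaddress import IPv4Address, IPv4Network


@dataclass(frozen=True)
class VpnEndpointConfig:
    """One gateway's view of a site-to-site tunnel (hypothetical fields)."""
    name: str
    local_endpoint: IPv4Address   # public IP of this gateway
    peer_endpoint: IPv4Address    # public IP of the remote gateway
    local_subnet: IPv4Network     # network protected behind this gateway
    peer_subnet: IPv4Network      # network protected behind the remote gateway
    pre_shared_key: str

    def mirrored(self, name: str) -> "VpnEndpointConfig":
        """Return the configuration the opposite gateway should hold."""
        return replace(
            self,
            name=name,
            local_endpoint=self.peer_endpoint,
            peer_endpoint=self.local_endpoint,
            local_subnet=self.peer_subnet,
            peer_subnet=self.local_subnet,
        )


def is_consistent(a: VpnEndpointConfig, b: VpnEndpointConfig) -> bool:
    """True when b is exactly the mirror image of a (names aside)."""
    return a.mirrored(b.name) == b


# Hypothetical values: documentation-range public IPs, example-only key.
onprem = VpnEndpointConfig(
    name="nsx-edge",
    local_endpoint=IPv4Address("203.0.113.10"),
    peer_endpoint=IPv4Address("198.51.100.20"),
    local_subnet=IPv4Network("192.168.110.0/24"),
    peer_subnet=IPv4Network("172.16.1.0/24"),
    pre_shared_key="example-only-secret",
)
vchs = onprem.mirrored("vchs-edge-gateway")
```

Writing down one side and deriving the other this way is a cheap sanity check before you type the values into both UIs, since mismatched peer subnets are an easy mistake to make.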

The Compute Resources

Now that we have set up our network piece and our traffic is flowing back and forth between the two clouds, it is time to construct our compute resources. On the private cloud side, this would typically be a vSphere cluster, which we will designate here as "Production", to run the workloads in phase three. This cluster could also host the UAT workloads, or we can simply allocate another cluster for that. The important part here is that the NSX Edge must have a network interface/route to the VM Networks on that cluster. This is shown on the architecture diagram as 192.168.110.0/24.

On the other side, and assuming that we have a VPC subscription on vCHS, we already get an Org-vDC with 5 GHz CPU (burstable to 10 GHz) and 20 GB of memory to start with. That will be our compute resource on the public cloud side. This, of course, could be scaled up on demand. As with our on-prem vSphere cluster, this Org-vDC must have a Routed Organization Network that acts as the internal interface for the Edge Device. This is illustrated on the architecture diagram as 172.16.1.0/24.
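As a quick illustration of how the two networks relate, the sketch below uses Python's standard `ipaddress` module with the two subnets from the diagram. It assumes, as is generally the case for a routed site-to-site VPN, that the networks on each side must not overlap. The function names and the sample VM addresses are made up for this example.

```python
from ipaddress import ip_address, ip_network

# The two networks from the architecture diagram.
ONPREM_VM_NETWORK = ip_network("192.168.110.0/24")
VCHS_ORG_NETWORK = ip_network("172.16.1.0/24")


def check_tunnel_subnets(local, remote):
    """Raise if the two sides overlap, which would make routing ambiguous."""
    if local.overlaps(remote):
        raise ValueError(f"{local} overlaps {remote}; pick distinct ranges")


def side_of(address):
    """Tell which side of the tunnel an IP address belongs to."""
    ip = ip_address(address)
    if ip in ONPREM_VM_NETWORK:
        return "on-prem"
    if ip in VCHS_ORG_NETWORK:
        return "vCHS"
    return "outside the tunnel"


check_tunnel_subnets(ONPREM_VM_NETWORK, VCHS_ORG_NETWORK)  # passes silently
print(side_of("192.168.110.25"))  # hypothetical production VM
print(side_of("172.16.1.40"))     # hypothetical Test/Dev VM on vCHS
```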

Content Synchronization

Before we go to the provisioning part, let us stop for a minute to address the requirement of content synchronization across the clouds. As I have mentioned earlier in the business requirements section, we need to maintain the core templates on-prem and have these templates synchronized (or pushed out) to the public cloud for consumption through the blueprints, which we will talk about in just a bit. To do that, we just need to set up vCloud Connector (vCC) on our private cloud side only. vCC comes in two components: Server and Node. The vCC Server is a one-time setup that will typically sit in your management cluster. It acts as a hub (if you will) to coordinate workload migrations and template sync across your private and public clouds. The Node is what you deploy for each internal endpoint (e.g. vSphere) or public cloud (e.g. vCHS). In the case of the latter, it is already set up for you. You just need to change the port number from 443 to 8443 on your vCD Organization URL when connecting to it and you will be all set. Again, there are some resources out there on the Internet that explain these configurations in detail.
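The port change is easy to get wrong when the Organization URL is long, so here is a minimal Python sketch of the rewrite using only the standard library. The helper name and the example URL are hypothetical. Substitute your own vCD Organization URL.

```python
from urllib.parse import urlsplit, urlunsplit


def with_vcc_node_port(org_url: str, port: int = 8443) -> str:
    """Return org_url with its port replaced (or added) as `port`.

    Only the host and port are rewritten; scheme, path, query and
    fragment are kept. Any user:password@ prefix is dropped.
    """
    parts = urlsplit(org_url)
    if parts.hostname is None:
        raise ValueError(f"not an absolute URL: {org_url!r}")
    netloc = f"{parts.hostname}:{port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


# Hypothetical Organization URL, for illustration only.
print(with_vcc_node_port("https://vchs.example.com:443/cloud/org/MyOrg/"))
# prints https://vchs.example.com:8443/cloud/org/MyOrg/
```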

Once you have your Server and Node vCC components set up on-prem, you need to "Subscribe" to the template location to sync that content out to your public cloud endpoint. It is as simple as that.

vCloud Automation Center

This is where everything comes together to form our first true hybrid cloud. In vCAC, we first need to add two endpoints: the first is the local vCenter Server that is managing the Production cluster, and the second is the vCloud Director of our VPC subscription on vCHS. I will not go through the exact configuration steps, as there are already some excellent and detailed videos on YouTube by the VMware Technical Marketing team. Instead, I will list below some guidelines specific to our architecture:
– Reservation Policies: You will have to create two Reservation Policies here, one for the local compute cluster running on vSphere, and the second for the remote compute resources running on vCHS.
– Blueprints: You will need to create two different Blueprints, one for the local/prod VMs and the other for the remote/dev VMs. Each one should point to the relevant Reservation Policy that you set in the previous point.
– Templates: The Templates you set in the blueprints above will be selected from vSphere and vCHS respectively. Both will be identical since they are synced by vCloud Connector, as explained in an earlier section.
– Customization: On the private cloud, you can leverage any vCenter Customization Specification to customize your template while provisioning. On the vCHS side, you do not have to specify one, as vCD takes care of this for you.
– Network Profiles: You will maintain both the external and internal IP addressing through the Network Profile configurations. You do not have to work around IP address conflicts, even on your public cloud, since both vCAC and vCD keep track of the consumed IPs.
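To visualize how these pieces relate, here is a purely illustrative data model in Python. It is not the vCAC API or its object model. It is just a sketch of the relationships in the guidelines above, and every name in it (classes, policy names, the template name) is an assumption for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReservationPolicy:
    name: str
    endpoint: str  # the vCAC endpoint whose compute resources it targets


@dataclass(frozen=True)
class Blueprint:
    name: str
    reservation_policy: ReservationPolicy
    template: str                    # same corp image on both sides (synced by vCC)
    uses_vcenter_custom_spec: bool   # False on vCHS, where vCD handles customization


# Hypothetical names for the two policies and the shared template.
LOCAL = ReservationPolicy("local-vsphere", "vCenter Server")
REMOTE = ReservationPolicy("remote-vchs", "vCloud Director")

BLUEPRINTS = {
    "prod": Blueprint("Prod VM", LOCAL, "corp-image", True),
    "dev": Blueprint("Dev VM", REMOTE, "corp-image", False),
}


def placement(kind: str) -> str:
    """Describe where a request for the given blueprint kind would land."""
    bp = BLUEPRINTS[kind]
    return f"{bp.name} -> {bp.reservation_policy.endpoint}"


print(placement("prod"))
print(placement("dev"))
```

The point of the model is that the requester only ever picks a blueprint. The reservation policy behind it decides which cloud the VM lands on.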

Putting it all together

The last piece in this architecture is to create the entitlements for each Blueprint/Service and the relevant approval processes. What is even more interesting is what you can do to delegate specific and precise capabilities to your internal users, or even to external contractors or consultants working on your projects. Imagine, as an example, that a new SharePoint consultant is on your site, ready to start the three-phase process of Test/Dev, UAT and Prod. You simply assign him an account on AD, provide him access to the vCAC portal, and give him an entitlement to create VMs only on the public side of your hybrid cloud. This, of course, is still controlled by your approval process. Now that the consultant has his VM(s), he can connect to your on-prem infrastructure services per the security policies that you defined on NSX. Suppose that, during his work, the consultant wants to create a snapshot before he upgrades a specific component of his software. No problem: he has that permission right from his vCAC portal, with no need to bother your Ops team. But let's even say that he messed something up so badly that he just wants to start fresh. Still no issue: he can "reprovision" his VM and have it prepared from scratch in a matter of minutes. In this case, he doesn't really need another approval (unless you define that) since he already got it the first time. Imagine how much time you have saved him and your team in operations tasks like this. And have I mentioned that this very consultant can even access the console of his VM(s) on vCHS through the VMRC without even touching vCD or knowing anything about it?
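The consultant scenario boils down to three decisions: which blueprints a user may request, which follow-up actions he may run on his own, and which operations go through approval. The Python sketch below models just that logic. It is a hypothetical illustration, not how vCAC stores or evaluates entitlements, and all names in it are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entitlement:
    user: str
    blueprints: frozenset         # blueprints the user may request
    actions: frozenset            # self-service actions on provisioned VMs
    approval_required: frozenset  # operations that go through approval


def decide(ent: Entitlement, operation: str, blueprint: str) -> str:
    """Return 'denied', 'needs approval' or 'allowed'."""
    if blueprint not in ent.blueprints:
        return "denied"
    if operation != "request" and operation not in ent.actions:
        return "denied"
    if operation in ent.approval_required:
        return "needs approval"
    return "allowed"


# Hypothetical entitlement for the consultant in the example above.
consultant = Entitlement(
    user="sharepoint-consultant",
    blueprints=frozenset({"Dev VM"}),
    actions=frozenset({"snapshot", "reprovision", "console"}),
    approval_required=frozenset({"request"}),
)

print(decide(consultant, "request", "Dev VM"))      # needs approval
print(decide(consultant, "reprovision", "Dev VM"))  # allowed
print(decide(consultant, "request", "Prod VM"))     # denied
```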

We have just scratched the surface here. There is so much that you can do with such a flexible architecture and agile infrastructure, defined and driven by your very own business requirements.
