VCP5–In the Books

On the afternoon of the last day of VMware PEX, I decided to take the VCP5 exam, since I needed to pass it before 2/28 to avoid having to take the vSphere 5: What’s New course.

I admittedly did not spend a tremendous amount of time prepping for this one, as most of my focus has been on the VCAP4-DCA exam I recently took. It was a bit of a last-minute decision to sit for the exam; I had time to kill because most people left the conference mid-afternoon on Thursday to head home anyway.

I passed the exam with plenty of room to spare, so I am now a certified VCP on VI3, vSphere 4, and vSphere 5. In my experience this exam was more straightforward than prior VCP exams. Much of the focus on the older VCP exams was configuration maximums; in this version I don’t think I had a single configuration maximum question. If you are a current VCP4 and are familiar with the new features of vSphere 5, like Auto Deploy and Storage DRS, this one should be a walk in the park for you.

All VMs In Inventory Display Alarms

I came across an issue this morning where every virtual machine in the inventory was displaying an alarm icon next to it, yet there were no alarms associated with those VMs on the Alarms tab in vCenter. I checked the vCenter, datacenter, cluster, and host levels: not a single alarm, yet EVERY VM in the inventory had the icon.

The fix was simple: vMotion the VM to a different host in the cluster, which seemed to jar the icon back to normal. Even if I vMotion a VM back to its original host, the icon stays the normal happy green.

This is the first time I’ve seen this. The environment is vCenter 5.0 build 455964 and ESXi 5.0 build 515841.

VCAP4-DCA Passed

I’ve been a bit quiet in the blogging arena lately because I have been preparing for the VCAP4-DCA exam, which I took on January 23rd. Yesterday, while at VMware PEX in Las Vegas, I received my results and learned that I passed. I’m not going to reinvent the wheel and explain how to pass or how to prepare, because a LOT of folks out there have already paved the road for people like you and me. This post is more of a THANK YOU to those who helped me along the way. In no particular order…

Thank you @Trainsignal for your AWESOME training material. If you are serious about VMware training and about passing the VCAP-DCA exam, head over to Trainsignal’s website and buy their VCAP training package. Trust me, it’s worth it. More specifically, thank you to David Davis, Jason Nash, Lane Leverett, and Hal Rottenberg (instructors of the VCAP package), as well as Eric Siebert, Rick Scherer, and Sean Clark (instructors of the Pro Series, Vol. 1 or Vol. 2). Believe me when I say I probably wouldn’t have passed without your help.

Huge thanks also to Sean Crookston, Damian Karlson, and Ed Grigson for their VCAP4-DCA study guides. These docs literally walk you through the entire exam blueprint… and if you know everything on the blueprint you know everything you need to pass the exam.

If I were to give one tip, it would be to make sure you have a vSphere 4.0 lab available if you are considering the VCAP4. I have been working with vSphere 4.1 and 5.0 since their releases, but the exam environment is ALL 4.0 based. My company allowed me to set up a 4.0 lab similar to the exam environment (a 4.0 vCenter Server, 4.0 ESX server, 4.0 ESXi server, and 4.0 vMA appliance) to practice with. Practicing on it helped a lot, because some of the CLI commands have changed in the more recent releases.

Next up for me is upgrading to VCP5, and then probably the VCAP5-DCD beta. After that, it’s time to start thinking about VCDX.

EMC VNX: Windows Hosts Display as “Unmanaged”

I was doing a migration to an EMC VNX5300 array this weekend and came across an issue I hadn’t seen before. This environment has two physical Windows 2003 SP2 hosts. After I installed the Unisphere Host Agent for Windows, zoned the FC HBAs to the array ports, and registered those FC connections in Unisphere, the host was displayed with a “U” icon next to it, indicating that it is unmanaged. Server details such as the OS are also not populated, because Unisphere is not getting anything from the agent on the box.

To fix this, go to the Windows host and navigate to C:\Program Files (x86)\EMC\HostAgent. Create a text file called agentID.txt. The file should contain the server’s FQDN on the first line and its IP address on the second line. An example is below:

After saving the file, stop and start the “Navisphere Agent” service on the server; the “U” should go away and the host info should be populated. (Note: ignore the “HAVT issues” status; it appears only because at the time of writing this host had a single HBA connected.)
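If you have several hosts to fix, the file can also be generated with a short script. This Python sketch is my own illustration, not an EMC tool: it assumes the default HostAgent folder from this post, that the agent needs only the two lines described above, and that it is run from an elevated prompt (writing under Program Files normally requires admin rights).

```python
import ipaddress
from pathlib import Path

# Assumption: default HostAgent install folder on a 64-bit Windows host.
DEFAULT_AGENT_DIR = Path(r"C:\Program Files (x86)\EMC\HostAgent")


def write_agent_id(fqdn: str, ip: str, agent_dir: Path = DEFAULT_AGENT_DIR) -> Path:
    """Write agentID.txt with the FQDN on line 1 and the IP address on line 2."""
    # Raises ValueError on a malformed address, so a typo never reaches the file.
    ipaddress.ip_address(ip)
    path = Path(agent_dir) / "agentID.txt"
    # Text mode: on Windows, "\n" is written as CRLF, like a file saved in Notepad.
    path.write_text(f"{fqdn}\n{ip}\n")
    return path
```

For example, `write_agent_id("server01.example.local", "192.0.2.25")` (hypothetical values) writes the file; then restart the Navisphere Agent service as described above.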

Finally, if that does not resolve the problem, you may have to log in and restart the SP management servers. To do that, go to https://SP/setup (where SP is the storage processor’s management address), log in using your Unisphere credentials, and select the option to restart the management server. This is safe to do without disrupting the array, and you will need to do it on both SP-A and SP-B.

iSCSI Issues with vSphere 5 GA Build (not just slow boot times)

It is well documented by now that the ESXi 5.0 GA code has an issue that causes excessively long boot times on hosts connected to iSCSI storage arrays, as described in VMware KB 2007108.

What isn’t well documented is another issue with the ESXi 5.0 GA code and software iSCSI, with far more serious symptoms than long boot times: in certain configurations, you can lose access to your storage.

If you are using ESXi 5.0 GA with software iSCSI connected to an array whose iSCSI targets are on separate network fabrics (specifically, separate subnets), you must use two different vSwitches for the vmkernel port groups to avoid losing access to the storage.

An example configuration with screenshots is below. It shows a single ESXi 5.0 GA host with two physical NICs dedicated to iSCSI storage traffic, vmnic3 and vmnic7. These NICs are connected to separate physical switches for redundancy and multipathing, and each array storage processor is connected to both iSCSI switches, creating a fully redundant fabric. vmnic3 and vmnic7 are each bound to their own dedicated vmkernel port group, iSCSI1 and iSCSI2. Both vmkernel port groups belong to the same vSwitch (vSwitch1), and we ensure that iSCSI1 uses only vmnic3 and iSCSI2 uses only vmnic7 by overriding the vSwitch failover order in each vmkernel port group’s properties. Diagrams are below:

On the left is a quick Visio diagram I drew to illustrate how the system is physically configured. It shows only the relevant storage network infrastructure and iSCSI NICs; this host has other NICs that are used for other purposes. The other screenshot shows the vSwitch and vmkernel configuration.

Below are screenshots of the vmkernel port group properties, which show that each port group is pinned to its corresponding physical NIC.
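The pinning shown in these screenshots can be expressed as a simple check. This is a hypothetical Python model, not a vSphere API: each port group is described by its active and standby uplink lists, and the check flags anything other than exactly one active uplink with no standby uplinks, which is what software iSCSI port binding expects. The "each NIC on only one port group" test reflects the design in this post rather than a vSphere requirement.

```python
from dataclasses import dataclass, field


@dataclass
class VmkPortGroup:
    """Hypothetical model of a vmkernel port group's failover order override."""
    name: str
    active: list[str]
    standby: list[str] = field(default_factory=list)


def binding_problems(port_groups: list[VmkPortGroup]) -> list[str]:
    """Return human-readable problems; an empty list means the pinning looks right."""
    problems = []
    owner = {}  # physical NIC -> port group that already uses it as active
    for pg in port_groups:
        if len(pg.active) != 1:
            problems.append(f"{pg.name}: needs exactly one active uplink, has {len(pg.active)}")
        if pg.standby:
            problems.append(f"{pg.name}: move standby uplinks {pg.standby} to unused")
        for nic in pg.active:
            if nic in owner:
                problems.append(f"{pg.name}: {nic} is already active on {owner[nic]}")
            else:
                owner[nic] = pg.name
    return problems
```

For the configuration in this post, `binding_problems([VmkPortGroup("iSCSI1", ["vmnic3"]), VmkPortGroup("iSCSI2", ["vmnic7"])])` returns an empty list.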

Both vmkernel port groups are bound to the same software iSCSI initiator (in this case vmhba38), and all four array front-end ports are configured as iSCSI targets.
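With port binding, my understanding is that the software initiator tries to reach every target portal from every bound vmkernel port, so when the fabrics are on separate subnets some of those combinations can never connect. The Python sketch below just counts them. The addresses, the /24 prefix, and the assumption of two front-end ports per subnet with no routing between fabrics are hypothetical; substitute your own.

```python
import ipaddress
from itertools import product


def split_login_pairs(vmk_ips, target_ips, prefix=24):
    """Split every (vmkernel port, target portal) pair into same-subnet and
    cross-subnet lists. Cross-subnet pairs are assumed unreachable because
    the iSCSI fabrics are not routed to each other."""
    same, cross = [], []
    for vmk, target in product(vmk_ips, target_ips):
        subnet = ipaddress.ip_interface(f"{vmk}/{prefix}").network
        bucket = same if ipaddress.ip_address(target) in subnet else cross
        bucket.append((vmk, target))
    return same, cross


# Hypothetical addressing: iSCSI1 on 192.168.10.0/24, iSCSI2 on 192.168.20.0/24,
# two array front-end ports on each subnet.
vmks = ["192.168.10.11", "192.168.20.11"]
targets = ["192.168.10.1", "192.168.10.2", "192.168.20.1", "192.168.20.2"]
same, cross = split_login_pairs(vmks, targets)
print(len(same), "usable pairs,", len(cross), "cross-subnet pairs")  # 4 usable pairs, 4 cross-subnet pairs
```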

Normally this is a sound configuration, one I have set up in many 4.x environments, but this was my first vSphere 5 environment. At first everything was fine: I connected to two volumes I had configured on the array, formatted them VMFS, and started creating virtual machines.

After a few hours, the host lost access to the storage. The only ways I found to get it back were to restart the host or to go into the vmkernel port group properties and remove/reset the “override switch failover order” settings.

The only place on the internet where I found anything close to mentioning this is a post on the vSphere blog, which discusses issues with using different subnets for iSCSI but does not specifically say that disconnects can occur after startup.

As it turns out there is an issue with the GA build (build 469512) of ESXi 5 when using this configuration that causes the disconnect (confirmed by VMware support).

The FIX is to apply the same patch that fixes the long iSCSI boot times: express patch 1, which is build 515841.

The workaround (if you have not deployed the patch) is to break the vmkernel port groups out into 2 separate vSwitches.
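The conditions from this post can be combined into a quick check. In this Python sketch the input format (a dict of vSwitch name to iSCSI vmkernel addresses in CIDR notation) and the addresses are hypothetical, and it assumes any build at or above express patch 1 (515841) contains the fix; only the GA build (469512) is confirmed affected here.

```python
import ipaddress

# Build numbers from this post: GA is 469512, express patch 1 is 515841.
EXPRESS_PATCH_1_BUILD = 515841


def needs_workaround(build: int, vmk_by_vswitch: dict[str, list[str]]) -> bool:
    """True if the host is older than express patch 1 and any vSwitch carries
    iSCSI vmkernel ports on more than one subnet."""
    if build >= EXPRESS_PATCH_1_BUILD:
        return False  # assumption: later builds include the fix
    for addresses in vmk_by_vswitch.values():
        subnets = {ipaddress.ip_interface(a).network for a in addresses}
        if len(subnets) > 1:
            return True
    return False


# Hypothetical addresses. GA build, both port groups on vSwitch1:
print(needs_workaround(469512, {"vSwitch1": ["192.168.10.11/24", "192.168.20.11/24"]}))  # True
# After the workaround: one iSCSI vmkernel port per vSwitch.
print(needs_workaround(469512, {"vSwitch1": ["192.168.10.11/24"], "vSwitch2": ["192.168.20.11/24"]}))  # False
```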

EMC VNXe 3100: iSCSI Issues with VNXe OE MR1 SP3

A few weeks ago I upgraded a brand-new VNXe 3100 array to what was then the latest version of the VNXe operating environment, MR1 SP3 (2.1.3). The update was successful, so I put the array back in the box and shipped it to the customer. A few days later, while configuring a similar VNXe 3100 for a different customer, I logged in to the EMC support site and noticed that the latest available download was MR1 SP2; SP3 was gone.

So I pinged my buddy and VNXe guru Matt Brender (@mjbrender) on Twitter to ask why SP3 was no longer available. In typical fashion he was on it immediately, pointing me to the community post captured below and promising to find more info.

He got back to me later and told me the issues were specific to iSCSI, and that if I ran into problems I should open an SR with EMC and reference emc283659.

Fast forward to yesterday, when I arrived onsite to rack the unit with SP3 and get started with the configuration. I created a 500GB iSCSI volume and published it to a vSphere 5 host, formatted it VMFS, and started creating VMs – all good.

As I was installing Windows 2008 R2 on what would become a vCenter VM, I got the following error:

This was odd because there are no other hosts in the environment yet; only this single vSphere 5 host is connected to the volume. I had no choice but to click OK, and the VM powered off, as expected when ESXi loses its lock on a VMDK.

Further investigation revealed that the vmkernel.log file was littered with all kinds of iSCSI command failures, aborts, retries, etc. Not good.
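To get a rough sense of how noisy the log is, a keyword count is enough. This Python sketch does not parse real vmkernel message formats; the patterns are generic words chosen as an assumption, so treat the counts as a signal to investigate, not a diagnosis. Copy /var/log/vmkernel.log off the host and pass its path to `scan_log`.

```python
import re
from collections import Counter

# Assumption: generic keywords, not exact vmkernel message text.
PATTERNS = {
    "abort": re.compile(r"abort", re.IGNORECASE),
    "fail": re.compile(r"fail", re.IGNORECASE),
    "retry": re.compile(r"retr(?:y|ies|ied|ying)", re.IGNORECASE),
}


def count_iscsi_errors(lines):
    """Count lines that mention iSCSI and match each pattern."""
    counts = Counter()
    for line in lines:
        if "iscsi" not in line.lower():
            continue  # ignore non-iSCSI messages entirely
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def scan_log(path):
    """Scan a copied vmkernel.log; undecodable bytes are replaced, not fatal."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return count_iscsi_errors(log)
```

Running it again after a fix and comparing the counts gives a quick before/after picture.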

I contacted EMC support as Matt suggested, and they pulled log files from the array to confirm that the issue I was experiencing was the one Matt referenced. Indeed it was. They were also able to give me more specifics about the issue, quoted below:

“The VNXe will disconnect the iSCSI TCP connection any time more than 4 MB of data is queued or outstanding with the client on a given TCP connection. This can happen on a regular basis if the client is issuing a very large amount of I/O simultaneously, if the client is somehow slow in acknowledging receipt of data, or if the network is delaying packet delivery (network latency) or dropping packets or both. When this happens, the iSCSI host must stop and go through the process of re-logging in to the VNXe, and retransmitting I/O, which can have a large impact on performance.” – EMC Support
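EMC’s explanation is essentially a bandwidth-delay argument: the more data in flight and the slower the acknowledgments, the sooner a connection crosses the 4 MB mark. This Python sketch works through that arithmetic and models the rule as EMC describes it. Reading “4 MB” as 4 MiB and using a 1 Gb/s link are my assumptions, and the tracker is a toy model, not VNXe code.

```python
LIMIT_BYTES = 4 * 1024 * 1024  # assumption: EMC's "4 MB" read as 4 MiB


def delay_to_limit_ms(link_gbps: float, limit_bytes: int = LIMIT_BYTES) -> float:
    """Milliseconds of unacknowledged sending at full line rate needed to
    reach limit_bytes outstanding on one TCP connection."""
    bytes_per_second = link_gbps * 1e9 / 8
    return limit_bytes / bytes_per_second * 1000


class OutstandingTracker:
    """Toy model of the described rule: disconnect once more than
    limit_bytes are queued or unacknowledged on a connection."""

    def __init__(self, limit_bytes: int = LIMIT_BYTES):
        self.limit_bytes = limit_bytes
        self.outstanding = 0

    def send(self, nbytes: int) -> bool:
        """Queue nbytes; returns False when the array would drop the connection."""
        self.outstanding += nbytes
        return self.outstanding <= self.limit_bytes

    def ack(self, nbytes: int) -> None:
        """Record that the client acknowledged nbytes."""
        self.outstanding = max(0, self.outstanding - nbytes)


print(round(delay_to_limit_ms(1.0), 1))  # 33.6
```

In other words, on a 1 Gb/s link running flat out, acknowledgments that lag by roughly 34 ms or more (from latency, drops, or a slow client) are enough to exceed the threshold, which is consistent with the causes EMC lists.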

After confirming the issue, EMC support helped install the related hotfix, AR458173, and restarted the storage processors.

Since the restart I have not seen any errors in the vmkernel logs, so it appears the hotfix resolved the problem.
