No clever title – ESXCLI

I have been missing in action for a few weeks. It is time to catch up for all the lost time. One topic I feel many people don’t know too much about is esxcli. I know how to do what I usually do with esxcli. There is a lot more there for us to explore.

First stop and take a look at the virtuallyGhetto article.

It can be run from the Service Console, the ESXi Tech Support Mode command line, or from the vMA. As William points out if you are running these command from the vMA you need to authenticate individually to each host. He goes on to list some articles that go over the most used case of esxcli, swiscsi.

A couple of quick examples I like to use:

esxcli nmp device setpolicy –device naa.6090a07800c2ea66b8c114050000c00d –psp VMW_PSP_RR

This command changes the policy for a storage device to another path selection policy. In this case it is Round Robin. This is great for when you are rebuilding ESX and the storage is already zoned. ESX will add the storage with the default PSP and changing a few dozen datastores on each host one at a time via the GUI can be VERY tedious.

Then how do I change the default PSP?

esxcli nmp satp setdefaultpsp –psp VMW_PSP_RR –satp VMW_SATP_DEFAULT_AA

This can be modified for different array types after the “—satp” tag or different path policies after the “–psp” tag.

For the VCAP-DCA4 exam I am studying for I wonder how much deeper than this they will go? I would feel most Data Center Administrators need to set up swisci settings and possibly change path policies. Anything I am missing? If you check out Duncan’s article here it will be great to know how to list what is available.

Equallogic, VAAI and the Fear of Queues

Previously I posted on how using bigger VMFS volumes helps Equallogic reduce their scalability issues when it comes to total iSCSI connections. There was a comment about does this mean we can have a new best practice for VMFS size. I quickly said, “Yeah, make em big or go home.” I didn’t really say that but something like it. Since the commenter responded with a long response from Equallogic saying VAAI only fixes SCSI locks all the other issues with bigger datastores still remain. ALL the other issues being “Queue Depth.”

Here is my order of potential IO problems on with VMware on Equallogic:

  1. Being spindle bound. You have an awesome virtualized array that will send IO to every disk in the pool or group. Unlike some others you can take advantage of a lot of spindles. Even then, depending on the types of disks some IO workloads are going to use up all your potential IO.
    Solution(s): More spindles is always a good solution if you have unlimited budget. Not always practical. Put some planning into your deployment. Don’t just buy 17TB of SATA. Get some faster disk and break your Group into pools and separate the workloads into something better suited to the IO needs.
  2. Connection Limits. The next problem you will run into if you are not having IO problems is the total iSCSI connections. In an attempt to get all of the IO you can from your array you have multiple vmk ports using MPIO. This multiplies the connections very quickly. When you reach the limit, connections drop and bad things happen.
    Solution: The new 5.02 firmware increases the total maximum connections. Additionally, bigger datastores means less connections. Do the math.
  3. Queue Depth. There are queues everywhere, the SAN ports have queues. Each LUN has a queue. The HBA has a queue. I would need to defer to a this article by Frank Denneman (a much smarter guy than myself.) That balanced storage design is best course of action.
    Solution(s): Refer to problem 1. Properly designed storage is going to give you the best solution for any potential (even though unlikely) queue problems. In your great storage design, make room for monitoring. Equallogic gives you SANHQ USE IT!!! See how your front end queues are doing on all your ports. Use ESXTOP or RESXTOP to see how the queues look on the ESX host. Most of us will find that queues are not a problem when problem one is properly taken care of. If you still have a queuing problem then go ahead and make a new datastore. I would also request Equallogic (and others) release a Path Selection Policy plugin that uses a Least Queue Depth algorithm (or something smarter). That would help a lot.

So I will repeat my earlier statement that VAAI allows you to make bigger datastores and house more VM’s per store. I will add a caveat, if you have a particular application that needs a high IO workload, give it a datastore.

Update Manager Problem after 4.1 Upgrade

A quick note to hopefully publicize a problem I had which I see is discussed in the VMware Community Forums already.

After building a new vCenter Server and Upgrading the vSphere 4.0 databases for vCenter and Update Manager. I noticed I could not scan hosts that were upgraded to 4.1. To be fair, by upgrading I mean rebuilt with a fresh install but with the exact same name and IP addresses. Seems that the process I took to upgrade has some kind of weird effect in the Update Manager Database. The scans fail almost immediately. I searched around the internet and found a couple of posts on the VMware Forums about the subject. One person was able to fix the problem by removing Update Manager and when reinstalling selecting the option to install a new database. I figured I didn’t have anything important in my UM database so I gave it a try and it worked like a champ.

Right now there is not any new patches for vSphere 4.1 but I have some Extension packages that need to be installed (Xsigo HCA Drivers). I wanted to note that I like the ability to upload extensions directly into Update Manager. This is a much cleaner process than loading the patches via the vMA for tracking and change control purposes.

Finding the Fusion OVFTool

The OVFtool is something I wished VMware Fusion had a while back and finally got a chance to use it the other day. I checked google and I found that it was located at:

/Library/Application Support/VMware Fusion/ovftool

As I looked for that path I was surprised it was not there. I upgraded from Fusion 2 to 3 to 3.1 and never recalled a chance or a place to add the OVFtool to my install. I could not find an independent download for the Mac OVFtool. I ended up re-installing the newest version of Fusion and I had to click “Advanced” during the install and turn on the OVFtool to install. Not sure if that is the best way, but that is how I got it to work. :)

Now that the path exists I was able to convert the OVF Appliance to be used on my Mac.

ovftool --help reveals a ton of options. To do a basic conversion though try this:


$mkdir /Users/username/Documents/Virtual\ Machines/ApplianceName
$/Library/Application Support/VMware Fusion/ovftool/ovftool ./Appliance.ovf /Users/username/Documents/Virtual\ Machines/ApplianceName

This will expand and convert the VM to be used with Fusion. Now just select open the VM in Fusion and play away.

Operational Readiness

One thing I am thinking about due to the VCDX application is operational readiness. What does it mean to pronounce this project or solution good-to-go? In my world it would be to test that each feature does exactly what it should be doing. Most commonly this will be failover testing, but could reach into any feature or be as big as DR plan that involves much more than the technical parts doing what they should. Some things I think need to be checked:

Resources

Are the CPU, Memory, Network and Storage doing what they should be? Some load generating programs like IOmeter can be fine to test network and storage performance. CPU busy programs can verify Resource Pools and DRS are behaving the way they should.

Failover

You have redundant links right? Start pulling cables. Do the links failover for Virtual Machines, Service Console, and iSCSI? How about the redundancy of the physical network, even more cable to pull! Also test that the storage controllers failover correctly. Also, I will make sure HA does what it is supposed to, instantly power off a host and make sure some test virtual machines start up somewhere else on the cluster.

Virtual Center Operations

Deploy new virtual machines, host and storage VMotion, deploy from a template, and clone a vm are all things we need to make sure are working. If this is a big enough deployment make sure the customer can use the deployment appliance if you are making use of one. Make sure the alarms send traps and emails too.

Storage Operations

Create new luns, test replication, test storage alarms and make sure the customer understands thin provisioning if it is in use. Make sure you are getting IO as designed from the Storage side. Making use of the SAN tools to be sure the storage is doing what it should.

Applications

You can verify that each application is working as intended within the virtual environment.

There must be something I am missing but the point is trying to test out everything so you can tell that this virtualization solution is ready to be used.

My Fun with the VMware Enterprise Administration and Design Exams

Sorry I have been missing for a few weeks. I know many were quite worried why I hadn’t blogged for a couple weeks (not really).

Back in February I sat for the Enterprise Administration Exam at PEX in Las Vegas. It was scheduled the day after the Super Bowl, what a bunch of distractions. Thankfully I passed and I want to give my experience so as to not violate any rules or anything I agreed to. This was a technical test. A lot of settings and configurations and information like that. Still multiple choice so at least you know the right answer is on the screen (hopefully, I did have one I thought none of these are right). The lab section was actually as fun as test taking could be. I wish there was more lab practical type things when it comes to these kinds of tests. Overall there is more intricate settings and config questions then you will find on the VCP exam.

At the end of April I took the Design Exam. This was a much different experience. I had a extremely hard time finding a study list of things that would help. Know the Exam Blueprint is all I would say. Also, this I think is where VMware can start finding out who does Architecture work and who may be an Administrator. I could say you could read every PDF on VMware.com and still not know how to pass this test unless you work with the solutions multiple times. The design drawing was a challenge, I wasted too much time reading the requirements document and ran of time, but I feel I was able to get a good portion of what I needed up on the page. Technically the interface was kind of quirky.

I felt both exams were challenging and but were fair to the Exam Blueprints. Nothing on there made me scream, “they didn’t say they would test on THAT!” The design exam needs some technical improvement (matching questions were buggy).

Now begins the harder and more involved process. The Design submission and hopefully an invitation to a defense.

VMware View – User Profile Options

All the technology and gadgets for managing desktops are worthless if your users complain about their experience with the desktop. Something I learned administering Citrix Presentation Server. Differing methods exist to keep the technical presentation of the desktop usable, for example the mouse being in sync and the right pixels show the right colors. What is also included in the user experience is a consistent environment where their personal data and settings are where they should be. Here are a few methods for managing those bits when using VMware View.

Mandatory Profiles
This profile is kept on the a central file share. The profile is copied to the machine on login, when the user logs out the changes are not kept. Great way to keep a consistent profile on kiosk type and data entry desktops. Where customization is not needed and most likely not wanted mandatory profiles are worth exploring. Main change is you set up the profile just like you want it then rename the NTUSER.dat to NTUSER.man. A lot exists on the internet about setting up man profiles.

Local Profiles
If you go through life never changing a thing in your Windows environment, you are using a Local Profile. Not to say you don’t change settings, save files or customize your background. You just have Windows running as the default. This is an option I will usually discourage because it is hard to backup data that is often kept in the local profile. VMware View will redirect user data to a User Data Disk (or whatever it is called today) on Persistent Desktop Pools. This is a good way to get the data on another VMDK. This introduces problems when looking at data recovery. There is solutions, but just something you will need to remember to look into.

Roaming Profiles
Roaming profiles is a great way to redirect current profiles to a central location. In theory this works great. In a View environment you can keep a local copy on a users desktop profile  and the changes are copied back and forth. I have often seen this work just great. Then from time to time, the profile will become corrupt, many times it does not unload correctly when users disconnect, or log out. Then you may have to pick through folders trying to find their “My Documents”. This is why I would suggest using this with Group Policy and Folder redirection which I will cover next.

Redirecting Folders
You may end up using a folder redirection group policy. This will move folders like the Desktop and My Documents for a user to a file server. This slims down the roaming profile as those locations are redirected to another location outside of the profile. This data is not copied from the machine to the server over and over. More information here.

Other Options
Immidio Flex Profiles
I really liked this option it was a way to combine mandatory profiles and a Roaming profile. This program would run some scripts on logon and log off to save files and settings. A really great paper on how to use it can be found here. Just like any great program that takes a new way to solve an annoying old problem, this is now not free.

RTO Virtual Profiles
I have never implemented this solution before. I have used it as part of a few training labs. I liked the feel. Now that VMware has purchased this software from RTO, the website redirects to a transition page. So I am looking for a way to test it in the lab, hoping the next set of bits of View includes RTO. Check this FAQ out for more information.

Maybe once it is built into View this will no longer be a serious issue. Profiles will be one of those things we tell stories to young padawan VM admins about, “We used to have to fight profiles, they were big and slow, and sometimes they would disappear!” Until that day…

VMware View and Xsigo

*Disclaimer – I work for a Xsigo and VMware partner.

I was in the VMware View Design and Best practices class a couple weeks ago. Much of the class is built on the VMware View Reference Architecture. The picture below is from that PDF.

It really struck me how many IO connections (Network or Storage) it would take to run this POD. Minimum (in my opinion) would be 6 cables per host with ten 8 host clusters that is 480 cables! Let’s say that 160 of those are 4 gb Fiberchannel and the other 320 are 1 gb ethernet. The is 640 gb for storage and 320 for network.

Xsigo currently uses 20 gb infiniband and best practice would be to use 2 cards per server. The same 80 servers in the above cluster would have 3200 gb of bandwidth available. Add in the flexibility and ease of management you get using virtual IO. The cost savings in the number director class fiber switches and datacenter switches you no longer need and the ROI I would think the pays for the Xsigo Directors. I don’t deal with pricing so this is pure contemplation. So I will stick with the technical benefits. Being in the datacenter I like any solution that makes provisioning servers easier, takes less cabling, and gives me unbelievable bandwidth.

So just in the way VMware changed the way we think about the datacenter. Virtual IO will once again change how we deal with our deployments.

Storage Design and VDI

Recently I have spent time re-thinking certain configuration scenarios and asking myself, “Why?” If there is something I do day to day during installs is this still true when it comes to vSphere? or will it still be true when it comes to future versions.
Lately I have questioned how I deploy LUNs/volumes/datastores. I usually deploy multiple moderate size datastores. In my opinion this was always the best way to fit in MOST situations. I also will create datastores based on need afterward. So will create some general use datastores then add a bigger or smaller store based on performance/storage needs. After all the research I have done and asking questions on twitter* I still think this is a good plan in most situations.
I went over a VMworld.com session TA3220 – VMware vStorage VMFS-3 Architectural Advances since ESX 3.0 and read this paper:
http://www.vmware.com/resources/techresources/1059
I also went over some blog posts at Yellow-Bricks.com and Virtualgeek.

An idea occurred to me when it comes to using extents in VMFS, SCSI Reservations/Locks, and VDI “Boot Storms”. First some things a picked up.
1. Extents are not “spill and fill” VMFS places VM files across all the LUNs. Not quite what I would call load balancing, since it does not take IO load into account when placing files. So in situations where all the VM’s have similar loads this won’t be a problem.
2. Only the first LUN in a VMFS span gets locked by “storage and VMFS Administrative tasks” (Scalable Storage Performance pg 9). Not sure if this implies all locks.

Booting 100′s of VM’s for VMware View will cause locking and even though vSphere is much better when it comes to how quickly this process takes. There is still an impact. So I am beginning to think of a disk layout to ease administration for VDI, and possibly lay the groundwork for improved performance. Here is my theory:

Create four LUNs with 200GB each. Use VMFS to extents to group them together. Resulting in an 800 GB datastore with 4 disk queues and only 1 LUN that locks during administrative tasks.

Give this datastore to VMware View and let it have at it. Since the IO load for each VM is mostly the same, and really at the highest during boot other tasks performed on the LUN after the initial boot storm will have even less impact. So we can let desktops get destroyed and rebuilt/cloned all day with only locking that first LUN. This part I still need to confirm in the LAB.

What I have seen in the lab is with same sized clones the data on disk was spread pretty evenly across the LUNs.

Any other ideas? Please leave a comment. Maybe I am way off base.

*(thanks to @lamw @jasonboche and @sakacc for discussing or answering my tweets)

ESX Commands – Summary

It took just about a year. Which shows I need more consistency with my blog (should have been about 1 month). I finally finished a brief explanation of each esxcfg command. My little self study for the VCDX, this is in no way exhaustive.

Make sure to check out other great resources out there:
Simon Long
Harley Stagner
Both good places to start.

Hopefully my VCDX compilation page can help.