Tag Archives: VHD

VHD versus NTFS alignment

This topic is guaranteed to bore most people.  Or, maybe I am wrong.  Are you the kind of person that loves to defrag your disk?  Are you always looking for new ways of speeding up your machine?  Are knobs and buttons something you love to tweak?  How about that registry cleaner?

No, I am not saying those things are bad.  Some people have the patience for this while others do not.  Usually the people that fiddle with the settings are eventually rewarded.  At the very least they can proclaim that they understand what the settings are really doing.

Thanks to a reader (thanks John!) I was made aware of the offset problem within VHDs.  Well, let me rephrase that.  I knew about it but I did not know its official name or that people had solved it.

John pointed me to this blog post about the offset problem.  There are already tools out there from NetApp to solve the issue but basically you need to be a NetApp customer in order to get access (at least for the tools that actually fix your images).  In truth, the problem affects most virtualisation users.

So, this is the part where the problem is dissected and understood.  Virtual disks still follow rules left over from physical disks.  Specifically, they have a “geometry” setting that indicates things like sectors per track.  Here is an example from my tool named vhddump of one of the test VHDs lying around.

Geometry             3F10BA9E
Cylinders            9EBA
Heads                10
Sectors/Track        3F

This information is a field inside the VHD header.  The geometry value splits into the next three values based on interpreting the bytes.  VHDs are big endian (most significant byte first, instead of the typical little endian on Windows), so vhddump is reporting the combined field backwards; it had not been important to preserve the byte order.
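As a rough illustration of the byte-order point, the following Python sketch decodes the four geometry bytes the way they would sit in the file.  The on-disk byte sequence 9E BA 10 3F is inferred from the dump above, and the function name is made up for this example.

```python
import struct

def decode_geometry(raw):
    """Split the 4-byte geometry field into (cylinders, heads, sectors/track).

    The field is read big endian: 2 bytes of cylinders, 1 byte of heads,
    1 byte of sectors per track.
    """
    cylinders, heads, sectors_per_track = struct.unpack(">HBB", raw)
    return cylinders, heads, sectors_per_track

# Bytes in file order (inferred from the dump, not read from a real VHD here).
raw = bytes.fromhex("9EBA103F")

# Reading the same bytes as a little-endian 32-bit value gives the
# "backwards" number that vhddump printed.
as_little_endian = struct.unpack("<I", raw)[0]

print(decode_geometry(raw))
print(format(as_little_endian, "08X"))
```

Reading the field big endian gives the cylinder, head and sector values directly, without having to pick the combined number apart by hand.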

Anyways, you can see that our virtual disk has 3F sectors per track.  This translates to 63 sectors in decimal.  The geometry reported (also known as CHS) affects the alignment of partitions.  The rule with older versions of Windows is that the boot partition starts after the first track.  This is the output for the same VHD with vhddump:

Index(0) 80 01 01 00 07 FE FF FF 3F 00 00 00 B5 98 70 02
Index(0) BootFlag(80) Type(07) SectorStart(0000003F) SectorLength(027098B5)
Index(0) CHSStart(010100) CHSEnd(FEFFFF)
Start CHS  Head(01) Sector(01) Cylinder(0000)
End   CHS  Head(FE) Sector(3F) Cylinder(03FF)

Volume Count         1
Volume Index         0
FirstVolumeSector    3F

The master boot record starts at the first sector of the virtual disk.  This dump is from a Windows XP image.  The first sector of the first volume/partition is at 0x3f.
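For anyone who wants to reproduce the partition dump, here is a small Python sketch that decodes one 16-byte MBR partition entry.  It uses the classic MBR layout (the multi-byte fields are little endian, unlike the VHD header), and the function and key names are invented for illustration; this is not how vhddump itself is written.

```python
import struct

def parse_partition_entry(entry):
    """Decode a 16-byte MBR partition table entry into a dictionary."""
    boot_flag, chs_start, ptype, chs_end, lba_start, sector_count = struct.unpack(
        "<B3sB3sII", entry
    )

    def decode_chs(raw):
        head = raw[0]
        sector = raw[1] & 0x3F                      # low 6 bits
        cylinder = ((raw[1] & 0xC0) << 2) | raw[2]  # high 2 bits + next byte
        return head, sector, cylinder

    return {
        "boot_flag": boot_flag,
        "type": ptype,
        "chs_start": decode_chs(chs_start),
        "chs_end": decode_chs(chs_end),
        "sector_start": lba_start,
        "sector_length": sector_count,
    }

# The entry bytes from the dump above.
entry = bytes.fromhex("80010100 07FEFFFF 3F000000 B5987002".replace(" ", ""))
info = parse_partition_entry(entry)
print(info)
```

The sector_start value is the one that matters for alignment: it is where the first volume begins, counted in 512-byte sectors from the start of the disk.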

Why is this a problem?

Well, the most obvious problem is that sector 0x3f does not align with VHD blocks or even NTFS clusters.  Since 63 is an odd number, a volume starting there can never begin on a boundary that is a multiple of eight sectors (a 4KB cluster) or of 4096 sectors (a 2MB block).

This problem was first seen when exploring the clusters inside the VHD.  Instead of being neatly aligned with the VHD blocks, it was possible to have a cluster that spanned two blocks.  Even the sector bitmap showing the written clusters did not align on byte boundaries for a cluster.  Not only did it make it harder to correlate information, it was more wasteful to do more work for something that should have been aligned in the first place.

This kind of offset problem would show as a performance problem over time.  It is always more efficient to have alignment.

If people really understood this problem, they would probably insist that the partitions were aligned.  A simple example is a cluster that spans two blocks.  Not only is it a read/write hit to access two blocks, but it also potentially wastes space with the second block if there is nothing else there.
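To make the spanning case concrete, here is a minimal Python sketch that finds the first cluster straddling two VHD blocks.  It assumes 512-byte sectors, eight-sector (4KB) clusters and 2MB blocks; those sizes and the function names are assumptions for illustration and will differ for other configurations.

```python
SECTORS_PER_CLUSTER = 8      # assumed 4KB NTFS cluster
SECTORS_PER_BLOCK = 4096     # assumed 2MB VHD block of 512-byte sectors

def blocks_touched(volume_start, cluster):
    """Return the (first, last) block numbers that a cluster occupies."""
    first_sector = volume_start + cluster * SECTORS_PER_CLUSTER
    last_sector = first_sector + SECTORS_PER_CLUSTER - 1
    return first_sector // SECTORS_PER_BLOCK, last_sector // SECTORS_PER_BLOCK

def first_straddling_cluster(volume_start, limit=100000):
    """Return the first cluster spanning two blocks, or None if none is found."""
    for cluster in range(limit):
        first_block, last_block = blocks_touched(volume_start, cluster)
        if first_block != last_block:
            return cluster
    return None

print(first_straddling_cluster(0x3F))   # volume starting at sector 63
print(first_straddling_cluster(4096))   # volume starting on a block boundary
```

With the volume at sector 0x3F, one cluster near the end of every block ends up split across the boundary; with a start on a block boundary the search finds none.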

If clusters are aligned with VHD blocks, it is much easier to correlate the file data.  It makes sense that the disk should be aligned not with pretend physical settings but rather the VHD format itself.  Even though it is counter-intuitive, it might make sense to have the first partition start at a 2MB boundary.  Some space would be wasted before and after a given partition but the partition would be guaranteed to be isolated from the MBR area and the other partitions.
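The 2MB suggestion boils down to rounding the partition start up to the next block boundary.  A tiny Python sketch of that arithmetic, again assuming 512-byte sectors and 2MB blocks (the helper name is made up):

```python
SECTOR_SIZE = 512
SECTORS_PER_BLOCK = (2 * 1024 * 1024) // SECTOR_SIZE   # 4096 sectors per 2MB

def align_up(sector, alignment=SECTORS_PER_BLOCK):
    """Round a sector number up to the next multiple of the alignment."""
    return (sector + alignment - 1) // alignment * alignment

old_start = 0x3F
new_start = align_up(old_start)

# Sectors between the MBR (sector 0) and the partition that would go unused.
unused_sectors = new_start - 1

print(new_start, unused_sectors)
```

The cost is a little under 2MB of unused space in front of the first partition, which is the waste mentioned above.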

John had asked for a tool to fix this.  Unfortunately I do not have time right now to solve it.  There are other areas which are currently more important.  However, it would be fun to write such a tool.

Fast Creation for Fixed Size VHD

Search, and you shall find.  One of the many problems of dealing with VHDs is that they can take ages to create.  More specifically, the fixed VHDs can be very slow.  This is due to clearing the entire VHD with zeroes.  Creating any file that is gigabytes long is bound to be painful.

The Virtual PC Guy (Ben Armstrong from Microsoft) has come up with a solution.  It entails not zeroing out the file and simply creating the VHD footer at the end.  Very fast and just what most people want.  The only concern is security: because the file was not cleared, the VHD can expose deleted data that previously occupied that space on the host disk.  For most people this would not be a major concern under certain conditions (like a new disk).  However, this sounds more like a file system problem.  When files are deleted, they should be cleared then.  There might even be an NTFS option to do this.  Let me know, please?

Virtual PC Guy has also been nice enough to provide the binary and source for his tool.  This is a very kind gesture and I say thanks.

VHD Snapshots Revealed

Microsoft produced a series of videos about Hyper-V last year from the Program Managers.  Based on recent investigations, I found a good explanation of how snapshotting works.  The VHD snapshotting video is a bit casual but captures the essence of the engineering design.  

The implementation does seem a bit rough in places compared to competing products.  Persistence will pay off.

My overall biggest concern is that the snapshotting mechanism should have been built into the VHD spec.  Currently the implementation is expressed as code that manipulates VHDs for the purpose of snapshots.  The difference is subtle but enough to make this a Hyper-V only way of looking at things.  Unfortunately this will lead other vendors to consider building their own snapshotting technology around the weakness of the native VHD format.

VHD Difference Disk

What is the difference?  With VHDs, it is a classification of virtual drive.  The other two types have already been covered and now it is time to briefly cover what makes the difference VHD disk interesting.

Why does it exist?  Perhaps the most obvious answer is that it could save space.  The difference disk is actually linked to a parent disk.  The parent disk represents a read-only copy of a VM.  Technically it does not need to be a full copy but let’s assume that it is to make it easy.  Once the parent and child (difference disk) are bound, the parent VHD is no longer allowed to change.  The child VHD has pointers to the parent VHD by using various name markers (relative/absolute, UNICODE/UTF-8).  The link is not guaranteed and obviously it is possible for an admin/user to break the connection.  It would be an easy mistake to make.  The difference file could be moved to another system which has no access to the parent file.

What lives in the child disk?  Only the written changes are kept in the virtual disk.  Also, the changes are marked on a sector bitmap which shows which sectors are coming from the child and which ones are coming from the parent.  From the operating system point of view, this is transparent.  However, the VM player is responsible for splitting up the requests between the two virtual disks (parent and child).  This also means that the two disks need to be opened when the VM is running.

The sector bitmap is actually at the front of the blocks in the VHD.  In a dynamic disk, the sector bitmap shows which sectors have been written.  For a difference disk, it shows the ownership of the sectors between child and parent.  
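The read path for a difference disk can be sketched in a few lines of Python.  This is only an illustration: the function names are invented, and the bit order (most significant bit of each byte is the lowest sector) is an assumption based on how common VHD implementations behave, so check it against a real image before relying on it.

```python
def sector_in_child(bitmap, sector_in_block):
    """Return True if the child's bitmap marks this sector as written."""
    byte = bitmap[sector_in_block // 8]
    mask = 0x80 >> (sector_in_block % 8)   # assumption: MSB-first bit order
    return bool(byte & mask)

def read_source(bitmap, sector_in_block):
    """Decide which disk should serve a read for this sector."""
    return "child" if sector_in_child(bitmap, sector_in_block) else "parent"

# A 512-byte bitmap covers the 4096 sectors of an assumed 2MB block.
bitmap = bytearray(512)
bitmap[7] = 0x01          # marks sector 63 (last bit of byte 7) as written

print(read_source(bitmap, 63))
print(read_source(bitmap, 64))
```

For a dynamic disk the same lookup applies, except that a clear bit means the sector was never written rather than that the parent owns it.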

I have been playing with difference disks over the last couple of weeks and now understand the nature of how this fits together.  One key point is that this is happening at the sector level which would make it very hard to figure out which files had changed.

An annoying aspect of the sector bitmap is that it does not align with clusters.  Because the first volume sector happens at 0x3F, the first eight sector cluster happens from 0x3F to 0x46.  So, cluster zero maps to bits in two different bytes of the sector bitmap: sector 0x3F is the last bit of one byte and sectors 0x40 to 0x46 fall in the next.  Life would have been a bit easier if the clusters aligned with the sector bitmap bytes.  Never mind, this is really only annoying for people trying to correlate volume clusters to low-level sectors.
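The mapping from a cluster to sector bitmap bytes can be worked out directly.  The Python sketch below assumes one bitmap bit per sector and eight-sector clusters, and counts byte indexes from the start of the block's bitmap; it only handles clusters that sit inside the first block, and the function name is made up.

```python
def bitmap_bytes_for_cluster(volume_start, cluster, sectors_per_cluster=8):
    """List the bitmap byte indexes that hold the bits for one cluster."""
    first_sector = volume_start + cluster * sectors_per_cluster
    last_sector = first_sector + sectors_per_cluster - 1
    return list(range(first_sector // 8, last_sector // 8 + 1))

# Cluster zero of a volume starting at sector 0x3F.
print(bitmap_bytes_for_cluster(0x3F, 0))

# The same cluster if the volume started on an eight-sector boundary.
print(bitmap_bytes_for_cluster(0x40, 0))
```

An aligned cluster occupies exactly one bitmap byte, which makes it possible to test a whole cluster with a single byte comparison.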

It is worth noting that the difference VHD disk has no intelligence about what is being written.  In other words, it is highly likely that data written which happens to be the same as before will still trigger usage in the difference disk.  Also of interest is that all VHD disks have no sense of what has been freed.  This means that even if written data is freed by the file system, it will still be retained in the VHD.  And finally, all data is treated equally so this means that even if the data is not worth keeping (temporary content) the VHD will do its best to hold onto it blindly.  It appears that the pagefile falls into this category.

The greatest value of the difference disk would come from a template model.  An admin could create a dynamic VHD disk for the work environment and then use the difference disk to create user copies.  The benefit would be space savings and potentially faster transfer for remote use (assuming the template is already there).  The missing piece is being able to update the template and have it take effect on the user difference disks.  By the current definitions/standards, this will not work.  The simple reason why is that it would be nearly impossible to merge the two together based on blocks changing on both the child and the parent.  Since the VHD format has no knowledge of files and directories, it has no way of knowing what to merge.

The difference disk seems similar to linked clone technology.  However, linked clone uses versioning which allows for the parent to move forward.  Unfortunately, even linked clones have no knowledge of how to merge with an updated parent.

Dynamic VHD Walkthrough

The VHD format is becoming more popular based on common use by Microsoft.  It has been said that Windows 7 will have built-in support for VHD and will even allow a VHD to be booted.  As has been said a few times, the VHD specification is public which means that essentially anyone is allowed to program to it.

The format is fairly easy to understand and the specification, though short, covers what needs to be said.

However, having read the specification, certain things seemed a bit unclear.  The only way to get full clarity was to experiment with a real VHD and match it to the spec.

The first concept is that each VHD has a header and a footer.  Both happen to be identical for the sake of redundancy.  Most likely the footer was defined first and was projected to the front as well.  This is good news for getting key information up front.
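A quick way to sanity check the footer (or its copy at the front) is to verify the cookie and the checksum.  The Python sketch below reflects my reading of the spec: the structure is 512 bytes, starts with the cookie "conectix", and stores at offset 64 the one's complement of the sum of all other bytes.  The offsets and function names should be treated as assumptions to confirm against the spec, and the footer here is synthetic rather than taken from a real VHD.

```python
import struct

FOOTER_SIZE = 512
DISK_TYPE_OFFSET = 60    # assumed offset of the disk type field
CHECKSUM_OFFSET = 64     # assumed offset of the 4-byte checksum field

def footer_checksum(footer):
    """One's complement of the byte sum, skipping the checksum field itself."""
    total = sum(footer[:CHECKSUM_OFFSET]) + sum(footer[CHECKSUM_OFFSET + 4:])
    return ~total & 0xFFFFFFFF

def footer_is_valid(footer):
    """Check the size, the cookie and the stored checksum."""
    if len(footer) != FOOTER_SIZE or footer[:8] != b"conectix":
        return False
    (stored,) = struct.unpack_from(">I", footer, CHECKSUM_OFFSET)
    return stored == footer_checksum(footer)

# Build a synthetic footer just to exercise the functions.
footer = bytearray(FOOTER_SIZE)
footer[0:8] = b"conectix"
struct.pack_into(">I", footer, DISK_TYPE_OFFSET, 3)   # 3 = dynamic disk
struct.pack_into(">I", footer, CHECKSUM_OFFSET, footer_checksum(footer))

print(footer_is_valid(footer))
```

On a real dynamic VHD, the same check could be run on the first 512 bytes and the last 512 bytes of the file, and the two copies compared byte for byte.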

This post will focus on Dynamic VHD files.  There are two other types (fixed and differencing) but dynamic is perhaps the most common.  Fixed is fixed.  Once you allocate a size, you are stuck with it.  It takes all the space specified without necessarily using any of it.  It is good for guaranteeing the space will be there but a bad citizen for disk space usage on the host.  Differencing is more advanced and essentially is used for parent/child disk relationships to create what could be called a linked clone.  The idea is that the difference disk builds on its parent and does not require all the data the parent has.  Dynamic disks are disks that allocate space on the fly based on usage.  There are rules about how big it can get and how the blocks are allocated but it appears the same as a fixed disk to the guest.
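The on-the-fly allocation can be pictured as a table lookup.  In the Python sketch below, a Block Allocation Table (BAT) holds one entry per block: either the sector in the file where that block begins or a marker meaning the block has never been allocated.  The 2MB block size, the 0xFFFFFFFF marker and the layout (sector bitmap first, then data) follow my reading of the spec; the function name and the sample table are invented.

```python
SECTOR_SIZE = 512
UNALLOCATED = 0xFFFFFFFF   # BAT entry for a block that was never written

def locate_sector(bat, sector, block_size=2 * 1024 * 1024):
    """Return the file offset of a virtual sector, or None if unallocated."""
    sectors_per_block = block_size // SECTOR_SIZE
    block, sector_in_block = divmod(sector, sectors_per_block)
    entry = bat[block]
    if entry == UNALLOCATED:
        return None                      # the guest sees zeroes here
    bitmap_bytes = sectors_per_block // 8
    bitmap_sectors = -(-bitmap_bytes // SECTOR_SIZE)   # round up to sectors
    return (entry + bitmap_sectors + sector_in_block) * SECTOR_SIZE

# Hypothetical table: block 0 unallocated, block 1 stored at file sector 100.
bat = [UNALLOCATED, 100]

print(locate_sector(bat, 0))
print(locate_sector(bat, 4096))
```

A write to an unallocated block is what makes the file grow: a new block (bitmap plus data) is appended and its starting sector is recorded in the table.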


Virtual Hard Disk Specification

The Virtual Hard Disk Image Format Specification (VHD Spec) has been available from Microsoft since October 2006.  You can bypass registration and download straight from here.

The document is only seventeen pages long but manages to capture how it is set up.

The specification was originally created by Connectix and came to Microsoft through the acquisition in 2003.  Since then VHD has become more and more successful.  It is used by all Microsoft virtualization products (VirtualPC, Virtual Server, Hyper-V) and is gaining support from Citrix products as well (PVS and XenServer).  VHD looks to be the rising star for Windows based virtual machines.

Strangely, I had the opportunity to meet the creator of the original VHD specification at a BriForum 2008 dinner.  Unfortunately I did not get his business card.  His career path took some strange turns based on working at Connectix, being acquired by Microsoft, working for Microsoft, leaving to work at Calista, and then being acquired yet again by Microsoft.  He was part of the Calista contingent at BriForum 2008 in Chicago along with Nelly Porter.

The VHD format has some tricks up its sleeve.  Two of these are related to disk types.  There are three basic disk types.  They are:

· Fixed hard disk image
· Dynamic hard disk image
· Differencing hard disk image

Fixed disk is pre-allocated to the size specified at creation.  Dynamic disk allocates on the fly based on a certain chunk size (for example 2MB).  Differencing disk is where two or more images are combined to form one virtual disk image.  Differencing would allow for a primitive cloning to take place.  It is similar in concept to linked clones but without the protection of the base not changing.

Being a virtual disk, it has no concept of files.  File interpretation is based solely on the file system code in Windows (NTFS).  This is another way of saying it is a block-based model.

Given the momentum of VHD within Citrix and Microsoft, there is a good chance that VHD will be used for more tasks.  Microsoft has acquired Kidaro, for example, which would imply that VHD will get a major push for transporting VMs around the extended enterprise (read mobile and home workers).  Given the nature of the Kidaro TrimTransfer technology, it should be possible to make this as painless as possible.

There is a competing model at Microsoft under the banner of Windows Imaging Format.  Instead of focusing on disks, it focuses on files.  It uses Single Instance Store (SIS) and is being used for Vista deployment.  It is an excellent model for deploying large sets of files especially with small variations between copies (instances).

Both of these formats allow for mounting (VHDMount and ImageX) into Windows.  VHDMount comes from Virtual Server 2005.

The hope is that VHD will become the standard over time.  There is another format called Open Virtualization Format (OVF) that is set to work across all platforms in the future.  This new format is intended to supersede all existing formats and make it easier to transport workloads between vendors.