Monthly Archives: December 2008

VHD Difference Disk

What is the difference?  With VHDs, it is a classification of virtual drive.  The other two types have already been covered and now it is time to briefly cover what makes the difference VHD disk interesting.

Why does it exist?  Perhaps the most obvious answer is that it could save space.  The difference disk is actually linked to a parent disk.  The parent disk represents a read-only copy of a VM.  Technically it does not need to be a full copy but let’s assume that it is to make it easy.  Once the parent and child (difference disk) are bound, the parent VHD must no longer change, because the child only makes sense on top of the exact parent contents it was created against.  The child VHD points to the parent VHD through several name markers (relative/absolute paths, UNICODE/UTF-8 encodings).  The link is not guaranteed and it is possible for an admin/user to break the connection.  It would be an easy mistake to make: the difference file could be moved to another system which has no access to the parent file.

What lives in the child disk?  Only the written changes are kept in the virtual disk.  Also, the changes are marked on a sector bitmap which shows which sectors are coming from the child and which ones are coming from the parent.  From the operating system point of view, this is transparent.  However, the VM player is responsible for splitting up the requests between the two virtual disks (parent and child).  This also means that the two disks need to be opened when the VM is running.
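As a rough illustration of the read/write split described above, here is a toy model in Python.  It is only a sketch under simplifying assumptions: the class name DiffDisk is made up, the child keeps one bitmap for the whole disk rather than one per block as a real VHD does, and the most-significant-bit-first bit order is an assumption rather than something taken from the text.

```python
SECTOR_SIZE = 512


class DiffDisk:
    """Toy difference disk: a read-only parent plus a child holding written sectors."""

    def __init__(self, parent: bytes):
        self.parent = parent                      # never modified
        self.sector_count = len(parent) // SECTOR_SIZE
        # One bit per sector: 1 = sector lives in the child, 0 = read from parent.
        self.bitmap = bytearray((self.sector_count + 7) // 8)
        self.child = {}                           # sector number -> 512 bytes

    def _in_child(self, sector: int) -> bool:
        # Assumed bit order: most significant bit is the lowest sector number.
        return bool(self.bitmap[sector // 8] & (0x80 >> (sector % 8)))

    def write(self, sector: int, data: bytes) -> None:
        if len(data) != SECTOR_SIZE:
            raise ValueError("writes are whole sectors in this sketch")
        self.child[sector] = bytes(data)
        self.bitmap[sector // 8] |= 0x80 >> (sector % 8)

    def read(self, sector: int) -> bytes:
        if self._in_child(sector):
            return self.child[sector]
        start = sector * SECTOR_SIZE
        return self.parent[start:start + SECTOR_SIZE]


disk = DiffDisk(parent=bytes(4 * SECTOR_SIZE))
disk.write(2, b"\x01" * SECTOR_SIZE)   # sector 2 now comes from the child
untouched = disk.read(1)               # sector 1 still comes from the parent
```

The guest only ever calls read and write; the routing between the two disks happens underneath, which is why both files have to be open while the VM runs.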

The sector bitmap actually sits at the front of each data block in the VHD.  In a dynamic disk, the sector bitmap shows which sectors have been written.  For a difference disk, it shows the ownership of the sectors between child and parent.
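To put numbers on the bitmap, the small calculation below works out how much room it takes in front of a block.  It assumes 512-byte sectors, the commonly used 2 MB block size, and a bitmap padded up to a whole sector; the function name block_layout is made up for this sketch.

```python
SECTOR_SIZE = 512


def block_layout(block_size: int = 2 * 1024 * 1024):
    """Return (data sectors, bitmap sectors, total bytes on disk) for one block."""
    data_sectors = block_size // SECTOR_SIZE
    bitmap_bytes = (data_sectors + 7) // 8                      # one bit per sector
    bitmap_sectors = (bitmap_bytes + SECTOR_SIZE - 1) // SECTOR_SIZE  # pad to a sector
    total_bytes = bitmap_sectors * SECTOR_SIZE + block_size
    return data_sectors, bitmap_sectors, total_bytes


print(block_layout())  # (4096, 1, 2097664)
```

With those assumptions a block carries 4096 data sectors, and its bitmap fits exactly in the one sector that precedes them.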

I have been playing with difference disks over the last couple of weeks and now understand the nature of how this fits together.  One key point is that this is happening at the sector level which would make it very hard to figure out which files had changed.

An annoying aspect of the sector bitmap is that it does not align with clusters.  Because the first volume sector is at 0x3F, the first eight-sector cluster runs from 0x3F to 0x46.  So, cluster zero straddles two different bytes of the sector bitmap (one bit in the first, seven in the next) instead of filling exactly one.  Life would have been a bit easier if the clusters aligned with the sector bitmap bytes.  Never mind, this is really only annoying for people trying to map volume clusters to low-level sectors.
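The arithmetic can be checked with a few lines of Python.  This is a sketch with made-up function names; it counts sectors from the start of the disk and assumes every block holds a multiple of eight sectors, so the byte alignment is the same inside a per-block bitmap.

```python
def bitmap_span(cluster: int, volume_start: int = 0x3F, sectors_per_cluster: int = 8):
    """Return (bitmap byte index, bit index) for every sector of a volume cluster."""
    first = volume_start + cluster * sectors_per_cluster
    last = first + sectors_per_cluster - 1
    return [(sector // 8, sector % 8) for sector in range(first, last + 1)]


def bytes_touched(cluster: int, **kwargs):
    """Return the distinct bitmap bytes that hold bits for this cluster."""
    return sorted({byte for byte, _ in bitmap_span(cluster, **kwargs)})


print(bytes_touched(0))                     # [7, 8]
print(bytes_touched(0, volume_start=0x40))  # [8]
```

Had the volume started at sector 0x40 instead of 0x3F, every cluster would have lined up with exactly one bitmap byte.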

It is worth noting that the difference VHD disk has no intelligence about what is being written.  In other words, it is highly likely that writing data which happens to be the same as before will still consume space in the difference disk.  Also of interest is that VHD disks have no sense of what has been freed.  This means that even if written data is freed by the file system, it will still be retained in the VHD.  And finally, all data is treated equally, so even if the data is not worth keeping (temporary content) the VHD will do its best to hold onto it blindly.  It appears that the pagefile falls into this category.

The greatest value of the difference disk would come from a template model.  An admin could create a dynamic VHD disk for the work environment and then use the difference disk to create user copies.  The benefit would be space savings and potentially faster transfer for remote use (assuming the template is already there).  The missing piece is being able to update the template and have it take effect on the user difference disks.  By the current definitions/standards, this will not work.  The simple reason is that it would be nearly impossible to merge the two together when blocks have changed on both the child and the parent.  Since the VHD format has no knowledge of files and directories, it has no way of knowing what to merge.

The difference disk seems similar to linked clone technology.  However, linked clone uses versioning which allows for the parent to move forward.  Unfortunately, even linked clones have no knowledge of how to merge with an updated parent.

Dynamic VHD Walkthrough

The VHD format is becoming more popular thanks to its widespread use by Microsoft.  It has been said that Windows 7 will have built-in support for VHD and will even allow a VHD to be booted.  As has been said a few times, the VHD specification is public which means that essentially anyone is allowed to program to it.

The format is fairly easy to understand and the specification, though short, covers what needs to be said.

However, having read the specification, certain things seemed a bit unclear.  The only way to get full clarity was to experiment with a real VHD and match it to the spec.

The first concept is that each dynamic VHD has a header and a footer.  Both happen to be identical for the sake of redundancy: the 512-byte footer at the end of the file is mirrored at the very front.  Most likely the footer was defined first and was projected to the front as well.  This is good news for getting key information up front.
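Matching a real file against the spec can start with something like the sketch below.  It pulls a few fields out of the 512-byte footer and checks that the copy at the front matches the one at the end.  The field offsets, the "conectix" cookie and the disk type codes follow the published specification as I read it; the function names are made up, and the helper works on an image already loaded into memory (for a large file, read just the first and last 512 bytes instead).

```python
import struct

FOOTER_SIZE = 512
DISK_TYPES = {2: "fixed", 3: "dynamic", 4: "differencing"}


def parse_footer(footer: bytes) -> dict:
    """Extract a few big-endian fields from a 512-byte VHD footer."""
    if len(footer) != FOOTER_SIZE or footer[0:8] != b"conectix":
        raise ValueError("not a VHD footer")
    (data_offset,) = struct.unpack_from(">Q", footer, 16)   # where the next structure starts
    (current_size,) = struct.unpack_from(">Q", footer, 48)  # virtual disk size in bytes
    (disk_type,) = struct.unpack_from(">I", footer, 60)
    return {
        "data_offset": data_offset,
        "current_size": current_size,
        "disk_type": DISK_TYPES.get(disk_type, "unknown (%d)" % disk_type),
    }


def footer_copies_match(image: bytes) -> bool:
    """True when the first and last 512 bytes of the image are identical."""
    return image[:FOOTER_SIZE] == image[-FOOTER_SIZE:]
```

A typical use would be to call parse_footer on the last 512 bytes of the file and then use footer_copies_match to confirm the redundancy described above.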

This post will focus on Dynamic VHD files.  There are two other types (fixed and differencing) but dynamic is perhaps the most common.  Fixed is fixed.  Once you allocate a size, you are stuck with it.  It takes all the space specified without necessarily using any of it.  It is good for guaranteeing the space will be there but a bad citizen for disk space usage on the host.  Differencing is more advanced and essentially is used for parent/child disk relationships to create what could be called a linked clone.  The idea is that the difference disk builds on its parent and does not require all the data the parent has.  Dynamic disks are disks that allocate space on the fly based on usage.  There are rules about how big it can get and how the blocks are allocated but it appears the same as a fixed disk to the guest.
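The allocate-on-use idea can be sketched with a toy model.  This is not the on-disk format: the class name DynamicDisk is made up, the table stores an index into a Python list rather than a file offset, and the 0xFFFFFFFF marker for "not allocated yet" follows my reading of the specification's block allocation table.

```python
SECTOR_SIZE = 512
UNALLOCATED = 0xFFFFFFFF


class DynamicDisk:
    """Toy dynamic disk: blocks are created only when first written."""

    def __init__(self, virtual_sectors: int, sectors_per_block: int = 4096):
        self.sectors_per_block = sectors_per_block
        block_count = (virtual_sectors + sectors_per_block - 1) // sectors_per_block
        self.table = [UNALLOCATED] * block_count   # one entry per virtual block
        self.blocks = []                           # allocated blocks, in allocation order

    def write(self, sector: int, data: bytes) -> None:
        if len(data) != SECTOR_SIZE:
            raise ValueError("writes are whole sectors in this sketch")
        block, offset = divmod(sector, self.sectors_per_block)
        if self.table[block] == UNALLOCATED:
            # First write into this block: grow the file by one whole block.
            self.table[block] = len(self.blocks)
            self.blocks.append(bytearray(self.sectors_per_block * SECTOR_SIZE))
        buf = self.blocks[self.table[block]]
        buf[offset * SECTOR_SIZE:(offset + 1) * SECTOR_SIZE] = data

    def read(self, sector: int) -> bytes:
        block, offset = divmod(sector, self.sectors_per_block)
        if self.table[block] == UNALLOCATED:
            return bytes(SECTOR_SIZE)              # never written: reads as zeros
        buf = self.blocks[self.table[block]]
        return bytes(buf[offset * SECTOR_SIZE:(offset + 1) * SECTOR_SIZE])

    def allocated_bytes(self) -> int:
        return len(self.blocks) * self.sectors_per_block * SECTOR_SIZE
```

The guest sees the full virtual size from the start, while allocated_bytes only grows a block at a time as sectors in new blocks are written.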
