Category Archives: VHD

VHD tools: vhdcompact and regedit

It is terrible to be unsure of what to blog about.  The weight of choice tends to create a vacuum.  This is most obvious when months pass by before anything new is written.  There are certainly things that have happened in that time that are worth writing about.  However, it is sometimes difficult to pick the right one.

A topic dear to my heart this last year has been changing VHDs while they are offline.  Indirectly I have blogged about VHDs and what is inside them.  What has not been discussed is what is really going on at work.

Recently things have changed a bit and the release of the code is becoming much more real.  It is time to spill the beans.

There were two specific challenges I was given for XenClient.  The first was the ability to compact VHDs as much as possible.  The second was to be able to get/set registry settings.  Both tasks sound fairly simple but turned out to be very hard due to the constraints.  First, the tools must run in Linux.  Second, they must run with minimal dependencies.

The compaction problem was solved quite some time ago.  The core issue is that VHDs have no knowledge of the file system.  Therefore, the VHD layer cannot safely delete any blocks that have been written to.  Deleting a block requires knowledge of the NTFS usage which, until recently, did not exist.  The tool I wrote, called vhdcompact, correlates knowledge of the VHD layout with NTFS allocation data to determine which blocks can be safely removed.

Without too much effort, the tool can cut out upwards of 30% of a VHD.  Given that VHDs can grow into the tens of gigabytes, this is a large amount of storage.  The really interesting bit is that the disk loses no information, since the removed blocks were not being used by anyone anyway.  This only applies to dynamic and difference disks.
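
For the curious, here is a rough sketch in C++ of the core test.  The constants and the helper are hypothetical stand-ins for values the real tool reads out of the VHD header and the NTFS boot sector, but the idea is exactly this: a block can only go if nothing inside it is marked as used by NTFS.

#include <cstdint>

const uint32_t kSectorSize   = 512;
const uint32_t kBlockSectors = 4096;            // 2MB VHD block
const uint32_t kClusterSize  = 4096;            // 4K NTFS cluster

bool ntfs_cluster_in_use(uint64_t cluster);     // walks the parsed $Bitmap

// volumeStart: first sector of the NTFS volume within the virtual disk.
bool block_is_reclaimable(uint64_t blockIndex, uint64_t volumeStart)
{
    const uint32_t sectorsPerCluster = kClusterSize / kSectorSize;
    uint64_t first = blockIndex * kBlockSectors;
    uint64_t last  = first + kBlockSectors - 1;

    for (uint64_t s = first; s <= last; ++s) {
        if (s < volumeStart)
            return false;                       // MBR/slack area: leave it alone
        if (ntfs_cluster_in_use((s - volumeStart) / sectorsPerCluster))
            return false;                       // block still holds live data
    }
    return true;                                // safe to drop from the BAT
}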

The registry editing can only happen once the tools understand NTFS.  Thankfully LIBNTFS handles this in Linux.  Unfortunately no one, that I know of, has created a registry editor for VHDs.  Luckily, though, the registry format has been documented by sources outside Microsoft.  I created code to support registry editing without any Windows code.  It has taken quite some time to get it to the state it is in today, but it is now possible to make changes to the registry from Linux.
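
To give a flavour of what “without any Windows code” means, here is a small sketch that validates a hive’s base block.  The offsets come from the community documentation of the “regf” format rather than from any Microsoft header, so treat the structure as my reading of those documents.

#include <cstdint>
#include <cstdio>
#include <cstring>

#pragma pack(push, 1)
struct RegfHeader {
    char     signature[4];   // "regf"
    uint32_t primarySeq;     // matching sequence numbers mean a clean hive
    uint32_t secondarySeq;
    uint64_t timestamp;      // FILETIME of the last write
    uint32_t majorVersion;   // usually 1
    uint32_t minorVersion;   // 3 or 5
    uint32_t fileType;
    uint32_t fileFormat;
    uint32_t rootCellOffset; // relative to the first hive bin at 0x1000
    uint32_t hiveBinsSize;
};
#pragma pack(pop)

bool hive_looks_valid(const uint8_t* data, size_t len)
{
    if (len < sizeof(RegfHeader)) return false;
    RegfHeader h;
    std::memcpy(&h, data, sizeof(h));
    if (std::memcmp(h.signature, "regf", 4) != 0) return false;
    if (h.primarySeq != h.secondarySeq)
        std::fprintf(stderr, "hive was not cleanly flushed\n");
    return true;
}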

I can hear a few of you say “Why would you do this?”.

The answer is that we need to make some minor adjustments to the VHD before it boots.  It is too late after the VHD has booted.  It is also important for us to be able to save/restore sections of the registry from the hypervisor code.  Without Windows being there, we have to depend solely on ourselves.

It does feel like I am letting the cat out of the bag but the truth is that this stuff is probably old news to most.

One of the side effects of a project like this is that it is now possible to map blocks to files.  This means that given any sector in the VHD, we can determine which file owns it.  This, historically, was a difficult task.  Now, it seems like a minor detail in a much larger problem space.

In theory it would be possible to re-parent a difference disk with this kind of information.  From an IT point of view, this would enable the admin to update the base while the user keeps their changes, even if they include application installations.  At one point this seemed to be a very important part of the project.  Now it seems like a very futuristic and potentially improbable dream.  The potential for conflict is always there and it is probably the methods of handling the conflicts that are more important than the actual re-base.

I am working towards making the tools available on the Internet.  There is some hope that CDN can host items.  I would like to see Citrix Labs have tools listed so that the VHD tools could be posted there.

There are many more tools that have been written along the way.  Some of the more useful ones are strictly diagnostic.  For example, it is possible to dump the entire file directory tree from a VHD.

The point I want to clarify before going is that the tools are completely self-contained.  In other words, vhdcompact does not use any NTFS mount code to determine the VHD content.  It directly reads the VHD metadata and determines where the NTFS information is.  It essentially knows everything about the VHD and only uses the operating system to open/read/write/close the VHD file itself.

As a side benefit of how the code was created, it also supports doing the same things in Windows.  This was largely due to me being more comfortable building and debugging with Visual Studio.  The code is in C++ and is very generalized so it works with either platform.  The abstraction is good enough that the code will run exactly the same way for the same VHDs.

This has been a focus for me for more than a year now.  I am happy with how things are turning out, and soon the tools will find their way into the real world.  I have decided it is worth writing a few more posts over the next few weeks about what is going on with them.

VHD versus NTFS alignment

This topic is guaranteed to bore most people.  Or, maybe I am wrong.  Are you the kind of person that loves to defrag your disk?  Are you always looking for new ways of speeding up your machine?  Are knobs and buttons something you love to tweak?  How about that registry cleaner?

No, I am not saying those things are bad.  Some people have the patience for this while others do not.  Usually the people that fiddle with the settings are eventually rewarded.  At the very least they can proclaim that they understand what the settings are really doing.

Thanks to a reader (thanks John!) I was made aware of the offset problem within VHDs.  Well, let me rephrase that.  I knew about it but I did not know its official name or that people had solved it.

John pointed me to this blog post about the offset problem.  There are already tools out there from NetApp to solve the issue but basically you need to be a NetApp customer in order to get access (at least for the tools that actually fix your images).  In truth, the problem affects most virtualisation users.

So, this is the part where the problem is dissected and understood.  Virtual disks still follow rules left over from physical disks.  Specifically, they have a “geometry” setting that indicates things like sectors per track.  Here is an example from my tool named vhddump of one of the test VHDs lying around.

Geometry             3F10BA9E
Cylinders            9EBA
Heads                10
Sectors/Track        3F

This information is a field inside the VHD header.  The geometry value splits into the next three values by interpreting its bytes.  VHDs are big endian (most significant byte first, instead of the typical little endian on Windows), and vhddump is reporting the field backwards since preserving the byte order had not been important.
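
Decoding the field is trivial once the byte order is known.  Here is a sketch based on my reading of the spec: a big-endian 16-bit cylinder count followed by one byte each for heads and sectors per track.

#include <cstdint>
#include <cstdio>

void print_geometry(const uint8_t g[4])
{
    uint16_t cylinders = (uint16_t)((g[0] << 8) | g[1]);  // big endian
    uint8_t  heads     = g[2];
    uint8_t  sectors   = g[3];
    std::printf("C/H/S = %u/%u/%u\n",
                (unsigned)cylinders, (unsigned)heads, (unsigned)sectors);
}

// For the VHD above the on-disk bytes are 9E BA 10 3F, giving
// C/H/S = 0x9EBA/0x10/0x3F -- the same values vhddump printed in reverse.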

Anyway, you can see that our virtual disk has 3F sectors per track.  This translates to 63 sectors in decimal.  The geometry reported (also known as CHS) affects the alignment of partitions.  The rule with older versions of Windows is that the boot partition starts after the first track.  This is the partition output for the same VHD from vhddump:

Index(0) 80 01 01 00 07 FE FF FF 3F 00 00 00 B5 98 70 02
Index(0) BootFlag(80) Type(07) SectorStart(0000003F) SectorLength(027098B5)
Index(0) CHSStart(010100) CHSEnd(FEFFFF)
Start CHS  Head(01) Sector(01) Cylinder(0000)
End   CHS  Head(FE) Sector(3F) Cylinder(03FF)

Volume Count         1
Volume Index         0
FirstVolumeSector    3F

The master boot record starts at the first sector of the virtual disk.  This dump is from a Windows XP image.  The first sector of the first volume/partition is at 0x3f.

Why is this a problem?

Well, the most obvious problem is that sector 0x3F does not align with VHD blocks or even NTFS clusters.  Since it is an 'odd' sector, it is guaranteed never to align with anything inside the VHD.

This problem was first seen when exploring the clusters inside the VHD.  Instead of being neatly aligned with the VHD blocks, it was possible to have a cluster that spanned two blocks.  Even the sector bitmap showing the written clusters did not align on byte boundaries for a cluster.  Not only did it make it harder to correlate information, it also meant doing extra work for something that should have been aligned in the first place.

This kind of offset problem would show as a performance problem over time.  It is always more efficient to have alignment.

If people really understood this problem, they would probably insist that the partitions were aligned.  A simple example is a cluster that spans two blocks.  Not only is it a read/write hit to access two blocks, but it also potentially wastes space with the second block if there is nothing else there.

If clusters are aligned with VHD blocks, it is much easier to correlate the file data.  It makes sense that the disk should be aligned not with pretend physical settings but rather the VHD format itself.  Even though it is counter-intuitive, it might make sense to have the first partition start at a 2MB boundary.  Some space would be wasted before and after a given partition but the partition would be guaranteed to be isolated from the MBR area and the other partitions.
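
A trivial sketch makes the point.  Assuming 512-byte sectors, 4K clusters, and 2MB VHD blocks, the legacy start at sector 0x3F fails both checks while a 2MB-aligned start passes:

#include <cstdint>
#include <cstdio>

int main()
{
    const uint64_t starts[] = { 0x3F, 4096 };   // legacy start vs 2MB-aligned start
    const uint64_t clusterSectors = 8;          // 4K cluster / 512-byte sectors
    const uint64_t blockSectors   = 4096;       // 2MB VHD block

    for (uint64_t s : starts) {
        std::printf("sector %llu: cluster-aligned=%d block-aligned=%d\n",
                    (unsigned long long)s,
                    (int)(s % clusterSectors == 0),
                    (int)(s % blockSectors == 0));
    }
    return 0;
}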

John had asked for a tool to fix this.  Unfortunately I do not have time right now to solve it.  There are other areas which are currently more important.  However, it would be fun to write such a tool.

Fast Creation for Fixed Size VHD

Search, and you shall find.  One of the many problems of dealing with VHDs is that they can take ages to create.  More specifically, fixed VHDs can be very slow.  This is due to clearing the entire VHD with zeroes.  Creating any file that is gigabytes long is bound to be painful.

The Virtual PC Guy (Ben Armstrong from Microsoft) has come up with a solution.  It entails not zeroing out the file and simply creating the VHD footer at the end.  Very fast and just what most people want.  The only concern is security, related to the VHD exposing previously deleted data since the space was not cleared.  For most people this would not be a major concern under certain conditions (like a new disk).  However, this sounds more like a file system problem.  When files are deleted, they should be cleared then.  There might even be an NTFS option to do this.  Let me know, please?
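
For what it is worth, here is my sketch of the idea, not his actual code.  On NTFS, truly skipping the zero-fill means moving the valid-data marker with SetFileValidData, which needs the SE_MANAGE_VOLUME privilege, and that is exactly where the deleted-data concern comes from.  buildFooter is a hypothetical helper.

#include <windows.h>
#include <cstdint>

void buildFooter(uint8_t footer[512], uint64_t diskSize); // hypothetical helper

// Assumes the SE_MANAGE_VOLUME privilege is already enabled for the process.
bool fast_create_fixed_vhd(const wchar_t* path, uint64_t diskSize)
{
    HANDLE h = CreateFileW(path, GENERIC_WRITE, 0, nullptr,
                           CREATE_NEW, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (h == INVALID_HANDLE_VALUE) return false;

    // Extend to full size plus the footer, without writing any zeroes.
    LARGE_INTEGER end; end.QuadPart = (LONGLONG)diskSize + 512;
    bool ok = SetFilePointerEx(h, end, nullptr, FILE_BEGIN) &&
              SetEndOfFile(h) &&
              SetFileValidData(h, end.QuadPart);    // skip the lazy zero-fill

    // The footer lives in the last 512 bytes of a fixed VHD.
    LARGE_INTEGER pos; pos.QuadPart = (LONGLONG)diskSize;
    uint8_t footer[512];
    buildFooter(footer, diskSize);
    DWORD written = 0;
    ok = ok && SetFilePointerEx(h, pos, nullptr, FILE_BEGIN) &&
         WriteFile(h, footer, sizeof(footer), &written, nullptr) &&
         written == sizeof(footer);

    CloseHandle(h);
    return ok;
}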

Virtual PC Guy has also been nice enough to provide the binary and source for his tool.  This is a very kind gesture and I say thanks.

VHD Snapshots Revealed

Last year Microsoft produced a series of videos about Hyper-V from its Program Managers.  Based on some recent investigations, I found a good explanation of how snapshotting works.  The VHD snapshotting video is a bit casual but captures the essence of the engineering design.

The implementation does seem a bit rough in places compared to competing products.  Persistence will pay off.

My overall biggest concern is that the snapshotting mechanism should have been built into the VHD spec.  Currently the implementation is expressed as code that manipulates VHDs for the purpose of snapshots.  The difference is subtle but enough to make this a Hyper-V-only way of looking at things.  Unfortunately this will lead other vendors to consider building their own snapshotting technology around the weaknesses of the native VHD format.

VHD Documentation for Windows 7

If you look hard enough on the web, you are bound to find something good. It might not be what you started looking for but the distraction is worth it.  This time it is the official “unofficial” documentation from Microsoft for VHD support in Windows 7.  This is key information about how to program to the VHD subsystem which is built into Windows 7.

There is a great summary about Win7 VHD, and the diagram in that section is worth a look.

Don’t miss the actual API reference.  Keep in mind that Microsoft is allowed to change the API before shipping Windows 7 but after that it is set in stone.  They have tried hard to match it up with the existing Windows API model so it should be very comfortable to most of you.

There are not that many APIs overall but some of them sound very promising.  For example, there are APIs related to creating new VHDs, compacting existing ones, and merging child VHDs with parents.  Apparently the API can also grow VHDs.
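
Based purely on the published reference, compacting a VHD would look something like the sketch below.  Keep the earlier caveat in mind: names and parameter blocks could still shift before Windows 7 ships.

#include <windows.h>
#include <initguid.h>
#include <virtdisk.h>
#pragma comment(lib, "virtdisk.lib")

DWORD compact_vhd(PCWSTR path)
{
    VIRTUAL_STORAGE_TYPE type = {
        VIRTUAL_STORAGE_TYPE_DEVICE_VHD,
        VIRTUAL_STORAGE_TYPE_VENDOR_MICROSOFT
    };

    HANDLE h = INVALID_HANDLE_VALUE;
    DWORD err = OpenVirtualDisk(&type, path, VIRTUAL_DISK_ACCESS_METAOPS,
                                OPEN_VIRTUAL_DISK_FLAG_NONE, nullptr, &h);
    if (err != ERROR_SUCCESS) return err;

    COMPACT_VIRTUAL_DISK_PARAMETERS params = {};
    params.Version = COMPACT_VIRTUAL_DISK_VERSION_1;

    // No OVERLAPPED: the call blocks until the compaction finishes.
    err = CompactVirtualDisk(h, COMPACT_VIRTUAL_DISK_FLAG_NONE,
                             &params, nullptr);
    CloseHandle(h);
    return err;
}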

All this was brought to my attention by LeeL at work, who referred me to an article about using the new VHD API with Windows 7.  This article brings it all together and shows examples of how to use the new API.  Things are moving along quickly with Windows 7 and the VHD support.  It’s good to see this much focus being put on VHDs for the sake of better management and flexibility.  The easier they are to manipulate, the easier they are to deploy and use.

VHD FAQ

Microsoft has published a new FAQ on VHDs. There is some surprising information in here and it is very recent.  There are guidelines for using VHDs with Windows 7.

The advice to use fixed VHDs was unexpected.  The reasons sound good but at the same time very wasteful.  It might be more from the angle of supporting a local non-virtualized Windows.  Dynamic VHDs would make more sense in a data centre where power and disk space are highly managed.

I searched for subjects related to pagefiles and found this gem.  The interesting news with mounted/booted VHD drives in Windows 7 is:


Are paging files supported in VHDs, and doesn’t that affect the performance of systems using native VHD boot?

Windows does not support locating paging files on virtual disks of attached VHDs. This would include pagefile.sys, hiberfile and crashdump files. Native VHD boot performance would not meet our system responsiveness goals if the paging file were located inside the VHD. If Windows 7 starts using native VHD boot, the operating system locates space on the host volume outside the VHD file for a paging file. The paging file can be approximately 2-4GB or more in size, depending on how much physical RAM memory is configured on the system. Plan the host volume free disk space to support the VHD file and paging file required for native VHD boot. If the host volume for the VHD does not have enough free space for a paging file, Windows attempts to locate the paging file on another volume with sufficient free space available. Note that when Windows is running in a Hyper-V virtual machine, a paging file is created inside the VHD because the virtual disk is used as a normal system volume.

VHDs seem to be gaining traction, being used extensively by Microsoft and now properly documented.

VHD Difference Disk

What is the difference?  With VHDs, it is a classification of virtual drive.  The other two types have already been covered and now it is time to briefly cover what makes the difference VHD disk interesting.

Why does it exist?  Perhaps the most obvious answer is that it could save space.  The difference disk is actually linked to a parent disk.  The parent disk represents a read-only copy of a VM.  Technically it does not need to be a full copy but let’s assume that it is to make it easy.  Once the parent and child (difference disk) are bound, the parent VHD is no longer allowed to change.  The child VHD has pointers to the parent VHD by using various name markers (relative/absolute, UNICODE/UTF-8).  The link is not guaranteed and obviously it is possible for an admin/user to break the connection.  It would be an easy mistake to make.  The difference file could be moved to another system which has no access to the parent file.

What lives in the child disk?  Only the written changes are kept in the virtual disk.  Also, the changes are marked on a sector bitmap which shows which sectors are coming from the child and which ones are coming from the parent.  From the operating system point of view, this is transparent.  However, the VM player is responsible for splitting up the requests between the two virtual disks (parent and child).  This also means that the two disks need to be opened when the VM is running.

The sector bitmap actually sits at the front of each block in the VHD.  In a dynamic disk, the sector bitmap shows which sectors have been written.  For a difference disk, it shows the ownership of the sectors between child and parent.
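
In code, serving a read from a difference disk boils down to a bit test per sector.  This is a sketch based on my reading of the spec; I believe the bitmap bytes are most-significant-bit first, but verify the bit order against your own dumps.

#include <cstdint>

// sectorInBlock is relative to the start of the block's data area.
bool sector_in_child(const uint8_t* bitmap, uint32_t sectorInBlock)
{
    uint32_t byteIndex = sectorInBlock / 8;
    uint32_t bitIndex  = 7 - (sectorInBlock % 8);   // MSB-first bit order
    return (bitmap[byteIndex] >> bitIndex) & 1;
}

// A block read then becomes, per sector:
//   data = sector_in_child(bitmap, n) ? read_child(n) : read_parent(n);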

I have been playing with difference disks over the last couple of weeks and now understand the nature of how this fits together.  One key point is that this is happening at the sector level which would make it very hard to figure out which files had changed.

An annoying aspect of the sector bitmap is that it does not align with clusters.  Because the first volume sector is at 0x3F, the first eight-sector cluster runs from 0x3F to 0x46.  So, cluster zero maps to bits in two different bytes of the sector bitmap.  Life would have been a bit easier if the clusters aligned with the sector bitmap bytes.  Never mind, this is really only annoying for people trying to correspond volume clusters to low-level sectors.

It is worth noting that the difference disk has no intelligence about what is being written.  In other words, written data which happens to be the same as before will still trigger usage in the difference disk.  Also of interest is that no VHD disk has any sense of what has been freed.  This means that even if written data is freed by the file system, it will still be retained in the VHD.  And finally, all data is treated equally, so even data that is not worth keeping (temporary content) will be held onto blindly.  It appears that the pagefile falls into this category.

The greatest value of the difference disk would come from a template model.  An admin could create a dynamic VHD for the work environment and then use difference disks to create user copies.  The benefit would be space savings and potentially faster transfer for remote use (assuming the template is already there).  The missing piece is being able to update the template and have it take effect on the user difference disks.  By the current definitions/standards, this will not work.  The simple reason is that it would be nearly impossible to merge the two together when blocks change on both the child and the parent.  Since the VHD format has no knowledge of files and directories, it has no way of knowing what to merge.

The difference disk seems similar to linked clone technology.  However, linked clone uses versioning which allows for the parent to move forward.  Unfortunately, even linked clones have no knowledge of how to merge with an updated parent.

Dynamic VHD Walkthrough

The VHD format is becoming more popular based on common use by Microsoft.  It has been said that Windows 7 will have built-in support for VHD and will even allow a VHD to be booted.  As has been said a few times, the VHD specification is public, which means that essentially anyone is allowed to program to it.

The format is fairly easy to understand and the specification, though short, covers what needs to be said.

However, having read the specification, certain things seemed a bit unclear.  The only way to get full clarity was to experiment with a real VHD and match it to the spec.

The first concept is that each VHD has a header and a footer.  Both happen to be identical for the sake of redundancy.  Most likely the footer was defined first and was later mirrored to the front as well.  This is good news for getting key information up front.

This post will focus on Dynamic VHD files.  There are two other types (fixed and differencing) but dynamic is perhaps the most common.  Fixed is fixed.  Once you allocate a size, you are stuck with it.  It takes all the space specified without necessarily using any of it.  It is good for guaranteeing the space will be there but a bad citizen for disk space usage on the host.  Differencing is more advanced and essentially is used for parent/child disk relationships to create what could be called a linked clone.  The idea is that the difference disk builds on its parent and does not require all the data the parent has.  Dynamic disks are disks that allocate space on the fly based on usage.  There are rules about how big it can get and how the blocks are allocated but it appears the same as a fixed disk to the guest.
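
As a small taste of the walkthrough, here is a minimal footer reader.  The offsets (current size at 0x30, disk type at 0x3C) and the big-endian rule are from the published spec; disk type 2 is fixed, 3 is dynamic, and 4 is differencing.

#include <cstdint>
#include <cstdio>
#include <cstring>
#include <fstream>

static uint64_t be64(const uint8_t* p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; ++i) v = (v << 8) | p[i];
    return v;
}

static uint32_t be32(const uint8_t* p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

int main(int argc, char** argv)
{
    if (argc < 2) return 1;
    std::ifstream f(argv[1], std::ios::binary);
    uint8_t footer[512];
    f.seekg(-512, std::ios::end);               // footer is the last 512 bytes
    f.read((char*)footer, sizeof(footer));
    if (!f || std::memcmp(footer, "conectix", 8) != 0) {
        std::fprintf(stderr, "no VHD footer found\n");
        return 1;
    }
    std::printf("current size: %llu bytes\n",
                (unsigned long long)be64(footer + 0x30));
    std::printf("disk type:    %u\n", be32(footer + 0x3C));
    return 0;
}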


Cluster Map

[Image: Defragmentation on Windows XP]

One aspect of volume management is knowing which clusters are free and which ones are used.  This is typically something managed solely by the operating system but it is sometimes possible to get a glimpse of how things align.  Microsoft published a few interfaces a few years ago that were once considered undocumented.  This set of APIs targets being able to defrag a disk.  The cluster map is gathered using FSCTL_GET_VOLUME_BITMAP.

A cluster is the most basic allocation unit of the file system.  It is defined by what is specified in the boot sector of the volume.  Windows apparently always uses a sector size of 512 bytes with the option of different cluster sizes (multiples of the sector size).  The two relevant fields in the boot sector are “sector size” (a WORD at offset 0x0B) and “sectors per cluster” (a BYTE at offset 0x0D).
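
Pulling those two fields out of a boot sector looks like this; both values are little endian on disk:

#include <cstdint>

uint32_t cluster_size_bytes(const uint8_t boot[512])
{
    uint16_t bytesPerSector    = (uint16_t)(boot[0x0B] | (boot[0x0C] << 8));
    uint8_t  sectorsPerCluster = boot[0x0D];
    return (uint32_t)bytesPerSector * sectorsPerCluster;  // e.g. 512 * 8 = 4K
}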

The cluster size typically corresponds to the size of the disk.  The larger the disk, the larger the cluster size.  My main 250GB drive has a cluster size of 4K.  Originally the drives were small enough to have the sector size and cluster size match (512 bytes).

Back to FSCTL_GET_VOLUME_BITMAP.  When the information is successfully returned from the IOCTL, it reveals the cluster pattern for the volume.  The structure returned is VOLUME_BITMAP_BUFFER, which is effectively a bitmap of used/free clusters.  Each byte in this “Buffer” corresponds to 8 clusters.  The lowest bit represents the first cluster of that byte.  Just today I worked out that 64 bytes of bitmap data correspond to 2MB of data with 4K clusters.
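
A minimal call looks like the sketch below.  The volume handle needs administrator rights, and ERROR_MORE_DATA is the normal “partial bitmap, call again” result rather than a failure.

#include <windows.h>
#include <winioctl.h>
#include <cstdio>
#include <vector>

int main()
{
    HANDLE vol = CreateFileW(L"\\\\.\\C:", GENERIC_READ,
                             FILE_SHARE_READ | FILE_SHARE_WRITE, nullptr,
                             OPEN_EXISTING, 0, nullptr);
    if (vol == INVALID_HANDLE_VALUE) return 1;

    STARTING_LCN_INPUT_BUFFER in = {};          // start at cluster zero
    std::vector<BYTE> out(1 << 20);             // 1MB covers about 8M clusters
    DWORD bytes = 0;
    BOOL ok = DeviceIoControl(vol, FSCTL_GET_VOLUME_BITMAP,
                              &in, sizeof(in),
                              out.data(), (DWORD)out.size(),
                              &bytes, nullptr);
    if (ok || GetLastError() == ERROR_MORE_DATA) {
        VOLUME_BITMAP_BUFFER* bmp = (VOLUME_BITMAP_BUFFER*)out.data();
        std::printf("starting LCN %lld, %lld clusters described\n",
                    bmp->StartingLcn.QuadPart, bmp->BitmapSize.QuadPart);
        // bmp->Buffer holds the used/free bits, lowest bit first.
    }
    CloseHandle(vol);
    return 0;
}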

The actual output of the bytes gives an idea of where the used and free space is concentrated.  As expected, most of the early parts of the disk are used while the last parts are usually free.  There are also hints of fragmentation, since there are gaps between sections of data which probably used to be files.

It is actually possible to gather free/used cluster counts from the bitmap by running the data through a counter that turns the byte patterns into actual count pairs.  I wrote a program that scanned the whole bitmap, using each nibble to match against pre-programmed arrays.  So, put in 0xF and get back 4 used, 0 free.  Put in 0x6 and get 2 used and 2 free.  You get the idea.  Originally I had thought of doing it per byte but was not looking forward to entering the 256 combinations.
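
The nibble table is short enough to show in full:

#include <cstdint>
#include <cstddef>

static const uint8_t kUsedPerNibble[16] = {
    0,1,1,2, 1,2,2,3, 1,2,2,3, 2,3,3,4   // popcount of 0x0 through 0xF
};

void count_clusters(const uint8_t* bitmap, size_t bytes,
                    uint64_t& used, uint64_t& free)
{
    used = free = 0;
    for (size_t i = 0; i < bytes; ++i) {
        uint8_t u = kUsedPerNibble[bitmap[i] & 0x0F]
                  + kUsedPerNibble[bitmap[i] >> 4];
        used += u;
        free += 8 - u;
    }
}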

I keep thinking of defrag programs from the past (like Norton) that showed the cluster map (from a high-level view) while moving files around.  Now it seems fairly simplistic given the number of clusters involved.  It also seems a bit risky given the temporary nature of the free/used bitmap.

The point there is that the number of free/used clusters is always changing based on system activity.  A snapshot using the IOCTL is just a picture in time and does not guarantee that things are still the same.  Even Microsoft recommends assuming that you might not get the free clusters you want for a defrag operation, so you had better be prepared to try again.

The actual information lives inside NTFS in a metadata file called $Bitmap.  It is MFT record number 6 (reserved and for all time the same).  $Bitmap cannot be directly read from any Windows program since it is only intended for the file system.  Obviously Microsoft does not want anyone to change this file.  It would play major havoc on Windows most likely.  

The cluster map in $Bitmap is in theory the basis of what is returned from the IOCTL.  However, since both cannot be read at exactly the same instant, they could vary.  The exception to this would be if you could freeze Windows somehow.

Speaking of freezing Windows, the only way to do this successfully is to access the information when nothing is changing.  The easiest way is to access the volume when it is not booted from.  As long as no running program is changing the non-boot drive, it should be possible to get an accurate snapshot that will stay good over time.

Coming from a VHD angle, you could mount the VHD and then use the IOCTL.  Or, you could spend a lot of time understanding the NTFS format along with the VHD format to go get the $Bitmap file yourself.  Difficult, but entirely possible.

Having come to the end of this post, it seems that this topic might be a bit tangential to what most of you might be interested in.  Let’s assume that it is really meant for the tinkerers out there that like to know where the disk space is really being used.  Please expect a few more words about this area in the coming weeks.

Blocks Versus Files

This topic presents an interesting problem.  A disk is made up of sectors which are arranged as clusters by the file system.  Both NTFS and FAT use a cluster model to clump together sectors into bigger chunks.  The cluster model has been around since the original DOS and still runs strong today.  The boot sector of the volume contains how many sectors of a certain size belong to one cluster.  On my Vista system the clusters are 4K (8 sectors of 512 bytes each).  This can vary for USB Flash Drives and smaller hard drives.  My flash drive reports a cluster size of 32K (64 sectors/cluster).  All of this is fine but then the question becomes why should I care?

The answer becomes more relevant when virtualization comes into the picture.  For a VM, the disk is virtual and is actually a file within another file system (most of the time).  Microsoft and Citrix use the VHD format for the VM files.  The VHD specification is public knowledge since Microsoft documented it a couple of years ago.  Given that there is a VHD file, everything needed by the operating system is there.  However, it becomes very difficult to manage this information from the outside.  Yes, there are ways to mount VHD drives within a native operating system, but this process is not necessarily easy to automate.  Well, at least not for everyone.

Then a new factor enters the equation.  Since outside tools cannot see inside the VHD to understand what Windows is actually using, it becomes very difficult to do any kind of analysis or consolidation.  Microsoft does have a solution for compacting a VHD with Virtual PC 2007.  Unfortunately, there are many steps and it involves executing code both inside and outside the VM.  Wouldn’t it be nice if this could be managed completely from the outside?  Wouldn’t it be nice if every cluster (block) was paired with a file?

This sounds difficult and overall the problem is very tough.  The benefits however would be huge.  Basically any file operation performed on the inside could potentially be performed on the outside.  This would include things like defragmentation and shrinking the VHD to get rid of the blank chunks.  It could also include peering into the VHD to see what is there and even the hope of doing updates.

Other possible ventures would include merging virtual disks and even creating virtual disks out of multiple virtual disks.  If it were possible to focus on the files instead of the blocks, it would be much more feasible to have base and delta disks which could both change and yet form a cohesive volume for the user.  It is good to dream.

The sources of information look promising.  Microsoft has published APIs related to defragmenting disks which can locate a file on disk.  The APIs also allow for cluster relocation.  Beyond this, there are projects for Linux that understand NTFS.  Those teams have done much to discover the structure of NTFS and have included this knowledge in their programs and their documentation.  With these kinds of guidelines, and with patience, NTFS starts to open up and new things become possible.
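
As a sketch of the file-location side, FSCTL_GET_RETRIEVAL_POINTERS hands back a file’s extents as (NextVcn, Lcn) pairs.  A fixed output buffer like this may truncate badly fragmented files (ERROR_MORE_DATA again), which is ignored here for brevity.

#include <windows.h>
#include <winioctl.h>
#include <cstdio>

void dump_extents(const wchar_t* path)
{
    HANDLE f = CreateFileW(path, FILE_READ_ATTRIBUTES,
                           FILE_SHARE_READ | FILE_SHARE_WRITE, nullptr,
                           OPEN_EXISTING, 0, nullptr);
    if (f == INVALID_HANDLE_VALUE) return;

    STARTING_VCN_INPUT_BUFFER in = {};      // start at the file's first VCN
    BYTE out[4096];
    DWORD bytes = 0;
    if (DeviceIoControl(f, FSCTL_GET_RETRIEVAL_POINTERS, &in, sizeof(in),
                        out, sizeof(out), &bytes, nullptr)) {
        RETRIEVAL_POINTERS_BUFFER* rp = (RETRIEVAL_POINTERS_BUFFER*)out;
        LONGLONG vcn = rp->StartingVcn.QuadPart;
        for (DWORD i = 0; i < rp->ExtentCount; ++i) {
            // An Lcn of -1 marks a hole (compressed or sparse run).
            std::printf("VCN %lld..%lld -> LCN %lld\n",
                        vcn, rp->Extents[i].NextVcn.QuadPart - 1,
                        rp->Extents[i].Lcn.QuadPart);
            vcn = rp->Extents[i].NextVcn.QuadPart;
        }
    }
    CloseHandle(f);
}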

There is a bit of vagueness about what is going on here.  It is still too early to talk about it in detail.  However, it does seem that specific tasks are within reach which did not look so possible before.  Combining the knowledge of VHD with NTFS to form new tools looks incredibly attractive.