Tag Archives: NTFS

VHD versus NTFS alignment

This topic is guaranteed to bore most people.  Or, maybe I am wrong.  Are you the kind of person that loves to defrag your disk?  Are you always looking for new ways of speeding up your machine?  Are knobs and buttons something you love to tweak?  How about that registry cleaner?

No, I am not saying those things are bad.  Some people have the patience for this while others do not.  Usually the people that fiddle with the settings are eventually rewarded.  At the very least they can proclaim that they understand what the settings are really doing.

Thanks to a reader (thanks John!) I was made aware of the offset problem within VHDs.  Well, let me rephrase that.  I knew about it but I did not know its official name or that people had solved it.

John pointed me to this blog post about the offset problem.  There are already tools out there from NetApp to solve the issue but basically you need to be a NetApp customer in order to get access (at least for the tools that actually fix your images).  In truth, the problem affects most virtualisation users.

So, this is the part where the problem is dissected and understood.  Virtual disks still follow rules left over from physical disks.  Specifically, they have a “geometry” setting that indicates things like sectors per track.  Here is an example from my tool named vhddump of one of the test VHDs lying around.

Geometry             3F10BA9E
Cylinders            9EBA
Heads                10
Sectors/Track        3F

This information is a field inside the VHD footer (dynamic VHDs also keep a copy of the footer at the start of the file).  The geometry value splits into the next three values based on interpreting the bytes.  VHD fields are big endian (big bytes first, instead of the typical little endian on Windows), so vhddump is reporting the combined field backwards; it had not been important to preserve the byte order.
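To make the byte-order point concrete, here is a minimal sketch in Python (not the vhddump source).  The four raw bytes are an assumption: they are inferred by reversing the 3F10BA9E value that vhddump printed.  The names decode_geometry and as_little_endian_dword are made up for illustration.

```python
import struct

# Assumed on-disk bytes of the geometry field, inferred from the dump above:
# vhddump printed 3F10BA9E, which is these four bytes read as little endian.
GEOMETRY_BYTES = bytes([0x9E, 0xBA, 0x10, 0x3F])


def decode_geometry(raw):
    """Split the 4-byte geometry field into (cylinders, heads, sectors/track)."""
    # ">HBB": big endian, 2-byte cylinders, 1-byte heads, 1-byte sectors/track.
    return struct.unpack(">HBB", raw)


def as_little_endian_dword(raw):
    """Read the same bytes the 'backwards' way, as one little-endian DWORD."""
    return struct.unpack("<I", raw)[0]


if __name__ == "__main__":
    cylinders, heads, sectors = decode_geometry(GEOMETRY_BYTES)
    print("Geometry      %08X" % as_little_endian_dword(GEOMETRY_BYTES))
    print("Cylinders     %X" % cylinders)
    print("Heads         %X" % heads)
    print("Sectors/Track %X (%d decimal)" % (sectors, sectors))
```

Reading the field as one little-endian DWORD gives the combined value in the dump, while the big-endian split gives the three individual values.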

Anyways, you can see that our virtual disk has 3F sectors per track.  This translates to 63 sectors in decimal.  The geometry reported (also known as CHS) affects the alignment of partitions.  The rule with older versions of Windows is that the boot partition starts after the first track.  This is the output for the same VHD with vhddump:

Index(0) 80 01 01 00 07 FE FF FF 3F 00 00 00 B5 98 70 02
Index(0) BootFlag(80) Type(07) SectorStart(0000003F) SectorLength(027098B5)
Index(0) CHSStart(010100) CHSEnd(FEFFFF)
Start CHS  Head(01) Sector(01) Cylinder(0000)
End   CHS  Head(FE) Sector(3F) Cylinder(03FF)

Volume Count         1
Volume Index         0
FirstVolumeSector    3F

The master boot record starts at the first sector of the virtual disk.  This dump is from a Windows XP image.  The first sector of the first volume/partition is at 0x3f.
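For anyone who wants to decode the partition entry by hand, the following is a rough Python sketch of the standard 16-byte MBR entry layout, fed with the raw bytes from the dump above.  It is not how vhddump does it; the function names are illustrative only.

```python
import struct

# The 16 raw bytes of partition entry 0, copied from the dump above.
ENTRY = bytes.fromhex("8001010007FEFFFF3F000000B5987002")


def decode_chs(chs):
    """Unpack a 3-byte CHS value into (cylinder, head, sector)."""
    head = chs[0]
    sector = chs[1] & 0x3F                      # low 6 bits of byte 1
    cylinder = ((chs[1] & 0xC0) << 2) | chs[2]  # top 2 bits of byte 1 + byte 2
    return cylinder, head, sector


def parse_partition_entry(entry):
    """Decode one 16-byte MBR partition table entry (little endian)."""
    boot_flag, chs_start, ptype, chs_end, first_sector, sector_count = \
        struct.unpack("<B3sB3sII", entry)
    return {
        "boot_flag": boot_flag,
        "type": ptype,
        "chs_start": decode_chs(chs_start),
        "chs_end": decode_chs(chs_end),
        "first_sector": first_sector,
        "sector_count": sector_count,
    }


if __name__ == "__main__":
    for key, value in parse_partition_entry(ENTRY).items():
        print(key, value)
```

Unlike the VHD footer, the MBR is little endian, which is why the sector start shows up in the raw bytes as 3F 00 00 00.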

Why is this a problem?

Well the most obvious problem is that the sector 0x3f does not align with VHD blocks or even NTFS clusters.  Since it is an ‘odd’ sector, it is guaranteed never to align with anything inside the VHD.

This problem was first seen when exploring the clusters inside the VHD.  Instead of being neatly aligned with the VHD blocks, it was possible to have a cluster that spanned two blocks.  Even the sector bitmap showing the written clusters did not align on byte boundaries for a cluster.  Not only did it make it harder to correlate information, it was also wasteful to do more work for something that should have been aligned in the first place.

This kind of offset problem would show as a performance problem over time.  It is always more efficient to have alignment.

If people really understood this problem, they would probably insist that the partitions were aligned.  A simple example is a cluster that spans two blocks.  Not only is it a read/write hit to access two blocks, but it also potentially wastes space with the second block if there is nothing else there.

If clusters are aligned with VHD blocks, it is much easier to correlate the file data.  It makes sense that the disk should be aligned not with pretend physical settings but rather the VHD format itself.  Even though it is counter-intuitive, it might make sense to have the first partition start at a 2MB boundary.  Some space would be wasted before and after a given partition but the partition would be guaranteed to be isolated from the MBR area and the other partitions.
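The effect is easy to check with a little arithmetic.  The sketch below is only an illustration and assumes 512-byte sectors, 4K clusters and 2MB VHD blocks; the helper straddling_clusters is a made-up name.  It lists which clusters end up split across two blocks for a given partition start sector.

```python
SECTOR_SIZE = 512
CLUSTER_SIZE = 4096            # assumed 4K NTFS clusters
BLOCK_SIZE = 2 * 1024 * 1024   # assumed 2MB VHD block size


def straddling_clusters(first_sector, cluster_count):
    """Return indexes of clusters whose bytes land in two different VHD blocks."""
    volume_start = first_sector * SECTOR_SIZE
    hits = []
    for index in range(cluster_count):
        start = volume_start + index * CLUSTER_SIZE
        end = start + CLUSTER_SIZE - 1
        if start // BLOCK_SIZE != end // BLOCK_SIZE:
            hits.append(index)
    return hits


if __name__ == "__main__":
    # Partition starting at sector 0x3F, as in the Windows XP image.
    print(straddling_clusters(0x3F, 2048))
    # Partition starting on a 2MB boundary (sector 4096).
    print(straddling_clusters(BLOCK_SIZE // SECTOR_SIZE, 2048))
```

With the partition at sector 0x3F, one cluster is split at every block boundary (clusters 504, 1016, 1528 and 2040 in the first 2048).  Starting the partition on a 2MB boundary leaves none.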

John had asked for a tool to fix this.  Unfortunately I do not have time right now to solve it.  There are other areas which are currently more important.  However, it would be fun to write such a tool.

Cluster Map

Defragmentation on Windows XP

 

One aspect of volume management is knowing which clusters are free and which ones are used.  This is typically something managed solely by the operating system but it is sometimes possible to get a glimpse of how things align.  A few years ago Microsoft published a few interfaces that had previously been undocumented.  This set of APIs is aimed at defragmenting a disk.  The cluster map is gathered using FSCTL_GET_VOLUME_BITMAP.  A cluster is the most basic allocation unit of the file system.  Its size is defined by what is specified in the boot sector of the volume.  Windows apparently always uses a sector size of 512 bytes with the option of different cluster sizes (multiples of the sector size).  The two fields in the boot sector are “sectors per cluster” and “sector size”.  The boot sector has this information at offset 0x0B for “sector size” (WORD) and offset 0x0D for “sectors per cluster” (BYTE).
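As a small illustration of those two offsets, here is a Python sketch that pulls both fields out of a raw boot sector.  The boot sector in the example is fabricated (all zeroes apart from the two fields), and both function names are made up for this post.

```python
import struct


def cluster_geometry(boot_sector):
    """Return (bytes_per_sector, sectors_per_cluster, cluster_size)."""
    # WORD at offset 0x0B, immediately followed by a BYTE at 0x0D, little endian.
    bytes_per_sector, sectors_per_cluster = struct.unpack_from(
        "<HB", boot_sector, 0x0B)
    return (bytes_per_sector, sectors_per_cluster,
            bytes_per_sector * sectors_per_cluster)


def fake_boot_sector(bytes_per_sector=512, sectors_per_cluster=8):
    """Build a blank 512-byte sector with only the two fields filled in."""
    sector = bytearray(512)
    struct.pack_into("<HB", sector, 0x0B, bytes_per_sector, sectors_per_cluster)
    return bytes(sector)


if __name__ == "__main__":
    print(cluster_geometry(fake_boot_sector()))
    print(cluster_geometry(fake_boot_sector(512, 64)))
```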

The cluster size typically corresponds to the size of the disk.  The larger the disk, the larger the cluster size.  My main 250GB drive has a cluster size of 4K.  Originally the drives were small enough to have the sector size and cluster size match (512 bytes).

Back to FSCTL_GET_VOLUME_BITMAP.  When the information is successfully returned from the IOCTL, it reveals the cluster pattern for the volume.  The structure returned is VOLUME_BITMAP_BUFFER which is effectively a bitmap of used/free clusters.  Each byte in this “Buffer” corresponds to 8 clusters.  The lowest bit represents the first cluster of that byte.  Just today I figured that if you had 64 bytes of bitmap data, it would correspond to 2MB of data with 4K clusters (64 bytes x 8 clusters per byte x 4K).
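The bit layout described above can be sketched in a few lines of Python.  This works on a plain bytes object standing in for the “Buffer” member; it does not call the IOCTL, and the helper names are hypothetical.

```python
def cluster_is_used(bitmap, cluster):
    """True if the given cluster's bit is set; bit 0 of each byte comes first."""
    byte = bitmap[cluster // 8]
    return bool(byte & (1 << (cluster % 8)))


def bytes_covered(bitmap_length, cluster_size=4096):
    """How much volume data a bitmap of this many bytes describes."""
    return bitmap_length * 8 * cluster_size


if __name__ == "__main__":
    sample = bytes([0x01, 0x80])   # made-up data: clusters 0 and 15 in use
    print([n for n in range(16) if cluster_is_used(sample, n)])
    print(bytes_covered(64))       # 64 bytes of bitmap with 4K clusters
```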

The actual output of the bytes gives an idea of where the used and free space is concentrated.  As expected, most of the early parts of the disk are used while the last parts are usually free.  There are also hints of fragmentation since there are gaps between sections of data which probably used to be files.

It is actually possible to gather free/used cluster counts from the bitmap by throwing the data through a counter that changes the byte patterns to actual count pairs.  I wrote a program that scanned the whole bitmap using each nibble to match against pre-programmed arrays.  So, put in 0xF and get back 4 used 0 free.  Put in 0x6 and get 2 used and 2 free.  You get the idea.  Originally I had thought of doing it against the byte but was not looking forward to entering the 256 combinations.
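A rough Python sketch of the nibble approach follows.  It is not the original program: here the 16-entry table is generated rather than typed in by hand, and the names NIBBLE_COUNTS and count_clusters are made up.  It also counts every bit in the buffer, so any padding bits in the last byte beyond the real cluster count would need to be masked off separately.

```python
# One (used, free) pair for each possible nibble value 0x0 through 0xF.
NIBBLE_COUNTS = [(bin(n).count("1"), 4 - bin(n).count("1")) for n in range(16)]


def count_clusters(bitmap):
    """Return (used, free) totals for every bit in the bitmap bytes."""
    used = free = 0
    for byte in bitmap:
        # Look up the low nibble, then the high nibble, of each byte.
        for nibble in (byte & 0x0F, byte >> 4):
            nibble_used, nibble_free = NIBBLE_COUNTS[nibble]
            used += nibble_used
            free += nibble_free
    return used, free


if __name__ == "__main__":
    print(NIBBLE_COUNTS[0xF], NIBBLE_COUNTS[0x6])
    print(count_clusters(bytes([0xFF, 0x06, 0x00])))
```

Generating the table this way would also make a 256-entry byte table painless, though for a sketch the nibble version is plenty.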

I keep on thinking of defrag programs from the past (like Norton) that show the cluster map (from a high level view) and files being moved around.  Now it seems fairly simplistic given the number of clusters involved.  It also seems a bit risky given the temporary nature of the free/used bitmap.

The point there is that the number of free/used clusters is always changing based on system activity.  A snapshot using the IOCTL is just a picture in time and does not guarantee that things are still the same.  Even Microsoft recommends assuming that you might not get the free clusters you want for a defrag operation, so you had better be prepared to try again.

The actual information lives inside NTFS in a metadata file called $Bitmap.  It is MFT record number 6 (reserved and for all time the same).  $Bitmap cannot be directly read from any Windows program since it is only intended for the file system.  Obviously Microsoft does not want anyone to change this file.  It would play major havoc on Windows most likely.  

The cluster map in $Bitmap is in theory the basis of what is returned from the IOCTL.  However, since it is not possible to read both at exactly the same instant, the two could differ.  The exception to this would be if you could freeze Windows somehow.

Speaking of freezing Windows, the only way to do this successfully is to access the information when nothing is changing.  The easiest way is to access the volume when it is not booted from.  As long as no running program is changing the non-boot drive, it should be possible to get an accurate snapshot that will stay good over time.

Coming from a VHD angle, you could mount the VHD and then use the IOCTL.  Or, you could spend a lot of time understanding the NTFS format along with the VHD format to go get the $Bitmap file yourself.  Difficult, but entirely possible.

Having come to the end of this post, it seems that this topic might be a bit tangential to what most of you might be interested in.  Let’s assume that it is really meant for the tinkerers out there that like to know where the disk space is really being used.  Please expect a few more words about this area in the coming weeks.

Blocks Versus Files

This topic presents an interesting problem.  A disk is made up of sectors which are arranged as clusters by the file system.  Both NTFS and FAT use a cluster model to clump together sectors into bigger chunks.  The cluster model has been around since the original DOS and still runs strong today.  The boot sector of the volume contains how many sectors of a certain size belong to one cluster.  On my Vista system the clusters are 4K (8 sectors of 512 bytes each).  This can vary for USB Flash Drives and smaller hard drives.  My flash drive reports a cluster size of 32K (64 sectors/cluster).  All of this is fine but then the question becomes why should I care?

The answer becomes more relevant when virtualization comes into the picture.  For a VM, the disk is virtual and is actually a file within another file system (most of the time).  Microsoft and Citrix use the VHD format for the VM files.  The VHD specification is public knowledge since Microsoft documented it a couple of years ago.  Given that there is a VHD file, everything needed by the operating system is there.  However, it becomes very difficult to manage this information from the outside.  Yes, there are ways to mount VHD drives within a native operating system, but this process is not necessarily easy to automate.  Well, at least not for everyone.

Then a new factor enters the equation.  Since the outside tools cannot see inside the VHD to understand what Windows is actually using, it becomes very difficult to do any kind of analysis or consolidation.  Microsoft does have a solution for compressing a VHD with Virtual PC 2007.  Unfortunately, there are many steps and it involves executing code both inside and outside the VM.  Wouldn’t it be nice if this could be managed completely from the outside?  Wouldn’t it be nice if every cluster (block) was paired with a file?

This sounds difficult and overall the problem is very tough.  The benefits however would be huge.  Basically any file operation performed on the inside could potentially be performed on the outside.  This would include things like defragmentation and shrinking the VHD to get rid of the blank chunks.  It could also include peering into the VHD to see what is there and even the hope of doing updates.

Other possible ventures would include merging virtual disks and even creating virtual disks out of multiple virtual disks.  If it is possible to focus on the files instead of the blocks, it becomes much more feasible to have base and delta disks which would both be allowed to change and yet form a cohesive volume to the user.  It is good to dream.

The sources of information look promising.  Microsoft has published APIs related to defragmenting disks which can locate a file on disk.  The APIs also allow for cluster relocation.  Beyond this, there are projects for Linux to understand NTFS.  Those teams have done much to discover the structure of NTFS and have included this knowledge in their programs and their documentation.  With these kinds of guidelines, and with patience, NTFS starts to open up and new things become possible.

There is a bit of vagueness going on here.  It is still too early to talk about this in detail.  However, it does seem that specific tasks are within reach which did not look so possible before.  Combining the knowledge of VHD with NTFS to form new tools looks incredibly attractive.

Determining Volume Cluster Size

On Monday there was a need to determine the cluster size of an NTFS volume.  Searching the web led to the discovery of a few different techniques but nothing that could be absorbed easily into a program.  One technique called for creating a very small file and then looking at the file properties for the space used on disk.  The second technique used the FSUTIL tool (built into Windows).  There was even a third technique which allowed for capturing the output of FSUTIL into Visual Basic to use the cluster size.

Why worry about cluster size in the first place?  Well, normally, you wouldn’t.  It is something for those of you that like to fine tune your performance and storage space.  The quick analysis is that having bigger clusters is more efficient for larger files (less fragmentation and faster load with less overhead) but small files can waste heaps of space.  Basically files that do not use the full cluster are going to take up space that other files could have used.  It’s a delicate balance of wants.  Most likely it would be difficult to prove what the optimal cluster size is.  But, before we go too far, it is currently difficult to determine cluster size from a program.
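To put a number on the wasted space, here is a simplified Python sketch.  It assumes every file occupies a whole number of clusters, which ignores the fact that NTFS can keep very small files inside the MFT record itself; the function names are made up.

```python
def allocated_size(file_size, cluster_size):
    """Bytes taken on disk when the file is rounded up to whole clusters."""
    clusters = -(-file_size // cluster_size)   # ceiling division
    return clusters * cluster_size


def slack(file_size, cluster_size):
    """Bytes allocated to the file but not holding any of its data."""
    return allocated_size(file_size, cluster_size) - file_size


if __name__ == "__main__":
    for cluster_size in (512, 4096, 32768):
        print(cluster_size, allocated_size(5000, cluster_size),
              slack(5000, cluster_size))
```

Under that assumption a 5000-byte file wastes 3192 bytes with 4K clusters but 27768 bytes with 32K clusters.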

After learning of this problem, the search began for a magic FSCTL Ioctl to the file system to figure this out.  It did not look very promising until my co-worker Anil pointed out that maybe the Win32 GetDiskFreeSpace function might do the trick.

To my surprise, GetDiskFreeSpace did exactly what was needed.  It does not explicitly return the cluster size but it does return (sectors per cluster) and (bytes per sector).  A simple multiplication and the answer is there.  The funny thing is that this function is considered deprecated since it cannot support volumes larger than 2GB.  However, in this case it was extremely useful and not affected by the limit.
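For comparison, the same idea can be sketched in Python with ctypes.  This is not the ClusterSize tool, just an illustration that assumes a Windows machine for the actual call.  The names query_volume, cluster_size and report are hypothetical; only GetDiskFreeSpaceW is the real Win32 entry point.

```python
import ctypes
import sys


def cluster_size(sectors_per_cluster, bytes_per_sector):
    """The 'simple multiplication' from the text."""
    return sectors_per_cluster * bytes_per_sector


def query_volume(root="C:\\"):
    """Windows only: ask GetDiskFreeSpaceW for (sectors/cluster, bytes/sector)."""
    if sys.platform != "win32":
        raise OSError("GetDiskFreeSpace is only available on Windows")
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    sectors_per_cluster = ctypes.c_uint32()
    bytes_per_sector = ctypes.c_uint32()
    free_clusters = ctypes.c_uint32()    # returned by the API, unused here
    total_clusters = ctypes.c_uint32()   # returned by the API, unused here
    ok = kernel32.GetDiskFreeSpaceW(
        ctypes.c_wchar_p(root),
        ctypes.byref(sectors_per_cluster), ctypes.byref(bytes_per_sector),
        ctypes.byref(free_clusters), ctypes.byref(total_clusters))
    if not ok:
        raise ctypes.WinError(ctypes.get_last_error())
    return sectors_per_cluster.value, bytes_per_sector.value


def report(volume, sectors_per_cluster, bytes_per_sector):
    """Format one illustrative summary line for a volume."""
    return "Volume(%s) ClusterSize(%d) SectorsPerCluster(%d) BytesPerSector(%d)" % (
        volume, cluster_size(sectors_per_cluster, bytes_per_sector),
        sectors_per_cluster, bytes_per_sector)


if __name__ == "__main__" and sys.platform == "win32":
    spc, bps = query_volume("C:\\")
    print(report("c:", spc, bps))
```

The free and total cluster counts are the part affected by the size limit; the two values used here are not.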

The next step was to build a simple command line tool that would exploit GetDiskFreeSpace.  The new tool is called ClusterSize (how creative is that?) and can be run against any volume in the system.  The default (no parameters) is to figure out the C: drive cluster size.  You can specify any other drive on the command line.

For example:

clustersize

clustersize d:\ 

Because it is not possible to post executables from WordPress, here is the source instead.  It is fairly easy and should build under Visual Studio without too much trouble.

ClusterSize source 

Here’s an example of the output from trying it against a USB Flash Drive on my system:

Determining cluster size for volume f:
Volume(f:) ClusterSize(32768) SectorsPerCluster(64) BytesPerSector(512)

At first it did not include the ability to report on the sectors and sector size.  It seemed kind of dull not to report them after the initial runs.

This certainly is not the most exciting topic but it is fun to share new minor discoveries.  This is the first time that Citrixblogger has posted source as a PDF generated straight from Visual Studio using PDFCreator.  It is much more accurate than trying to post straight into the blog directly.  It even keeps all the pretty colours as well.