Git Over It

Git is a distributed source control tool used extensively for open source projects.  Instead of relying on the typical central repository, it spreads the source control across every working copy.  History has it that Linus Torvalds himself developed the original tool.

Every Git working directory is a full-fledged repository with complete history and full revision tracking capabilities, not dependent on network access or a central server.

Why is it called git?  The theory is that Linus named it after himself. :)  However, no one seems to know for sure.

Why git?  It all started as a dispute between the Linux developers and BitKeeper.  The BitKeeper tools had been free to the community until the company charged the developers with reverse engineering their product.   The result was that developers would need to pay to continue using BitKeeper.  Given the “free” nature of Linux development this obviously did not go down well.  Linus led the project to develop an alternative and the result is git.

The overall good news is that git has been widely adopted even though development only started in 2005.  It has several good features compared to the traditional source control solutions:

  • Offline changes
  • Free (always helps)
  • Fairly simple commands
  • Distributed source management
  • Open source (for the hardcore)

To get a better introduction, visit the git homepage.  Here are a couple of examples from the homepage:

[Image: Git Is...]

There is also a section on quick start.  Git has two paths for creating a repository.  Either it is created as a clone from another server or it is created from scratch.

[Image: GitQuickStart]

Anything in parentheses is meant to be replaced with specifics (like a directory or file).  To keep it simple, it is best to experiment with a locally created depot.  It is worth doing an easy walk through of the creation.

  1. git init
  2. git add .
  3. git commit -m "My message"

“git init” creates the initial depot for this current project.  It creates a subdirectory (usually hidden) called .git which contains all the information that git needs to keep to manage the source.  Think of it as being where everything is kept to keep git happy.

“git add .” instructs git to find every single file under the current directory and add it to the staging area to be prepared for committing.  When used with the “.” it will only add files that are new or changed, which is a great trick.  If there are files that you do not want to automatically add, you can always list them in the “.gitignore” file to skip over them.  Typically the object files and binaries need to be skipped if the tree is actually built.  Doing the add is really just a sign of intent.  It does not change the committed history in the git tree.  Also of note, if you change any files that have already been added, you need to add them again if you want to capture the changes in the staging area.

There is nothing to fear with commitment.  Commit with “git commit” just means that you want to capture all the changes into one thing.  This is where it updates the local git tree and it is seen as a kind of “snapshot” of where the code is at.  Keep in mind that a commit means nothing to anyone else.  A commit is just a local thing.  This enables git to be used offline and without the need for a central server ever.
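For anyone who would rather script the three steps than type them, here is a minimal Python sketch.  It assumes git is installed and on the PATH.  The helper names (run_git, create_and_commit) and the inline identity values are made up for illustration and are not part of git itself.

```python
import subprocess
from pathlib import Path


def run_git(repo, *args):
    """Run one git command inside `repo` and return its standard output."""
    result = subprocess.run(
        ["git", *args], cwd=repo, check=True, capture_output=True, text=True
    )
    return result.stdout


def create_and_commit(repo, message):
    """Walk through init, add and commit for the files already in `repo`."""
    repo = Path(repo)
    repo.mkdir(parents=True, exist_ok=True)
    run_git(repo, "init")      # creates the hidden .git subdirectory
    run_git(repo, "add", ".")  # stages every new or changed file
    # The identity is passed inline (an assumption for this sketch) so the
    # commit does not depend on any global git configuration.
    run_git(
        repo,
        "-c", "user.name=Example",
        "-c", "user.email=example@example.com",
        "commit", "-m", message,
    )
    # Return the one-line history so the caller can see the local snapshot.
    return run_git(repo, "log", "--oneline")
```

The directory needs at least one file in it before calling create_and_commit, otherwise git has nothing to commit and the commit step raises an error.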

However, git can also be used in a distributed team.  In fact, that was the original intent.  That is the point of the other path for creating a depot on the local machine.

Working off of a remote depot is in this order:

  1. clone a remote depot (git clone)
  2. make changes to the local files (git add)
  3. commit the changes (git commit)
  4. send the patch to someone that cares (git format-patch)

This is not the only way to do it.  It is also possible to submit the changes back to the remote depot assuming you have been authorized.

Coming from a world of PVCS and Perforce, git can take some getting used to.  It seems uncommon for git to be used in commercial products in Citrix (except for Xen products).  XenApp and XenDesktop are based on Perforce.

There are so many different places you can go to find out more about git.  Here are some examples:

Linus Torvalds (git creator) speaks about git at Google Talks.

Thanks go to Michael Wookey in Citrix Labs Sydney for being such a great git advocate.

Blocks Versus Files

This topic presents an interesting problem.  A disk is made up of sectors which are arranged as clusters by the file system.  Both NTFS and FAT use a cluster model to clump together sectors into bigger chunks.  The cluster model has been around since the original DOS and still runs strong today.  The boot sector of the volume records the sector size and how many sectors belong to one cluster.  On my Vista system the clusters are 4K (8 sectors of 512 bytes each).  This can vary for USB Flash Drives and smaller hard drives.  My flash drive reports a cluster size of 32K (64 sectors/cluster).  All of this is fine but then the question becomes why should I care?
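To make the boot sector part concrete, here is a small Python sketch.  It assumes the standard BIOS Parameter Block layout shared by FAT and NTFS, where bytes per sector is a 2-byte little-endian value at offset 0x0B and sectors per cluster is a single byte at offset 0x0D.  The function name is made up, the sample boot sector is synthetic rather than read from a real disk, and no attempt is made to handle any special encoding for unusually large clusters.

```python
import struct

BPB_OFFSET = 0x0B  # bytes per sector (2 bytes), then sectors per cluster (1 byte)


def cluster_info_from_boot_sector(boot_sector):
    """Return (sectors_per_cluster, bytes_per_sector, cluster_size_in_bytes)."""
    if len(boot_sector) < BPB_OFFSET + 3:
        raise ValueError("boot sector is too short to hold the fields")
    bytes_per_sector, sectors_per_cluster = struct.unpack_from(
        "<HB", boot_sector, BPB_OFFSET
    )
    return (
        sectors_per_cluster,
        bytes_per_sector,
        sectors_per_cluster * bytes_per_sector,
    )


# Synthetic 512-byte boot sector describing 8 sectors of 512 bytes each,
# which matches the 4K clusters mentioned above.
sample = bytearray(512)
struct.pack_into("<HB", sample, BPB_OFFSET, 512, 8)
print(cluster_info_from_boot_sector(bytes(sample)))  # (8, 512, 4096)
```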

The answer becomes more relevant when virtualization comes into the picture.  For a VM, the disk is virtual and is actually a file within another file system (most of the time).  Microsoft and Citrix use the VHD format for the VM files.  The VHD specification is public knowledge since Microsoft documented it a couple of years ago.  Given that there is a VHD file, everything needed by the operating system is there.  However, it becomes very difficult to manage this information from the outside.  Yes, there are ways to mount VHD drives within a native operating system, but this process is not necessarily easy to automate.  Well, at least not for everyone.

Then a new factor enters the equation.  Since the outside tools cannot see inside the VHD to understand what Windows is actually using, it becomes very difficult to do any kind of analysis or consolidation.  Microsoft does have a solution for compressing a VHD with Virtual PC 2007.  Unfortunately, there are many steps and it involves executing code both inside and outside the VM.  Wouldn’t it be nice if this could be managed completely from the outside?  Wouldn’t it be nice if every cluster (block) was paired with a file?

This sounds difficult and overall the problem is very tough.  The benefits however would be huge.  Basically any file operation performed on the inside could potentially be performed on the outside.  This would include things like defragmentation and shrinking the VHD to get rid of the blank chunks.  It could also include peering into the VHD to see what is there and even the hope of doing updates.

Other possible ventures would include merging virtual disks and even creating virtual disks out of multiple virtual disks.  If it is possible to focus on the files instead of the blocks, it would be much more feasible to have base and delta disks which would both be allowed to change and yet form a cohesive volume to the user.  It is good to dream.

The sources of information look promising.  Microsoft has published APIs related to defragmenting disks which can locate a file on disk.  The APIs also allow for cluster relocation.  Beyond this, there are projects for Linux to understand NTFS.  Those teams have done much to discover the structure of NTFS and have included this knowledge in their programs and their documentation.  With these kinds of guidelines, and with patience, NTFS starts to open up and new things become possible.

There is a bit of vagueness going on here.  It is still too early to talk about in detail.  However, it does seem that specific tasks are within reach which did not look so possible before.  Combining the knowledge of VHD with NTFS to form new tools looks incredibly attractive.

Windows Disk Management

It can be frustrating when the right information is not available.  In Windows there are tools designed to help determine disk configuration but, for whatever reason, they are fairly hidden.  Perhaps this is intentional to protect the system from the user.  It would not be hard to make a mistake that could potentially disrupt the entire machine.  For those who are more curious than eager to change things, it really does not need to be so hidden.

This post is going to take you on a quick tour of the “Disk Management” tool present in Vista.  You can get to the tool through the Control Panel if you really pay attention.  To make it easier, there are screen captures of the decision points.

QVT – Query Intel VT Feature Program

Things are rolling along today.  After determining that the Intel program was a bit heavy for just figuring out whether or not Intel VT is there, it was discovered that the CPUID instruction could be used.  Intel has documentation about CPUID that makes it fairly easy to use.

Hidden inside the documentation is a flag that shows that the processor can handle Intel VT.

In this document are several other features.  It is a good map between the internal technical names and the eventual product feature names.  The summary is that if you can examine this flag, the program will know which way the VT support goes.
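The flag check itself is tiny.  As a rough sketch in Python: Intel's CPUID documentation places the VMX (Intel VT) flag at bit 5 of the ECX register returned by CPUID leaf 1.  Python cannot execute CPUID on its own, so this sketch assumes the ECX value has already been obtained some other way, and the function name is made up for illustration.

```python
VMX_BIT = 5  # CPUID leaf 1, ECX bit 5: VMX (Intel VT)


def has_vmx(ecx_leaf1):
    """Return True if the VMX bit is set in the given leaf 1 ECX value."""
    return bool((ecx_leaf1 >> VMX_BIT) & 1)


# Made-up register values, just to show both outcomes.
print(has_vmx(0x00000020))  # True, only bit 5 is set
print(has_vmx(0x00000000))  # False, no feature bits are set
```

Keep in mind the flag only says the processor supports VT.  It says nothing about whether the BIOS has left it enabled.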

So, here is a program that does just that:

QVT source

The program was tested against older and newer machines and it appears to work fine.  The only catch is that it will not work against AMD or other processors.  This program could be used as a framework to build other programs to determine the Intel feature set.  For example, it could be determined if the system supports TXT or 64-bit.  There is a commented out wprintf that could be used to show the flags, and with this information it would be possible to map against the CPUID documentation.

Eventually it will be possible to store the executables somewhere to allow for download.  So far that kind of solution has not been obvious.

If you are interested in learning more, please read the CPUID page at Wikipedia.

Intel Processor Identification Utility

There is a common problem where it is difficult to determine what your CPU can really do.  Given the widespread addition of features in the last few years, it is often a mystery what your system can really accomplish.  One area of huge importance is whether or not the Intel CPU can support VT (Virtualization Technology).  Many programs (like XenServer) depend on this new feature to be able to virtualize Windows.

Today brought the discovery that Intel provides a program called “Processor Identification Utility” on their web site for download.  As part of this program it identifies the processor as well as features like VT.  Given that they created the cores and are updating the PIU (Processor Identification Utility) to match, it should be a good place to start.

After having downloaded and installed the program (which seemed a bit complex for a query program), the execution was easy.  From here the main features are identified.  To show my ignorance, I had no idea that my work system was a quad (thought it was a dual).  It also showed that VT was there.

Now, this presents a problem.  Either my copy of Virtual PC 2007 (which tried to use VT) is wrong or the BIOS has it disabled.  Most likely VT has been disabled on the board by the configuration.

Regardless, the tool is useful enough to recommend using for determining Intel processor features.

Determining Volume Cluster Size

On Monday there was a need to determine the cluster size of an NTFS volume.  Searching the web led to the discovery of a few different techniques but nothing that could be absorbed easily into a program.  One technique called for creating a very small file and then looking at the file properties for the space used on disk.  The second technique used the FSUTIL tool (built into Windows).  There was even a third technique which allowed for capturing the output of FSUTIL into Visual Basic to use the cluster size.

Why worry about cluster size in the first place?  Well, normally, you wouldn’t.  It is something for those of you that like to fine tune your performance and storage space.  The quick analysis is that having bigger clusters is more efficient for larger files (less fragmentation and faster load with less overhead) but small files can waste heaps of space.  Basically files that do not use the full cluster are going to take up space that other files could have used.  It’s a delicate balance of wants.  Most likely it would be difficult to prove what the optimal cluster size is.  But, before we go too far, it is currently difficult to determine cluster size from a program.
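To put a number on the wasted space, here is a quick Python sketch.  It assumes the simple model where every file occupies a whole number of clusters.  Real file systems have exceptions (NTFS, for instance, can keep very small files inside the file record itself), so treat the figures as an upper bound.  The function names are made up.

```python
def allocated_bytes(file_size, cluster_size):
    """Space taken on disk when the file must fill whole clusters."""
    clusters = -(-file_size // cluster_size)  # ceiling division
    return clusters * cluster_size


def slack_bytes(file_size, cluster_size):
    """Bytes allocated to the file that it does not actually use."""
    return allocated_bytes(file_size, cluster_size) - file_size


# A 1000 byte file on a 4K cluster volume versus a 32K cluster volume.
print(slack_bytes(1000, 4096))   # 3096
print(slack_bytes(1000, 32768))  # 31768
```

The same small file wastes roughly ten times more space on the 32K volume, which is why lots of tiny files and big clusters do not mix well.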

After learning of this problem, the search began for a magic FSCTL Ioctl to the file system to figure this out.  It did not look very promising until my co-worker Anil pointed out that maybe the Win32 GetDiskFreeSpace function might do the trick.

To my surprise, GetDiskFreeSpace did exactly what was needed.  It does not explicitly return the cluster size but it does return (sectors per cluster) and (bytes per sector).  A simple multiplication and the answer is there.  The funny thing is that this function is considered deprecated since it cannot support volumes greater than 2GB.  However, in this case it was extremely useful and not affected by the limit.
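The multiplication can be sketched in a few lines of Python.  This is not the ClusterSize source, just an illustration of the same idea.  The pure function works anywhere, while the Windows-only wrapper calls GetDiskFreeSpaceW through ctypes.  Both function names are made up.

```python
import ctypes
import sys


def cluster_size(sectors_per_cluster, bytes_per_sector):
    """Cluster size in bytes from the two values GetDiskFreeSpace returns."""
    return sectors_per_cluster * bytes_per_sector


def query_cluster_size(root="C:\\"):
    """Ask Windows for the cluster size of the volume at `root`."""
    if sys.platform != "win32":
        raise OSError("GetDiskFreeSpace is only available on Windows")
    sectors = ctypes.c_ulong(0)
    sector_bytes = ctypes.c_ulong(0)
    free_clusters = ctypes.c_ulong(0)
    total_clusters = ctypes.c_ulong(0)
    ok = ctypes.windll.kernel32.GetDiskFreeSpaceW(
        ctypes.c_wchar_p(root),
        ctypes.byref(sectors),
        ctypes.byref(sector_bytes),
        ctypes.byref(free_clusters),
        ctypes.byref(total_clusters),
    )
    if not ok:
        raise ctypes.WinError()
    return cluster_size(sectors.value, sector_bytes.value)


# 64 sectors per cluster at 512 bytes per sector, as on a typical flash drive.
print(cluster_size(64, 512))  # 32768
```

Only the first two output values are used here, which is why the 2GB concern about the free and total cluster counts does not matter for this purpose.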

The next step was to build a simple command line tool that would exploit GetDiskFreeSpace.  The new tool is called ClusterSize (how creative is that?) and can be run against any volume in the system.  The default (no parameters) is to figure out the C: drive cluster size.  You can specify any other drive on the command line.

For example:

clustersize

clustersize d:\ 

Because it is not possible to post executables from WordPress, here is the source instead.  It is fairly easy and should build under Visual Studio without too much trouble.

ClusterSize source 

Here’s an example of the output from trying it against a USB Flash Drive on my system:

Determining cluster size for volume f:
Volume(f:) ClusterSize(32768) SectorsPerCluster(64) BytesPerSector(512)

At first it did not include the ability to report on the sectors and sector size.  It seemed kind of dull not to report them after the initial runs.

This certainly is not the most exciting topic but it is fun to share new minor discoveries.  This is the first time that Citrixblogger has posted source as a PDF generated straight from Visual Studio using PDFCreator.  It is much more accurate than trying to paste the source into the blog directly.  It even keeps all the pretty colours as well.