Checking for Managed Assemblies without Loading them

One of the nice things about our PowerShell Provider for BizTalk is its ability to infer the kind of resource you want to deploy without your having to specify an explicit Item Type.

Specifically, the provider currently detects managed assemblies, BizTalk artifact assemblies and in-process COM components.

One of my earliest attempts at detecting whether an assembly was managed was to actually try to load it via Reflection and catch a potential System.BadImageFormatException.
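A minimal sketch of that first attempt might look like the following. The NaiveCheck class name is hypothetical, and Assembly.LoadFrom is assumed here as the loading call; the original code may well have used a different overload.

```csharp
using System;
using System.Reflection;

// Hypothetical helper illustrating the naive approach; not the provider's actual code.
public static class NaiveCheck
{
    public static bool IsManagedAssembly(string path)
    {
        try
        {
            // Loads the assembly into the *current* AppDomain, where it stays.
            Assembly.LoadFrom(path);
            return true;
        }
        catch (BadImageFormatException)
        {
            // Native DLLs and other files that are not CLR images end up here.
            return false;
        }
    }
}
```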

This worked quite well, but had a negative impact on the performance of the provider over time.

Indeed, the provider is designed to run inside PowerShell, a shell that the user might keep open in a window for long periods of time. The negative side effect of the naive approach mentioned above is that, once loaded, an assembly is never unloaded until the process exits.

Well, more accurately, once loaded inside an AppDomain, an assembly is never unloaded until that AppDomain is torn down.

This meant that the more resources you deployed with the PowerShell provider, the more assemblies got loaded into the PowerShell process space, without the memory ever being reclaimed.
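One way to observe this growth, sketched here on the assumption that a simple count is informative enough, is to compare the number of assemblies loaded in the current AppDomain before and after an operation. The LoadedAssemblies class and its GrowthAfter method are illustrative names, not part of the provider.

```csharp
using System;

// Hypothetical diagnostic helper; not part of the provider.
public static class LoadedAssemblies
{
    // Number of assemblies currently loaded in the calling AppDomain.
    public static int Count()
    {
        return AppDomain.CurrentDomain.GetAssemblies().Length;
    }

    // Runs the given action and reports how many assemblies it left behind.
    public static int GrowthAfter(Action action)
    {
        int before = Count();
        action();
        return Count() - before;
    }
}
```

With the naive check, GrowthAfter would be expected to report a positive number the first time a given managed assembly is checked, and that number is never given back for the lifetime of the AppDomain.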

Loading Assemblies in Separate AppDomains

The solution is obvious.

It involves creating a temporary AppDomain, performing the check on the specified assembly inside it, gathering the result, and unloading the AppDomain.

The problem, however, is that once you create extra AppDomains inside your process space, you need to take care of such things as marshalling, calling code through proxies, etc. In fact, you are effectively using a mini-RPC mechanism built into .NET.

Here is how it goes:

Creating Custom AppDomains

First you have to create an AppDomain, and resolve the assembly you want to load inside it.

private static Assembly AppDomain_AssemblyResolve(object sender, ResolveEventArgs args)
{
    try
    {
        // args.Name is expected to be the fully qualified name of the assembly.
        return Assembly.Load(args.Name);
    }
    catch (System.Exception /* e */)
    {
        // Swallow the failure and let the runtime treat the assembly as unresolved.
    }

    return null;
}

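// Requires: using System.Security; using System.Security.Permissions; using System.Security.Policy;
private static AppDomain CreateAppDomain(string appDomainName)
{
    // Unsubscribe first so that repeated calls do not register the handler more than once.
    AppDomain.CurrentDomain.AssemblyResolve -= new ResolveEventHandler(AppDomain_AssemblyResolve);
    AppDomain.CurrentDomain.AssemblyResolve += new ResolveEventHandler(AppDomain_AssemblyResolve);

    AppDomainSetup appDomainSetup = new AppDomainSetup();
    appDomainSetup.ApplicationBase = AppDomain.CurrentDomain.BaseDirectory;

    AppDomain domain = AppDomain.CreateDomain(
          appDomainName
        , new Evidence(AppDomain.CurrentDomain.Evidence)
        , appDomainSetup
        , new PermissionSet(PermissionState.Unrestricted));

    return domain;
}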

Notice that the AppDomain_AssemblyResolve method is registered as the target of the ResolveEventHandler delegate at the top of the CreateAppDomain method. This is absolutely critical to the way separate AppDomains work.

Recall that the request to load an assembly into a separate AppDomain is issued from the main AppDomain. The .NET Framework makes sure a new AppDomain gets created with the specified parameters. Note that, as written, the handler is attached to AppDomain.CurrentDomain, so it runs in the AppDomain that called CreateAppDomain, whenever the runtime fails to resolve an assembly there on its own (which can happen, for instance, when the type of the proxy coming back from the new AppDomain has to be resolved). When the delegate runs, it has a chance to load the assembly named in its ResolveEventArgs argument.

The code shown here is very simple, since I know that what gets passed in the ResolveEventArgs argument will be the fully qualified name of the assembly I want to load. In more complex scenarios, additional work might be needed to resolve the assembly name before loading it in the AppDomain.
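As an illustration of such a more complex scenario, here is a hedged sketch of a handler that maps the requested name to a file in a known folder. The ProbingResolver class and its ProbeDirectory field are assumptions made for this example, as is the convention that the file is named after the assembly's simple name with a .dll extension.

```csharp
using System;
using System.IO;
using System.Reflection;

// Hypothetical resolver; the provider's own handler is the simpler one shown above.
public static class ProbingResolver
{
    // Assumed probing folder; adjust to wherever the assemblies actually live.
    public static string ProbeDirectory = AppDomain.CurrentDomain.BaseDirectory;

    public static Assembly Resolve(object sender, ResolveEventArgs args)
    {
        // args.Name is typically a display name such as
        // "MyLib, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null".
        string simpleName = new AssemblyName(args.Name).Name;
        string candidate = Path.Combine(ProbeDirectory, simpleName + ".dll");

        // Returning null tells the runtime that this handler could not help.
        return File.Exists(candidate) ? Assembly.LoadFrom(candidate) : null;
    }
}
```

It would be registered the same way as the original handler, for instance with AppDomain.CurrentDomain.AssemblyResolve += new ResolveEventHandler(ProbingResolver.Resolve).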

Here is the full code of the client function that wraps both functions shown above:

public static bool IsManagedAssembly(string path)
{
    AppDomain domain = null;

    try
    {
        domain = CreateAppDomain("IsManagedAssembly");

        // Create the helper inside the temporary AppDomain and get a proxy to it.
        ManagedAssemblyHelper proxy =
            (ManagedAssemblyHelper) domain.CreateInstanceFromAndUnwrap(
                  Assembly.GetExecutingAssembly().Location
                , typeof(ManagedAssemblyHelper).FullName);

        return proxy.IsManagedAssembly(path);
    }
    finally
    {
        // CreateAppDomain may have thrown, in which case there is nothing to unload.
        if (domain != null)
            AppDomain.Unload(domain);
    }
}

Actual Processing

The actual check of whether an assembly is managed is done in the ManagedAssemblyHelper class, a proxy of which was created and used in the preceding step. The function is straightforward and follows the approach described in the introduction to this post:

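public class ManagedAssemblyHelper : MarshalByRefObject
{
    public bool IsManagedAssembly(string path)
    {
        try
        {
            // Runs inside the temporary AppDomain, so the loaded assembly goes away with it.
            Assembly.Load(GetRawFileContents(path));
            return true;
        }
        catch (BadImageFormatException /* e */)
        {
            return false;
        }
    }

    private byte[] GetRawFileContents(string path)
    {
        using (FileStream stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            byte[] bytes = new byte[stream.Length];
            int offset = 0;

            // Stream.Read may return fewer bytes than requested, so loop until the buffer is full.
            while (offset < bytes.Length)
            {
                int read = stream.Read(bytes, offset, bytes.Length - offset);
                if (read == 0)
                    throw new EndOfStreamException(path);
                offset += read;
            }

            return bytes;
        }
    }
}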

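Notice that the class derives from MarshalByRefObject. This enables the .NET Framework to hand a proxy to the object to the AppDomain that requested it, instead of a copy of the object. An object that is merely serializable would be marshalled by value across the AppDomain boundary, which would be inefficient and unnecessary in our case: the copy, and with it the call to IsManagedAssembly, would end up in the calling AppDomain, which is exactly where we do not want the assembly to be loaded.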

Simply put, the assembly is loaded inside the temporary AppDomain. If this works at all, we conclude that it is indeed a managed assembly. Otherwise, we catch the BadImageFormatException and return false.

It seems that the only way to load the assembly there is to feed Assembly.Load with the raw bytes that make up the COFF-based image of the assembly. This is done with the simple GetRawFileContents helper function.
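As an aside, and not as a description of what the provider does, the framework also offers AssemblyName.GetAssemblyName, which is documented as opening and closing the file without adding the assembly to the current domain. A hedged sketch of a check built on it could look like this; the ManifestCheck class name is hypothetical, and the check only tells you whether the file contains an assembly manifest.

```csharp
using System;
using System.Reflection;

// Hypothetical alternative; the provider itself uses the separate AppDomain shown above.
public static class ManifestCheck
{
    public static bool HasAssemblyManifest(string path)
    {
        try
        {
            // Reads the assembly identity from the file without loading it into this AppDomain.
            AssemblyName.GetAssemblyName(path);
            return true;
        }
        catch (BadImageFormatException)
        {
            // The file is not a valid managed assembly.
            return false;
        }
    }
}
```

Whether this is sufficient depends on what the caller needs to know about the assembly afterwards; the separate AppDomain remains the more general tool when the assembly must actually be inspected through Reflection.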

You can check out the complete working code in the Source Code section of the CodePlex project.
