A Custom FlatFile Schema Resolver and Disassembler Pipeline Component

I was recently tasked with architecting a BizTalk solution capable of receiving and processing an arbitrary number of flat file formats on a single receive location. My first reaction was to allow for a sequence of FlatFile Disassembler components in the receive pipeline, each responsible for disassembling a particular format.

However, this solution introduces some processing overhead. Furthermore, the number of flat file formats to receive is not known beforehand, so this solution requires a new deployment each time a new flat file format is introduced.

I then set out to create a single custom pipeline component, whose task would be to resolve, at runtime, the appropriate flat file schema to use for disassembling. This component would be parameterized with a known list of Schema Resolver components and would only require a configuration update when a new flat file format is introduced.

Writing such a component is not difficult. There is a Schema Resolver Component Sample available from Microsoft. But I wanted my component to be easy to configure, user-friendly and flexible.

In this post, the first in a series of articles, I will walk you through writing such a component. Along the way, we will see how to implement a plug-in type of architecture, how to efficiently implement the IProbeMessage interface while avoiding its common pitfalls, and how to provide a user-friendly interface for configuring the pipeline component at design time.

Runtime Schema Resolution

At the heart of the design of this custom component lies the concept of dynamic schema resolution. As the name suggests, this is a mechanism whereby a message is received and its associated flat file schema is resolved and selected at run time.

In this solution, the customer is required to supply an implementation of a custom class whose purpose is to read portions of the incoming stream and decide which flat file format will eventually be used to disassemble the message.

As far as I know, this is a unique concept in BizTalk, for which there is no native support in the product. Therefore, I have created the following interface for this purpose:

public interface IResolveSchema
{
    void GetClassID(out Guid guid);
    DisassemblerProperties Resolve(IPipelineContext pContext, IBaseMessage pInMsg);
}

The first method, GetClassID, will prove useful when implementing a custom plug-in-like system for centralizing the discovery of available resolvers. This will be the subject of a future post.

The most important method is Resolve which, as its name implies, is the raison d’être for any class that needs to determine a flat file schema at runtime.

This method is the one that will be called by the custom disassembler on each instance of a Resolver component as part of the IProbeMessage interface implementation.

This method is responsible for carrying out any processing required for determining the flat file schema to use for disassembling the message. When a message is recognized, this method will return an instance of the DisassemblerProperties class that holds, as its name suggests, properties that represent the flat file schema as deployed in BizTalk Server.
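To make this probing contract a bit more concrete, here is a minimal sketch of how a host component might walk a list of resolvers and stop at the first one that recognizes the message. It is deliberately detached from BizTalk: ISimpleResolver, TagResolver and Probe.FirstMatch are hypothetical stand-ins for IResolveSchema, a custom resolver and the probing logic of the disassembler, and the schema names are made up. The sketch also assumes a seekable stream, which a real pipeline cannot rely on.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical, simplified stand-in for IResolveSchema.
// Returns a schema name, or null when the message is not recognized.
public interface ISimpleResolver
{
    string Resolve(Stream stream);
}

// Recognizes a message by its first byte, then rewinds the stream.
public sealed class TagResolver : ISimpleResolver
{
    private readonly char tag_;
    private readonly string specName_;

    public TagResolver(char tag, string specName)
    {
        tag_ = tag;
        specName_ = specName;
    }

    public string Resolve(Stream stream)
    {
        long start = stream.Position;
        try
        {
            return stream.ReadByte() == tag_ ? specName_ : null;
        }
        finally
        {
            stream.Position = start; // leave the stream as we found it
        }
    }
}

public static class Probe
{
    // Asks each resolver in turn; the first non-null answer wins.
    public static string FirstMatch(IEnumerable<ISimpleResolver> resolvers, Stream stream)
    {
        foreach (ISimpleResolver resolver in resolvers)
        {
            string specName = resolver.Resolve(stream);
            if (specName != null)
                return specName;
        }
        return null;
    }
}

public static class Program
{
    public static void Main()
    {
        List<ISimpleResolver> resolvers = new List<ISimpleResolver>
        {
            new TagResolver('O', "Sample.Schemas.Order"),
            new TagResolver('I', "Sample.Schemas.Invoice")
        };

        using (MemoryStream stream = new MemoryStream(Encoding.ASCII.GetBytes("I0001;2010-01-01")))
        {
            Console.WriteLine(Probe.FirstMatch(resolvers, stream) ?? "(not recognized)");
        }
    }
}
```

The order of the list matters in such a design: the first resolver that answers wins, so more specific resolvers would normally be placed ahead of more permissive ones.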

The DisassemblerProperties class goes like this:

public sealed class DisassemblerProperties
{
  private SchemaWithNone headerSpecName_ = new SchemaWithNone(String.Empty);
  private SchemaWithNone documentSpecName_ = new SchemaWithNone(String.Empty);
  private SchemaWithNone trailerSpecName_ = new SchemaWithNone(String.Empty);

  // readonly, so that the shared Empty instance cannot be reassigned by callers.
  public static readonly DisassemblerProperties Empty = new DisassemblerProperties();

  #region Construction

  internal DisassemblerProperties()
  {
  }

  public DisassemblerProperties(string documentSpecName)
   : this (String.Empty, documentSpecName, String.Empty)
  {}

  public DisassemblerProperties(string headerSpecName, string documentSpecName, string trailerSpecName)
    : this (new SchemaWithNone(headerSpecName), new SchemaWithNone(documentSpecName), new SchemaWithNone(trailerSpecName))
  {}

  public DisassemblerProperties(SchemaWithNone documentSpecName)
    : this (new SchemaWithNone(String.Empty), documentSpecName, new SchemaWithNone(String.Empty))
  {}

  public DisassemblerProperties(SchemaWithNone headerSpecName, SchemaWithNone documentSpecName, SchemaWithNone trailerSpecName)
  {
    headerSpecName_ = headerSpecName;
    documentSpecName_ = documentSpecName;
    trailerSpecName_ = trailerSpecName;
  }

  #endregion

  #region Attributes

  public SchemaWithNone HeaderSpecName
    { get { return headerSpecName_; } }

  public SchemaWithNone DocumentSpecName
    { get { return documentSpecName_; } }

  public SchemaWithNone TrailerSpecName
    { get { return trailerSpecName_; } }

  #endregion

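  #region Equality Operators

  public static bool operator ==(DisassemblerProperties left, DisassemblerProperties right)
  {
    // Same instance, or both null.
    if (ReferenceEquals(left, right))
      return true;

    // Exactly one side is null: not equal, and nothing to dereference.
    if (ReferenceEquals(left, null) || ReferenceEquals(right, null))
      return false;

    return
      left.HeaderSpecName == right.HeaderSpecName &&
      left.DocumentSpecName == right.DocumentSpecName &&
      left.TrailerSpecName == right.TrailerSpecName;
  }

  public static bool operator !=(DisassemblerProperties left, DisassemblerProperties right)
  {
    return !(left == right);
  }

  public override bool Equals(object obj)
  {
    // 'as' yields null for foreign types instead of throwing an InvalidCastException.
    return (this == (obj as DisassemblerProperties));
  }

  public override int GetHashCode()
  {
    // Combine the three hash codes; a bitwise AND would bias the result towards zero.
    unchecked
    {
      int hash = 17;
      hash = hash * 31 + HeaderSpecName.GetHashCode();
      hash = hash * 31 + DocumentSpecName.GetHashCode();
      hash = hash * 31 + TrailerSpecName.GetHashCode();
      return hash;
    }
  }

  #endregion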
}

Notice that this very simple class is responsible for holding three properties that will be used to determine the header, document and trailer portions of the flat file, respectively. Most of the noise around this class has to do with implementing equality and comparison, as well as representing an invariant Empty state.

The class makes use of a poorly documented but very useful SchemaWithNone class. Here, it is only used to hold key pieces of information about a schema but it can do much more.

Typically, implementations of this interface will be responsible for reading portions of the incoming message stream so as to be able to recognize its contents. One key property of a robust pipeline component, however, is to handle incoming messages in a streaming manner. In order to do that, the contents of the incoming message stream must be read at most once in the entire sequence of pipeline components.

However, each Resolver component is free to read as little or as much as necessary from the supplied stream in order to recognize the message. Therefore, I will introduce a helper class whose job is to keep track of any portion in the original message stream that has already been read.

Additionally, this helper class will make the job of implementing a custom Resolver component easier.

Let’s call this helper class SchemaResolverBase:

public abstract class SchemaResolverBase : IResolveSchema, IBaseComponent
{
  MarkableForwardOnlyEventingReadStream stream_ = null;
  IPipelineContext pContext_ = null;
  IBaseMessage pInMsg_ = null;
  DisassemblerProperties properties = DisassemblerProperties.Empty;

  #region IResolveSchema Implementation

  void IResolveSchema.GetClassID(out Guid classID)
  {
    // The concrete resolver class is expected to carry a [Guid("...")] attribute.
    object[] attrs = GetType().GetCustomAttributes(typeof(System.Runtime.InteropServices.GuidAttribute), false);

    // Fail loudly in release builds too; Debug.Assert is compiled out there.
    if (attrs.Length != 1)
      throw new InvalidOperationException(GetType().FullName + " must be decorated with a GuidAttribute.");

    classID = new Guid(((System.Runtime.InteropServices.GuidAttribute) attrs[0]).Value);
  }

  DisassemblerProperties IResolveSchema.Resolve(IPipelineContext pContext, IBaseMessage pInMsg)
  {
    pContext_ = pContext;
    pInMsg_ = pInMsg;
    stream_ = new MarkableForwardOnlyEventingReadStream(pInMsg.BodyPart.GetOriginalDataStream());

    pInMsg.BodyPart.Data = stream_;
    pContext.ResourceTracker.AddResource(stream_);

    return ResolveSchema(stream_);
  }

  #endregion

  #region Attributes

  public IPipelineContext PipelineContext
  { get { return pContext_; } }

  public IBaseMessage Message
  { get { return pInMsg_; } }

  #endregion

  #region Overrides

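  public virtual DisassemblerProperties ResolveSchema(MarkableForwardOnlyEventingReadStream stream)
  {
    // Remember the current position, so that whatever the resolver reads can be replayed.
    stream.MarkPosition();
    try
    {
      return Resolve(stream);
    }
    finally
    {
      // Always rewind, whether or not the message was recognized.
      stream.ResetPosition();
    }
  }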

  public abstract DisassemblerProperties Resolve(Stream stream);

  #endregion
}

Notice that this helper class implements the IResolveSchema interface, defined earlier.

The GetClassID method is implemented by looking up, through reflection, the System.Runtime.InteropServices.GuidAttribute attribute on the declaration of the class that implements custom schema resolution.
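The reflection lookup can be tried out in isolation. The following is a small sketch rather than the code of the component itself: SampleResolver and ClassIdReader are hypothetical names, and the GUID value is made up for the example.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical resolver class; the GUID value is made up for this example.
[Guid("0f8fad5b-d9cb-469f-a165-70867728950e")]
public class SampleResolver
{
}

public static class ClassIdReader
{
    // Reads the GuidAttribute declared directly on the given type.
    // Throws when the attribute is missing, so the failure is visible in release builds.
    public static Guid GetClassID(Type type)
    {
        object[] attrs = type.GetCustomAttributes(typeof(GuidAttribute), false);
        if (attrs.Length != 1)
            throw new InvalidOperationException(
                type.FullName + " must be decorated with a GuidAttribute.");

        return new Guid(((GuidAttribute)attrs[0]).Value);
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(ClassIdReader.GetClassID(typeof(SampleResolver)));
    }
}
```

Passing false as the second argument means the attribute is only looked up on the class itself, not on its base classes, so every concrete resolver has to declare its own identifier.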

The Resolve method first stores its arguments in easily retrievable data members for later use. Then, it wraps the original message stream in an instance of the MarkableForwardOnlyEventingReadStream class and assigns the wrapper back to the body part of the message. Notice that this additional stream is registered with the BizTalk Resource Tracker, which protects it from being garbage collected too soon and takes care of releasing it once the pipeline is done with it.

We’ve seen this class and this pattern already when we covered how to determine the type of an incoming XML message.

Finally, the Resolve method delegates its work to the ResolveSchema method, whose sole purpose is to mark the current position of the stream on entry and reset it on exit. In between, the real work is handed off to an abstract Resolve overload, taking a plain Stream, that will be supplied by custom implementations.
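For readers who have not met the mark and reset idea before, here is a rough sketch of the mechanism. ReplayStream is a hypothetical, simplified stand-in written for this post; it is not the implementation of MarkableForwardOnlyEventingReadStream, it only shares the MarkPosition and ResetPosition method names with it. Bytes read after the mark are recorded and handed out again after the reset, so the underlying stream is still consumed only once.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical, simplified stand-in for a markable forward-only stream.
// Bytes read after MarkPosition() are recorded so that ResetPosition()
// can replay them; the inner stream itself is never rewound.
public sealed class ReplayStream : Stream
{
    private readonly Stream inner_;
    private readonly MemoryStream recorded_ = new MemoryStream();
    private long replayPos_;  // next recorded byte to hand out
    private bool marked_;     // true while reads must be recorded

    public ReplayStream(Stream inner) { inner_ = inner; }

    public void MarkPosition()
    {
        // Keep only the recorded bytes that have not been handed out yet.
        byte[] all = recorded_.ToArray();
        recorded_.SetLength(0);
        recorded_.Position = 0;
        recorded_.Write(all, (int)replayPos_, all.Length - (int)replayPos_);
        replayPos_ = 0;
        marked_ = true;
    }

    public void ResetPosition()
    {
        replayPos_ = 0;   // subsequent reads replay the recorded bytes first
        marked_ = false;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (replayPos_ < recorded_.Length)
        {
            recorded_.Position = replayPos_;
            int served = recorded_.Read(buffer, offset, count);
            replayPos_ += served;
            if (!marked_ && replayPos_ == recorded_.Length)
            {
                recorded_.SetLength(0); // fully replayed: drop the recording
                replayPos_ = 0;
            }
            return served;
        }

        int read = inner_.Read(buffer, offset, count);
        if (marked_ && read > 0)
        {
            recorded_.Position = recorded_.Length;
            recorded_.Write(buffer, offset, read);
            replayPos_ = recorded_.Length;
        }
        return read;
    }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}

public static class Program
{
    public static void Main()
    {
        ReplayStream stream = new ReplayStream(new MemoryStream(Encoding.ASCII.GetBytes("HDR|payload")));

        stream.MarkPosition();
        byte[] peek = new byte[3];
        int n = stream.Read(peek, 0, peek.Length);   // a resolver peeks at the header
        stream.ResetPosition();

        // A downstream reader still sees the message from its first byte.
        string all = new StreamReader(stream, Encoding.ASCII).ReadToEnd();
        Console.WriteLine(Encoding.ASCII.GetString(peek, 0, n) + " / " + all);
    }
}
```

The cost of this approach is memory proportional to what a resolver reads between the mark and the reset, which is one more reason for resolvers to look at as little of the message as they can.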

Eventually, custom Schema Resolver implementations will only deal with plain-old vanilla System.IO.Stream streams, and will not have to deal with buffering reads in order to satisfy the requirements of a robust streaming pipeline component.
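As an illustration of how little a resolver has to care about, here is a sketch of what the body of a custom Resolve override could look like when it is written against System.IO.Stream alone. FirstLineResolver, the record layouts and the schema names are all hypothetical; a real resolver would derive from SchemaResolverBase and return a DisassemblerProperties instance instead of a string.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical resolver body: it only sees a System.IO.Stream and reads
// as much as it needs; rewinding is assumed to be the caller's job.
public static class FirstLineResolver
{
    // Returns a made-up schema name, or null when the message is not recognized.
    public static string Resolve(Stream stream)
    {
        // Not disposed on purpose: disposing the reader would close the stream.
        StreamReader reader = new StreamReader(stream, Encoding.ASCII);
        string firstLine = reader.ReadLine();
        if (firstLine == null)
            return null;

        // Decide on the record tag and the number of fields of the first record.
        string[] fields = firstLine.Split(';');
        if (fields.Length == 4 && fields[0] == "ORDER")
            return "Sample.Schemas.Order";
        if (fields.Length == 3 && fields[0] == "INVOICE")
            return "Sample.Schemas.Invoice";

        return null;
    }
}

public static class Program
{
    public static void Main()
    {
        byte[] data = Encoding.ASCII.GetBytes("ORDER;42;2010-01-01;EUR\r\nLINE;1;Widget\r\n");
        using (MemoryStream stream = new MemoryStream(data))
        {
            Console.WriteLine(FirstLineResolver.Resolve(stream) ?? "(not recognized)");
        }
    }
}
```

Note that the StreamReader reads ahead in blocks, so the resolver consumes more of the stream than the first line. That is harmless here precisely because the position is restored around the call.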

Okay, that’s a lot for a single post. But bear with me; next time, we’ll see how to centralize custom resolvers and make their discovery easier, and then we will look into eventually implementing our custom pipeline component.


6 Responses to A Custom FlatFile Schema Resolver and Disassembler Pipeline Component

  4. mike says:

    Really helpful for better understanding of underlying logic.

    Mike
    mcse exams

  5. Tim Hennessy says:

    Great article, I’m interested in taking it a step further. Could you post or send me the sample code for this to get started?

    Tim

    • Hi Tim,

      Thanks for your feedback. The source code for this project is all there for the taking. You can check the various articles from this blog for the base pipeline component class. This post is part of a series which walk you through all that’s needed to implement the component.

      Please, tell me what specific areas you have a difficulty with, and I’ll be happy to help and provide guidance.

      Maxime.
