Streaming Download from CRM in BizTalk

Recently I’ve been working on some projects which involve integration with Dynamics CRM Online. CRM seems a pretty cool system to use, but to integrate with it effectively there are some complexities you need to deal with. They arise mainly because CRM gives your users a highly configurable system. The trade-off is that when you integrate with it, you have to deal with generic data structures and services. In this article I will discuss one scenario we had to deal with, and how we ended up with a relatively cool solution which we were able to reuse for a number of other scenarios.

OK, so to begin with, the requirement was to download some extracts of information from Dynamics CRM Online and pass this data to various partners as part of a daily process. Fortunately CRM has a feature called FetchXml which lets you build a query in CRM’s query language, execute it and get the result set back. This is a really handy way to get data out of CRM. There are some links below to find out more if you’re not familiar with it.

Although this might initially sound quite straightforward, there are still some pretty challenging aspects to this integration interface. We would do well to simplify them if we want to make life easier.

Challenges

A summary of the key challenges facing us from a BizTalk perspective is as follows:

1. Paged Data

The FetchXml interface has a fetch count parameter, but any sizable extract is going to have to be pulled out in pages. We are also unlikely to know in advance how many records there are. Records could be added, changed or removed while the extract runs, so running one query to get the record count and another to get the data might not work.
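FetchXml expresses paging through attributes on the root fetch element. `count` is the page size, `page` is the 1-based page number, and `paging-cookie` carries the cookie CRM returns with the previous page. The minimal Python sketch below stamps those attributes onto a request held as a string. `apply_paging` is a hypothetical helper and is not part of any CRM SDK.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def apply_paging(fetch_xml: str, page: int, count: int, cookie: Optional[str] = None) -> str:
    """Return a copy of a FetchXml request with its paging attributes set.

    'page' is 1-based, 'count' is the page size and 'cookie' is the paging
    cookie returned with the previous page (None for the first page).
    """
    root = ET.fromstring(fetch_xml)
    if root.tag != "fetch":
        raise ValueError("expected a <fetch> root element")
    root.set("page", str(page))
    root.set("count", str(count))
    if cookie is not None:
        # The cookie is itself an XML snippet; ElementTree escapes it on output.
        root.set("paging-cookie", cookie)
    return ET.tostring(root, encoding="unicode")


request = """<fetch mapping="logical">
  <entity name="contact">
    <attribute name="fullname" />
    <attribute name="modifiedon" />
  </entity>
</fetch>"""

page_two = apply_paging(request, page=2, count=5000, cookie='<cookie page="1" />')
print(page_two)
```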

The next challenge is that paging doesn’t sit well with a BizTalk adapter. Out of the box, the WCF adapters don’t deal with paged result sets. I could potentially extend WCF to make the calls out to CRM and merge all of the pages together. That brings us to the next problem, which is buffering the data in memory: a large extract could be a problem if it’s buffered. The same concern potentially rules out an orchestration that keeps making calls (inline or via a send port) to request the next page and aggregate the results.

We need to find an effective solution which is able to handle the paging pattern we will need to support from CRM.
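One way to handle paging without buffering the whole result is a lazy iterator that only asks for the next page once the current one has been consumed. The Python sketch below assumes a hypothetical `fetch_page(page_number, cookie)` callable standing in for the call to CRM. The `Page` fields loosely mirror the SDK's EntityCollection properties (Entities, MoreRecords, PagingCookie). The fake service at the end shows that only one page is requested until more data is actually read.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional


@dataclass
class Page:
    """One page of results; fields loosely mirror EntityCollection's
    Entities, MoreRecords and PagingCookie properties."""
    entities: List[dict]
    more_records: bool
    paging_cookie: Optional[str] = None


def iter_entities(fetch_page: Callable[[int, Optional[str]], Page]) -> Iterator[dict]:
    """Yield entities one at a time, requesting the next page only once the
    current page has been consumed, so at most one page is held in memory."""
    page_number, cookie = 1, None
    while True:
        page = fetch_page(page_number, cookie)
        yield from page.entities
        if not page.more_records:
            return
        page_number, cookie = page_number + 1, page.paging_cookie


# Fake CRM for the example: 7 records served 3 per page.
records = [{"contactid": i} for i in range(7)]
calls = []


def fake_fetch(page_number: int, cookie: Optional[str]) -> Page:
    calls.append(page_number)
    start = (page_number - 1) * 3
    return Page(records[start:start + 3], start + 3 < len(records), f"cookie-{page_number}")


stream = iter_entities(fake_fetch)
first = next(stream)  # only page 1 has been requested so far
```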

2. Very generic schema

The next problem is that the data returned from CRM is in a very generic format. Let’s take a look at an example:

<Entity 
	xmlns:i="http://www.w3.org/2001/XMLSchema-instance" 
	xmlns="http://schemas.microsoft.com/xrm/2011/Contracts">
	<Attributes xmlns:d2p1="http://schemas.datacontract.org/2004/07/System.Collections.Generic">
		<KeyValuePairOfstringanyType>
			<d2p1:key>fullname</d2p1:key>
			<d2p1:value 
				xmlns:d4p1="http://www.w3.org/2001/XMLSchema" 
				i:type="d4p1:string">Bob Jones</d2p1:value>
		</KeyValuePairOfstringanyType>
		<KeyValuePairOfstringanyType>
			<d2p1:key>contactid</d2p1:key>
			<d2p1:value xmlns:d4p1="http://schemas.microsoft.com/2003/10/Serialization/" i:type="d4p1:guid">{This would be a guid}</d2p1:value>
		</KeyValuePairOfstringanyType>
		<KeyValuePairOfstringanyType>
			<d2p1:key>donotsendmm</d2p1:key>
			<d2p1:value xmlns:d4p1="http://www.w3.org/2001/XMLSchema" i:type="d4p1:boolean">false</d2p1:value>
		</KeyValuePairOfstringanyType>
		<KeyValuePairOfstringanyType>
			<d2p1:key>modifiedon</d2p1:key>
			<d2p1:value xmlns:d4p1="http://www.w3.org/2001/XMLSchema" i:type="d4p1:dateTime">2014-11-10T10:46:06Z</d2p1:value>
		</KeyValuePairOfstringanyType>
	</Attributes>
	<EntityState i:nil="true" />
	<FormattedValues 
		xmlns:d2p1="http://schemas.datacontract.org/2004/07/System.Collections.Generic">
		<KeyValuePairOfstringstring>
			<d2p1:key>donotsendmm</d2p1:key>
			<d2p1:value>Send</d2p1:value>
		</KeyValuePairOfstringstring>
		<KeyValuePairOfstringstring>
			<d2p1:key>modifiedon</d2p1:key>
			<d2p1:value>11/10/2014 10:46 AM</d2p1:value>
		</KeyValuePairOfstringstring>
	</FormattedValues>
	<Id>{This would be a guid}</Id>
	<LogicalName>contact</LogicalName>
	<RelatedEntities xmlns:d2p1="http://schemas.datacontract.org/2004/07/System.Collections.Generic" />
</Entity>

As you can see, this data is very generic in structure. CRM schemas are available for BizTalk, but any mapping we do from this data is going to be painful because we would constantly be digging into those generic collections. Any change to entity or attribute names would also affect our BizTalk maps, and that would be painful to troubleshoot.

Ideally we want to simplify this data when it is downloaded so we can make our lives easier downstream.
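The Python sketch below shows the digging involved. It collapses the Attributes collection of a serialised Entity, like the one above, into a plain dictionary. It works on the raw XML with ElementTree purely for illustration; the component described later works on the SDK's Entity object instead. `entity_attributes` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

NS = {
    "c": "http://schemas.microsoft.com/xrm/2011/Contracts",
    "g": "http://schemas.datacontract.org/2004/07/System.Collections.Generic",
}


def entity_attributes(entity_xml: str) -> dict:
    """Collapse the generic key/value Attributes collection of a serialised
    CRM Entity into a dict of attribute name -> raw text value."""
    root = ET.fromstring(entity_xml)
    result = {}
    for pair in root.findall("c:Attributes/c:KeyValuePairOfstringanyType", NS):
        key = pair.findtext("g:key", namespaces=NS)
        value = pair.find("g:value", NS)
        result[key] = value.text if value is not None else None
    return result


sample = """<Entity xmlns:i="http://www.w3.org/2001/XMLSchema-instance"
        xmlns="http://schemas.microsoft.com/xrm/2011/Contracts">
  <Attributes xmlns:d2p1="http://schemas.datacontract.org/2004/07/System.Collections.Generic">
    <KeyValuePairOfstringanyType>
      <d2p1:key>fullname</d2p1:key>
      <d2p1:value xmlns:d4p1="http://www.w3.org/2001/XMLSchema" i:type="d4p1:string">Bob Jones</d2p1:value>
    </KeyValuePairOfstringanyType>
    <KeyValuePairOfstringanyType>
      <d2p1:key>donotsendmm</d2p1:key>
      <d2p1:value xmlns:d4p1="http://www.w3.org/2001/XMLSchema" i:type="d4p1:boolean">false</d2p1:value>
    </KeyValuePairOfstringanyType>
  </Attributes>
  <LogicalName>contact</LogicalName>
</Entity>"""

print(entity_attributes(sample))  # {'fullname': 'Bob Jones', 'donotsendmm': 'false'}
```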

3. Message Wrapping

When we download data from CRM via this interface, we get a collection of Entity nodes with their inner content. This is the same regardless of the type of download. That makes things tricky, because BizTalk works best when each type of message can be uniquely identified, which makes routing, identification, parsing and mapping easier. In this solution we would like to wrap the XML we extract from CRM in XML tags so the message can be uniquely identified.
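The wrapping idea can be sketched in Python as emitting a configurable start snippet, then the entity fragments, then an end snippet, so that the document root carries a namespace. BizTalk identifies message types as `namespace#root`, and the second helper derives that string from a document. The namespace `http://example.org/crm/contacts`, the `ContactExtract` root and both function names are placeholders, not part of the original solution.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator


def wrap(fragments: Iterable[str], start_xml: str, end_xml: str) -> Iterator[str]:
    """Yield the configured start snippet, each entity fragment, then the end snippet."""
    yield start_xml
    yield from fragments
    yield end_xml


def message_type(xml_text: str) -> str:
    """Return a BizTalk-style message type ('namespace#root') for a document."""
    tag = ET.fromstring(xml_text).tag  # ElementTree reports '{namespace}local'
    if not tag.startswith("{"):
        raise ValueError("root element has no namespace to identify it by")
    namespace, local_name = tag[1:].split("}", 1)
    return f"{namespace}#{local_name}"


start = '<ContactExtract xmlns="http://example.org/crm/contacts">'  # placeholder
end = "</ContactExtract>"
document = "".join(wrap(["<Entity>1</Entity>", "<Entity>2</Entity>"], start, end))
print(message_type(document))  # http://example.org/crm/contacts#ContactExtract
```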

4. Optional Attributes

When you send a FetchXml request to CRM, the entities in the result set come back with the attributes specified in the request. One complication is that many attributes are not required. If an entity has no value for an attribute, the returned XML simply does not contain that attribute for that entity.
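Because empty values are simply omitted, downstream code has to treat every requested attribute as optional. The small Python sketch below assumes the returned entity has already been reduced to a dictionary of attribute names and values. It lists which requested attributes were left out. Both helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def requested_attributes(fetch_xml: str) -> List[str]:
    """Names of the <attribute> elements on the primary <entity> of a FetchXml request."""
    entity = ET.fromstring(fetch_xml).find("entity")
    return [attribute.get("name") for attribute in entity.findall("attribute")]


def missing_attributes(fetch_xml: str, returned: Dict[str, str]) -> List[str]:
    """Attributes that were requested but omitted from the result because they had no value."""
    return [name for name in requested_attributes(fetch_xml) if name not in returned]


request = """<fetch mapping="logical"><entity name="contact">
  <attribute name="fullname" /><attribute name="emailaddress1" /><attribute name="modifiedon" />
</entity></fetch>"""
row = {"fullname": "Bob Jones", "modifiedon": "2014-11-10T10:46:06Z"}
print(missing_attributes(request, row))  # ['emailaddress1']
```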

 

Introducing the CRM Streaming Download

With the above in mind, I spent some time trying to work out a good way to make the BizTalk solution effective for this project. In the end I decided that if I could download the data from CRM in a streaming fashion, I could plug this into a BizTalk pipeline. I would then be able to download a sizable amount of data with minimal effect on the overall performance of the system. Before getting into this, I had a quick refresher on Yossi Dahan’s old Streaming Pipeline Component Whitepaper, and then I jumped in at the deep end.

The high level solution of what I want to do is as follows:

  1. I will use the scheduled task adapter, which on each trigger will inject a file into BizTalk containing the FetchXml request.
  2. A send port will subscribe to the FetchXml request coming from the receive port, and that send port will house my CRM Download pipeline component.
  3. When the pipeline component executes, it will read the existing stream to retrieve the FetchXml message, which will be used as the request to CRM.
  4. The pipeline component will then set up a custom stream I have created called the CrmDownloadStream. This stream is given a connection string from the pipeline configuration and the XML snippets to go at the start and end of the message. The configuration also supplies a fetch count, which determines how many records are extracted on each call to CRM.
  5. The pipeline component will then replace the message body with this new stream and add the stream to the pipeline context’s resource tracker.
  6. As BizTalk processes the message and reads the stream, the custom CrmDownloadStream will make calls out to CRM via the Organisation Service and download the data to be returned. Each time the stream reaches the end of the data downloaded so far, it will call back out to CRM to see if there are any more records to retrieve. Note that I will also implement a flattening pattern to simplify the data, which I will discuss later.
  7. BizTalk will then use the File adapter, and as the stream is read the data will be written to disk.
  8. From here BizTalk can pick up the newly written file. It will be in a nice XML format with a namespace, so the message type can be identified and the data can be easily mapped and routed.

 

Design Decisions

In this solution I made a few key design decisions which are worth taking a moment to discuss.

1. Why write the message to disk then read it back in again

I think the pipeline could be used on either the receive or the send side of BizTalk, and it would normally work fine. In practice, though, I have found that calls out to CRM Online would sometimes fail because the ACS instance under CRM responded with a “no endpoints listening” error. If the download is executed on the receive side, you do not automatically get a retry mechanism, so you need to monitor for this error and resume the instance yourself. The error does not happen that frequently, but it did hit us a couple of times. I therefore chose to put the download on the send side, where we get a retry mechanism for free. We can configure it to handle this error, and it will cover us in most cases. We can then simply write the message to a locked-down file location and have BizTalk pick it up again once the write is complete. At that point we are back in the normal BizTalk world of dealing with an XML file from disk.
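The send-side retry the author relies on is configured on the send port through its retry count and retry interval settings; it is not written in code. The Python sketch below only illustrates those semantics. `TransientCrmError` is a hypothetical stand-in for the intermittent "no endpoints listening" fault. The `sleep` parameter is injectable so the example runs instantly.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class TransientCrmError(Exception):
    """Hypothetical stand-in for the intermittent 'no endpoints listening' fault."""


def with_retries(operation: Callable[[], T], retry_count: int, retry_interval_s: float,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run 'operation', retrying up to 'retry_count' more times after a transient
    fault and waiting 'retry_interval_s' between attempts."""
    attempt = 0
    while True:
        try:
            return operation()
        except TransientCrmError:
            if attempt >= retry_count:
                raise  # retries exhausted: surface the fault
            attempt += 1
            sleep(retry_interval_s)


outcomes = iter([TransientCrmError("no endpoints listening"), "downloaded"])


def flaky_download() -> str:
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


waits: List[float] = []
result = with_retries(flaky_download, retry_count=3, retry_interval_s=300, sleep=waits.append)
print(result, waits)  # downloaded [300]
```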

2. Flattening the CRM Xml

I decided that I wanted this component to flatten the XML from CRM before I had to process it in BizTalk. You can see the complex XML above, which would be painful to work with. In the component I wrote a class called FlattenedXmlFormatter, whose job is to take a CRM Entity object and convert it to XML. The easiest way of doing this is to use the DataContractSerializer, but that gives you the nasty XML above. Instead, my FlattenedXmlFormatter goes through the entity and the related entities within this object. It converts their attributes to elements whose names concatenate the entity’s logical name and the attribute’s name.

As an example the fullname attribute of the contact object would come out as <contact.fullname>.
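The Python sketch below applies the naming rule, building one element per attribute named `<logical name>.<attribute name>`. The surrounding layout, an Entity root with Attributes and FormattedValues children, is an assumption, and `flatten_entity` is hypothetical. A period is a legal (non-initial) character in XML element names, so names such as contact.fullname are well-formed.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def flatten_entity(logical_name: str, attributes: Dict[str, str],
                   formatted: Dict[str, str]) -> ET.Element:
    """Build one element per attribute, named '<logical name>.<attribute name>'."""
    entity = ET.Element("Entity")  # assumed root name
    for section_name, values in (("Attributes", attributes), ("FormattedValues", formatted)):
        section = ET.SubElement(entity, section_name)
        for attribute_name, value in values.items():
            ET.SubElement(section, f"{logical_name}.{attribute_name}").text = value
    return entity


flat = flatten_entity(
    "contact",
    {"fullname": "Bob Jones", "donotsendmm": "false", "modifiedon": "2014-11-10T10:46:06Z"},
    {"donotsendmm": "Send", "modifiedon": "11/10/2014 10:46 AM"},
)
print(ET.tostring(flat, encoding="unicode"))
```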

In addition to converting the structure, I also needed to deal with the Entity XML's many different data types, each of which can be formatted differently. The formatter converts these values too, as can be seen in the FormatAliasedValue and FormatAttributeValue methods.
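The per-type formatting can be pictured as a function that dispatches on the value's type. In this Python sketch the rules are assumptions rather than the component's actual FormatAttributeValue logic: lowercase booleans, ISO 8601 UTC timestamps, plain decimals and hyphenated GUIDs. `AliasedValue` only loosely mirrors the SDK class of the same name. The SDK class uses its EntityLogicalName, AttributeLogicalName and Value properties to wrap values returned from linked entities.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal
from typing import Any
from uuid import UUID


@dataclass
class AliasedValue:
    """Loose stand-in for the SDK's AliasedValue wrapper."""
    entity_logical_name: str
    attribute_logical_name: str
    value: Any


def format_attribute_value(value: Any) -> str:
    """Hypothetical per-type formatting rules for flattened output."""
    if isinstance(value, AliasedValue):
        return format_attribute_value(value.value)  # unwrap, then format the inner value
    if value is None:
        return ""
    if isinstance(value, bool):  # checked before anything numeric: bool subclasses int
        return "true" if value else "false"
    if isinstance(value, datetime):
        return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    if isinstance(value, UUID):
        return str(value)
    if isinstance(value, Decimal):
        return format(value, "f")  # avoid exponent notation
    return str(value)


print(format_attribute_value(False))
print(format_attribute_value(AliasedValue("account", "name", "Contoso")))
```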

 

Walk Through

Now let’s walk through the code.

1. The CrmWrapper

The CrmWrapper is responsible for managing the interaction with CRM. It is instantiated with a connection string to CRM, a FetchXml request and the XML you would like to wrap your message in. The CrmDownloadStream then uses this class to get data from CRM by calling its GetNext method. GetNext includes the start XML the first time it is called and appends the end XML just after the last data has been downloaded from CRM. The data it returns is a byte array which can be read from the stream.

This class also handles the paging from the CRM request side of things.

I am not going to go much further into this class in this article as it’s relatively complex, but the code can be downloaded at the end of this article if you want to look at it.
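The GetNext contract described above can be sketched in Python as a small state machine. `CrmWrapperSketch` is a hypothetical stand-in for the real class. Its `fetch_page` callable, which returns a list of XML fragments and a more-records flag, stands in for the CRM calls.

```python
from typing import Callable, List, Optional, Tuple

# A page is a list of XML fragments plus a flag saying whether more records exist.
PageResult = Tuple[List[str], bool]


class CrmWrapperSketch:
    """Hypothetical analogue of the GetNext contract: the start XML is prepended
    to the first chunk, the end XML is appended after the last page, and once
    everything has been returned further calls give None."""

    def __init__(self, fetch_page: Callable[[], PageResult], start_xml: str, end_xml: str):
        self._fetch_page = fetch_page  # stands in for one paged call to CRM
        self._start_xml = start_xml
        self._end_xml = end_xml
        self._started = False
        self._finished = False

    def get_next(self) -> Optional[bytes]:
        if self._finished:
            return None
        fragments, more_records = self._fetch_page()
        text = "".join(fragments)
        if not self._started:
            text = self._start_xml + text
            self._started = True
        if not more_records:
            text += self._end_xml
            self._finished = True
        return text.encode("utf-8")


pages = iter([(["<Entity>1</Entity>"], True), (["<Entity>2</Entity>"], False)])
wrapper = CrmWrapperSketch(lambda: next(pages), "<Extract>", "</Extract>")
chunks = []
while (chunk := wrapper.get_next()) is not None:
    chunks.append(chunk)
print(b"".join(chunks).decode("utf-8"))  # <Extract><Entity>1</Entity><Entity>2</Entity></Extract>
```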

2. The CrmDownloadStream

As I mentioned, the bulk of the work is done in the CrmDownloadStream. When it is initially set up, it has a reference to a CrmWrapper object, allowing it to interact with CRM. The CrmDownloadStream is very similar to the example from Yossi’s whitepaper. It reads the data, but holds any bytes beyond what the stream’s Read method has requested in a temporary buffer. The next read is served from that temporary buffer, and only once the buffer is empty does the stream call back out to CRM to get more data.
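The leftover-buffer technique can be sketched in Python by subclassing `io.RawIOBase` and implementing `readinto`. The chunk source is any hypothetical callable that returns the next `bytes` chunk, or `None` when the download is complete. The sketch follows the description above, not the original C# code.

```python
import io
from typing import Callable, Optional


class DownloadStreamSketch(io.RawIOBase):
    """Read-only stream over a chunk source. Bytes beyond what the caller asked
    for are kept in a temporary buffer and served first on the next read; the
    source is only called again once that buffer is empty."""

    def __init__(self, get_next: Callable[[], Optional[bytes]]):
        self._get_next = get_next
        self._pending = b""      # leftover bytes from the last chunk
        self._exhausted = False
        self.source_calls = 0    # instrumentation for the example

    def readable(self) -> bool:
        return True

    def readinto(self, buffer) -> int:
        # Refill from the source only when the leftover buffer is empty.
        while not self._pending and not self._exhausted:
            chunk = self._get_next()
            self.source_calls += 1
            if chunk is None:
                self._exhausted = True
            else:
                self._pending = chunk
        n = min(len(buffer), len(self._pending))
        buffer[:n] = self._pending[:n]
        self._pending = self._pending[n:]
        return n  # 0 signals end of stream


source = iter([b"<Extract>", b"<Entity>1</Entity>", b"</Extract>"])
stream = DownloadStreamSketch(lambda: next(source, None))
first = stream.read(4)  # the other 5 bytes of the first chunk stay pending
print(first, stream.source_calls)  # b'<Ext' 1
rest = stream.read()  # drains the pending bytes, then pulls the remaining chunks
```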

This approach is quite useful because it lets you tune how much data you pull back from CRM per page, trading memory usage against the latency of making more calls to CRM.
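Here is a rough way to reason about that trade-off. The number of round trips is the record count divided by the fetch count, rounded up, while the memory held for one page grows with the fetch count. The record count and average record size below are made-up illustrative inputs, not measurements.

```python
import math
from typing import Tuple


def page_tradeoff(total_records: int, fetch_count: int, avg_record_bytes: int) -> Tuple[int, int]:
    """Return (CRM round trips, approximate bytes held for one page)."""
    # At least one call is needed even to learn that the result is empty.
    calls = max(1, math.ceil(total_records / fetch_count))
    page_bytes = min(fetch_count, total_records) * avg_record_bytes
    return calls, page_bytes


# Illustrative inputs only: 50,000 records of roughly 2,000 bytes each.
for fetch_count in (500, 5000):
    calls, page_bytes = page_tradeoff(50_000, fetch_count, 2_000)
    print(f"fetch count {fetch_count}: {calls} calls, ~{page_bytes / 1_000_000:.0f} MB per page")
```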

In the Pipeline Components Test Project there is a unit test of the CrmDownloadStream which demonstrates its use.  You can see this in the picture below.

[Figure: Test]

In this code I set up the CrmWrapper and the FetchXml request and use them to create my CrmDownloadStream. I then use a FileStream to demonstrate how to read the CrmDownloadStream and write its data to a file in chunks. If you run this test, you will end up with an output file containing the data from your download.
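The chunked read-and-write loop that the test performs can be sketched in Python as follows. An in-memory `io.BytesIO` stands in for the download stream, and `copy_in_chunks` is a hypothetical helper.

```python
import io
import tempfile
from pathlib import Path
from typing import BinaryIO


def copy_in_chunks(source: BinaryIO, destination_path: Path, chunk_size: int = 64 * 1024) -> int:
    """Read 'source' in fixed-size chunks and write each one to a file,
    so only one chunk is held in memory at a time. Returns bytes written."""
    written = 0
    with open(destination_path, "wb") as out:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:  # empty read means end of stream
                break
            out.write(chunk)
            written += len(chunk)
    return written


payload = b"<Extract><Entity>1</Entity></Extract>"
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "download.xml"
    n = copy_in_chunks(io.BytesIO(payload), target, chunk_size=8)
    content = target.read_bytes()
print(n, content == payload)
```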

I’m not going to go much further into the internals of the CrmDownloadStream code, to keep this article a bit shorter, but the code is available at the bottom.

 

3. The CrmDownload Pipeline Component

The picture below shows the key code from the pipeline component. You can see it is not too dissimilar to the test above. One key difference is that this time we obtain the FetchXml request from the original BizTalk stream. The other is that the properties for the CrmWrapper come from variables populated by the pipeline component's configuration when it is used within BizTalk.

[Figure: Pipeline Component]

 

The final bit of the code in the pipeline component replaces the message stream with our new CrmDownloadStream, makes sure that BizTalk will manage its resources, and then returns the message. BizTalk subsequently reads the stream in the streaming fashion described in the whitepaper. As it does, the stream calls out to CRM, so the download happens in a way that minimizes the amount of data buffered in memory.

 

4. BizTalk Bits

In the sample you will see a project called AppFx.BizTalk.Apps.Sample.Dynamics.Crm which contains a send pipeline. If you deploy the two projects to BizTalk (they are already strong named) and then set up some ports, you should easily be able to see this in action.

I set up a receive port which simply took a message from disk, and a send port which subscribed to anything from the receive port. In the send port I used the CrmDownloadTransmit pipeline, which contained nothing but our CrmDownload pipeline component. The send port then wrote the message out to disk.

5. The Output File

In the picture below you can see an example of the output file created by BizTalk using our super new CrmDownload approach. (Note that I have removed all but the first entity to make it easier to read.)

[Figure: Output]

 

In the XML you can see that flattening the data has made it much easier to read. It is now also much easier to create a schema for the data, and each attribute can be easily identified.

One tip to note here: even with a schema, the elements under the Attributes node and the FormattedValues node are optional and do not come in a guaranteed order. Under each of these nodes I put an xs:choice element and gave each of its child elements a maxOccurs of 1 and a minOccurs of 0. This allows the schema to cope with both the ordering and the optionality.
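The rule these schema settings aim for is: any order, every child optional, no unknown elements, none repeated. It can also be expressed as a quick Python check, which may help when testing sample output before building the schema. `check_unordered_optional` is a hypothetical helper, not part of the original solution.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import List, Set


def check_unordered_optional(node: ET.Element, allowed: Set[str]) -> List[str]:
    """Report problems under 'node': unknown child elements, or children that
    appear more than once. Order and absence are deliberately not checked."""
    counts = Counter(child.tag for child in node)
    problems = [f"unexpected element {tag}" for tag in counts if tag not in allowed]
    problems += [f"{tag} occurs {n} times" for tag, n in counts.items() if n > 1]
    return problems


allowed = {"contact.fullname", "contact.modifiedon", "contact.donotsendmm"}
reordered = ET.fromstring(
    "<Attributes><contact.modifiedon>x</contact.modifiedon>"
    "<contact.fullname>y</contact.fullname></Attributes>"
)
print(check_unordered_optional(reordered, allowed))  # []
```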

Conclusion

In conclusion, this was a very useful component and approach which we were able to reuse for any data we wanted to download from CRM. For each new interface we would simply add a new scheduled task receive location and make it inject different FetchXml depending on the data we wanted to download. Once we had the output from CRM in our nice flattened structure, we were able to do all of the usual BizTalk things and use the typical BizTalk patterns to do whatever processing was required.
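Since each new interface differs only in the FetchXml it injects, a small generator for those payloads can be handy. This Python sketch is hypothetical. The fetch, entity and attribute elements and the mapping and count attributes are standard FetchXml, but the helper itself is illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def build_fetch_xml(entity: str, attributes: Iterable[str], count: int) -> str:
    """Produce a FetchXml request for one extract interface."""
    fetch = ET.Element("fetch", {"mapping": "logical", "count": str(count)})
    entity_element = ET.SubElement(fetch, "entity", {"name": entity})
    for name in attributes:
        ET.SubElement(entity_element, "attribute", {"name": name})
    return ET.tostring(fetch, encoding="unicode")


print(build_fetch_xml("account", ["name", "accountnumber"], count=500))
```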

 

Code Sample

The code for this sample can be found at the following location:

Download