The Hoju Saram

Wednesday, October 11, 2006

Trying to create a “Domino” like data router in .Net

For all articles relating to Domino to .Net conversion please see this post

One of the greatest features of IBM Domino is that any document (record) is routable to anywhere, because the mail router and the data-engine are integrated. This means that any database can receive mail “out-of-the-box”. It is literally 1 line of code to e-mail a document from one database to another (this includes all rich-text, attachments, OLE Objects). Also Domino does not store data using a defined schema, so you can send a document from one database to another (or from an e-mail client to a database) even if the data fields that are in the document are not defined in the receiving database.

This means, for example, that I can send an e-mail from Gmail/Hotmail to a Domino database and it will happily store all of the e-mail's contents (SMTP data fields, MIME, attachments), even though the database may not have any field definitions or forms relating to e-mail defined in it.

As Domino has all of these features, building an e-mail based document workflow system with it is a breeze. (These features also mean that many people with a relational database background are scared of Domino.)

Trying to re-create these features with Microsoft products is quite difficult, but to migrate e-mail based workflow applications from Domino to .Net (without significantly changing business practices) I need a system of this type in place. Let's look at the basic workflow design at the data level so I can better define what I am trying to achieve. At the data level, an e-mail based workflow system needs to be able to:

1 ) Receive an e-mail from an internal or external person/system.
2 ) Allow custom fields to be added to the e-mail so it can be workflowed (these might be fields like “Assigned To”, “Workflow Status”, “Approvers Comments”, “Editor_Role”, “Reader_Role” etc.).
3 ) Provide for this “Extended E-mail” to be routed to other databases for further workflow(s) if required.
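
To make the idea of an “extended e-mail” concrete, here is a minimal sketch of what such a record might look like in .Net: the original message fields plus a free-form bag of workflow fields that can be added at run time, Domino-style. The class and field names here are illustrative assumptions, not part of the actual router.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: an e-mail extended with arbitrary workflow fields.
public class ExtendedEmail
{
    public string FromEmail;
    public string Subject;
    public string Body;

    // Custom workflow fields, added as required by each workflow system
    // (e.g. "Assigned To", "Workflow Status").
    public Dictionary<string, string> WorkflowFields =
        new Dictionary<string, string>();
}
```

A workflow system would then set, say, `mail.WorkflowFields["Workflow Status"] = "Awaiting Approval"` without any schema change, which is the behaviour Domino gives for free.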

Domino provides all of these features in a single server, as part of the core product, but with Microsoft products I will need at least:

1 ) An Exchange server – to host the mail box that receives the e-mail.
2 ) An SQL Server – to store all of the extended data.
3 ) An IIS server – to host the .Net application that will allow the user to interact with the data.

Already you can see there is a large install and management overhead here; but Domino also provides seven levels of integrated security “out-of-the-box”. Exchange, SQL Server, IIS and the custom .Net application all have separate, non-integrated security models, so there is also quite a lot of work in setting all of this up, especially in the area of the .Net application, where I may need to write a custom security and role provider. (That alone is days of work.)

As I said earlier, Domino stores data without a schema, so adding a new field to any document can be done by the designer or user at run time. This is not true for Exchange. I am no expert on Exchange programming, but whenever I have asked anyone about adjusting the schema of the Exchange Store to add additional fields, the conversation always begins with a big long groan. If I want to migrate a workflow application, then that application must be able to perform the same tasks as its previous version. Therefore it must be able to search, add new records, update existing records and delete data in its datastore. It must also provide multiple levels of programmatically adjustable access to that data. This is certainly not something Exchange can provide, so the only solution is to push all of the Exchange data into SQL so it can be properly integrated into the .Net application's environment.

Again, all of these features are provided “out-of-the-box” with Domino. Domino provides a full-text search engine (including binary attachment searching), the ability to update/create data in the same data store where the data is indexed and routed to, and granular, programmable security.

Every document workflow system is different, so the custom data fields and business logic that are added to an e-mail once it enters a workflow system depend on that system's requirements. At this stage I am simply trying to allow for e-mail -> datastore routing. So I will focus on:

a) Determining how to get the data from Exchange.
b) Creating a generic data schema for SQL to allow for the storage of e-mails delivered to an Exchange mail-box.
c) Creating an Exchange => SQL Server routing service to deliver the e-mails into SQL so they can then be accessed and workflowed by various .Net workflow systems. This may include multiple e-mail addresses per system.

Getting the Data from Exchange.

Exchange has a number of ways that data can be accessed. From a development point of view I believe that WebDav is the easiest approach. This is the way that Outlook Web Access gets data from Exchange, so this is the approach I have taken. WebDav also requires no additional API: I can simply use .Net to create WebDav calls (using System.Net.HttpWebRequest) and process the XML response from the Exchange server.

The functions I need to be able to use are:

a) Get a list of e-mails in an in-box.
b) Retrieve the contents of an e-mail, including file attachments.
c) Delete an e-mail from an in-box.

As all mail is going to be routed into SQL, there is no point keeping the e-mail in Exchange, so the process will be: look for new e-mail => retrieve and convert to SQL => delete from Exchange.
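
The three operations map onto plain WebDAV HTTP verbs issued against the OWA-style mailbox URL. The sketch below shows that mapping; the server and user names are placeholders, and for attachments Exchange also offers its own X-MS-ENUMATTS method, which is noted in the comments as an aside rather than used here.

```csharp
using System;

// Sketch: how the three mailbox operations map onto WebDAV verbs.
public class WebDavOps
{
    // Builds the OWA-style mailbox URL that WebDav requests are issued against.
    public static string MailboxUrl(string server, string user)
    {
        return "http://" + server + "/exchange/" + user + "/Inbox/";
    }

    // a) list e-mails in the in-box  -> PROPFIND on the folder URL
    // b) retrieve an e-mail          -> GET on the message href
    //    (attachments can also be enumerated via X-MS-ENUMATTS)
    // c) delete an e-mail            -> DELETE on the message href
    public static string MethodFor(string operation)
    {
        switch (operation)
        {
            case "list":     return "PROPFIND";
            case "retrieve": return "GET";
            case "delete":   return "DELETE";
            default: throw new ArgumentException(operation);
        }
    }
}
```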

SQL Schema

There are a large number of fields available from the Exchange datastore via WebDav, but for the e-mail based workflow applications I currently have in Domino I only need the following information:

a) who sent the e-mail
b) who the e-mail was to, including cc, bcc
c) when the e-mail was sent
d) the subject of the e-mail
e) the body of the e-mail
f) the file attachments of the e-mail.

WebDav also exposes a number of other fields that are useful for uniquely identifying the mail message, and for threading information in the case that the e-mail is part of an e-mail conversation. I decided to include these in the schema too. (Useful for e-mail enabled discussion systems.)

With these data requirements I came up with two SQL tables that can be added to any SQL database to allow the router service to deliver e-mail messages to the database from an Exchange mailbox. The SQL for these tables (Inbox and Inbox_Attachments) is below.

CREATE TABLE [dbo].[Inbox](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Folder] [varchar](10) COLLATE Latin1_General_CI_AS NULL,
[EntryID] [varchar](255) COLLATE Latin1_General_CI_AS NULL,
[MessageID] [varchar](255) COLLATE Latin1_General_CI_AS NULL,
[DAVhref] [varchar](255) COLLATE Latin1_General_CI_AS NULL,
[DateSent] [datetime] NULL,
[FromName] [varchar](255) COLLATE Latin1_General_CI_AS NULL,
[FromEmail] [varchar](255) COLLATE Latin1_General_CI_AS NULL,
[ToEmail] [varchar](800) COLLATE Latin1_General_CI_AS NULL,
[CCEmail] [varchar](800) COLLATE Latin1_General_CI_AS NULL,
[BCCEmail] [varchar](800) COLLATE Latin1_General_CI_AS NULL,
[Subject] [varchar](255) COLLATE Latin1_General_CI_AS NULL,
[TextBody] [text] COLLATE Latin1_General_CI_AS NULL,
[hasattach] [varchar](50) COLLATE Latin1_General_CI_AS NULL,
[ConversationTopic] [varchar](255) COLLATE Latin1_General_CI_AS NULL,
[Subjectprefix] [varchar](255) COLLATE Latin1_General_CI_AS NULL,

CONSTRAINT [PK_Inbox] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]


CREATE TABLE [dbo].[Inbox_Attachments](
[ID] [int] IDENTITY(1,1) NOT NULL,
[InboxID] [int],
[AttachNum] [int] NULL,
[FileName] [varchar](255) COLLATE Latin1_General_CI_AS NULL,
[ContentType] [varchar](255) COLLATE Latin1_General_CI_AS NULL,
[FileData] [image] NULL,
CONSTRAINT [PK_Inbox_Attachments] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
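
Delivering a message into the Inbox table is then a straightforward parameterised INSERT over the columns above. The sketch below only builds the SQL text, so it stands alone; the actual execution (SqlConnection/SqlCommand with the parameter values filled from the parsed message) is omitted.

```csharp
using System;

// Sketch: generate the parameterised INSERT for the Inbox table above.
public class InboxSql
{
    // Column list mirrors the Inbox schema (ID is an identity, so excluded).
    static readonly string[] Columns = {
        "Folder", "EntryID", "MessageID", "DAVhref", "DateSent",
        "FromName", "FromEmail", "ToEmail", "CCEmail", "BCCEmail",
        "Subject", "TextBody", "hasattach", "ConversationTopic", "Subjectprefix"
    };

    public static string BuildInsert()
    {
        string cols = string.Join(", ", Columns);
        string pars = "@" + string.Join(", @", Columns);
        return "INSERT INTO dbo.Inbox (" + cols + ") VALUES (" + pars + ")";
    }
}
```

Keeping the column list in one place also makes it easy to extend the schema later with workflow fields, since the INSERT statement is generated rather than hand-written.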

Building the Router Service.

The router service needs to poll each Exchange mailbox that it is made aware of and see if there is new mail. If there is, it needs to convert that mail into SQL, run that SQL against the corresponding SQL database and then delete the e-mail from the mailbox.

As the polling needs to be continuous, and possibly across hundreds of mailboxes, I have decided to implement this as a Windows service application. The service uses a config file that contains the following information for each mailbox–SQL pair:

a) The Exchange server for where the mailbox resides
b) The Exchange user name and password for connecting to the exchange mail-box via WebDav
c) The connection string to the SQL database
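
One way to hold those three pieces of information is a small XML fragment per pair, parsed into a config object when the service starts. The element and attribute names below are assumptions for illustration, not the actual config format the service uses.

```csharp
using System;
using System.Xml;

// Sketch: one mailbox-SQL pair read from a hypothetical XML config entry.
public class MailboxConfig
{
    public string Server, User, Password, ConnectionString;

    public static MailboxConfig Parse(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlElement e = doc.DocumentElement;

        MailboxConfig c = new MailboxConfig();
        c.Server = e.GetAttribute("server");                 // a) Exchange server
        c.User = e.GetAttribute("user");                     // b) mailbox credentials
        c.Password = e.GetAttribute("password");
        c.ConnectionString = e.GetAttribute("sqlConnection"); // c) target database
        return c;
    }
}
```

For example, `MailboxConfig.Parse("<mailbox server=\"mail01\" user=\"workflow1\" password=\"secret\" sqlConnection=\"Server=sql01;Database=Helpdesk;\" />")` yields one pair for the polling loop.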

Using this information the service polls Exchange at a set interval to see if there is new mail in each of the mailboxes. This is achieved from .Net with the following code:

//Create an XML Document to store the emails in
System.Xml.XmlDocument xmlDoc = new XmlDocument();

XmlNodeList xmlNodeList;

//Create a http request
HttpWebRequest request =(HttpWebRequest)WebRequest.Create("http://" + this.ex_server + "/exchange/" + this.ex_user + "/Inbox/");

//Set the WebDav method; with no request body, PROPFIND returns all properties
request.Method = "PROPFIND";

//The request (and response) content is XML
request.ContentType = "text/xml";

//Only list the contents of this folder, not any sub-folders
request.Headers.Add("Depth", "1");

//Put in the credentials to connect to the Exchange mailbox
request.Credentials = new NetworkCredential(this.ex_user,this.ex_password);

//Get the response in a response object
WebResponse response = request.GetResponse();

//Create a Streamreader to read the response object appropriately
System.IO.StreamReader reader =
new System.IO.StreamReader(response.GetResponseStream());

//Load the xml response data into an XML Document
xmlDoc.LoadXml(reader.ReadToEnd());

//Fill the NodeList to process the response
xmlNodeList = xmlDoc.GetElementsByTagName("a:response");

An example of the XML that is returned from Exchange for this call is viewable here. Once the xmlNodeList is filled, the router can process the information and download each individual e-mail. I have made available the source code for the routerworker class that I use to do all of the work. Please note that I use strongly typed datasets through business objects and the Microsoft Enterprise Library for data access.
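
Processing the node list amounts to walking each `a:response` element and pulling out the message's WebDAV href, which is then used for the follow-up GET and DELETE. The sketch below runs against a cut-down, hand-written multistatus reply (real Exchange replies carry many more properties per message, and use the same `a:` prefix for the DAV: namespace, which is why matching by qualified name works).

```csharp
using System;
using System.Xml;

// Sketch: extract message hrefs from a PROPFIND multistatus reply.
public class PropfindDemo
{
    public static string[] GetHrefs(string multistatusXml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(multistatusXml);

        // Matches elements by the qualified name Exchange returns.
        XmlNodeList responses = doc.GetElementsByTagName("a:response");

        string[] hrefs = new string[responses.Count];
        for (int i = 0; i < responses.Count; i++)
        {
            // Each a:response carries the message's WebDAV href.
            hrefs[i] = ((XmlElement)responses[i])
                .GetElementsByTagName("a:href")[0].InnerText;
        }
        return hrefs;
    }
}
```

Note that in a real Depth 1 reply the first response is typically the folder itself rather than a message, so the router should skip non-message hrefs before downloading.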

Conclusion

With this Exchange => SQL e-mail router it will now be possible to move forward in the conversion of Domino Applications to .Net. This gives me the ability to send an e-mail message to an Exchange mail-box and have it automatically converted and stored in SQL. Once these messages are in SQL, additional data can be added to them and custom business logic can act on them through .Net applications.

This work does, however, point out how much functionality is lost by moving away from Domino to Microsoft technologies. It also shows that replacing Domino applications requires multiple servers and development tools from Microsoft. All of this functionality is built into Domino; it takes three servers and days and days of coding to attempt to replace it. There are still many areas that need to be covered, such as replacing Domino's secure web object store, but that can be left for another post.
