Thursday, March 13, 2014

ECM & Business Intelligence: Integration example with Alfresco, Pentaho, CmisSync

Does your company use a server to store all of its documents (contracts, manuals, etc.)?
If yes, this document server contains a large part of your company's information.
That makes it a good target for Business Intelligence: using analytics tools to gain useful insight into how your company works.

Let's do it with this scenario:
- Employees all have an "Expenses" folder on their computers, where they put their company expense receipts (a picture of a train ticket when visiting a client in another city, the HTML file of the order confirmation page after buying a cartridge for the company's printer, the PDF receipt of a license purchase, etc.).
- Employees enter the amount paid for each of these items, to be reimbursed by the company at the end of the month.

We can implement this system easily with Alfresco, Pentaho, and CmisSync, thanks to the CMIS standard.

Part 1: Gathering receipt data (aka ECM part)

First, define an "aegif:expense" type on your document server (replace "aegif" with your company's name). Define an "amount" property for this type. (This can be done on any modern document management server, for instance Alfresco or NemakiWare.)

Second, how to gather the receipts from each user's laptop?

→ Have each employee use CmisSync, which automatically synchronizes their local "expenses" folder with their remote "expenses" folder.

On the server side, have a folder rule that applies the "expense" type to any file that the employee puts in this folder.

How can an employee specify the amount associated with each expense file?

→ By using the context menu provided by CmisSync Business. This context menu allows employees to edit document metadata. So, for each picture of a receipt, they can enter the amount (by the way, this task can easily be outsourced at this point).

Now, all of the data is in your document server.

Part 2: Analyzing the data (aka BI part)

Let's analyze this data with Pentaho, a Business Intelligence suite.
More precisely, we will use Pentaho Data Integration (PDI, also known as Kettle) through its graphical designer, Spoon.

All modern document management servers offer a CMIS API, which Pentaho can use thanks to the CMIS Input plugin. This plugin can be installed very easily from PDI's Marketplace (free).

Set up a transformation in which the CMIS Input step feeds a Table output step.

Set up the CMIS Input step to select the documents you want data about. In our example, we want all docume

The table gets filled with all of the documents' metadata: filename, size, and of course the "amount" property we have defined.
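
As a rough illustration of what the CMIS Input step asks the server for, the selection can be written as a CMIS query. The sketch below uses Python only to build the query string; in the setup described here the query would simply be typed into the step. The helper name `build_expense_query` is hypothetical, and the sketch assumes that the custom type and its property are exposed under the query names `aegif:expense` and `aegif:amount`, which depends on how the type was defined on your server.

```python
def build_expense_query(type_name="aegif:expense", amount_property="aegif:amount"):
    """Build a CMIS query selecting the name, size and amount of expense documents.

    cmis:name and cmis:contentStreamLength are standard CMIS properties;
    the type and amount names are assumptions taken from this article's example.
    """
    columns = ["cmis:name", "cmis:contentStreamLength", amount_property]
    return "SELECT {} FROM {}".format(", ".join(columns), type_name)


query = build_expense_query()
print(query)
```

If your company prefix is not "aegif", pass your own names, for example `build_expense_query("acme:expense", "acme:amount")`.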

You can then load that data into an OLAP cube and perform drill-downs, or generate reports. To close the loop, you may even want to set up a job that uploads the generated reports back to the document server.
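
To give an idea of the kind of roll-up a report would contain, here is a minimal sketch that does not need a full OLAP setup. It assumes the table output landed in a SQL table named `expenses` (a hypothetical name) with the filename, size and amount columns, plus an `employee` column that I am assuming was filled from something like the document's creator. The rows are made-up sample data, not real results, and SQLite stands in for whatever database you actually use.

```python
import sqlite3

# Made-up sample rows standing in for what the table output might contain:
# (employee, filename, size in bytes, amount)
SAMPLE_ROWS = [
    ("alice", "train-ticket.jpg", 204800, 120.0),
    ("alice", "cartridge-order.html", 15360, 35.5),
    ("bob", "license-receipt.pdf", 51200, 99.0),
]


def total_per_employee(rows):
    """Load the rows into an in-memory table and sum the amounts per employee."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE expenses "
            "(employee TEXT, filename TEXT, size INTEGER, amount REAL)"
        )
        conn.executemany("INSERT INTO expenses VALUES (?, ?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT employee, SUM(amount) FROM expenses "
            "GROUP BY employee ORDER BY employee"
        )
        # Each result row is an (employee, total) pair, so dict() maps name to total.
        return dict(cursor.fetchall())
    finally:
        conn.close()


totals = total_per_employee(SAMPLE_ROWS)
print(totals)
```

The same GROUP BY idea extends to a month column if you also extract a date, which is roughly what a drill-down by employee and month would show.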


Bridges are needed between the ECM and Business Intelligence worlds, and the CMIS standard is the most effective way to build them.

I want to thank Francesco Corti for developing Pentaho's CMIS Input plugin and making it Open Source, and Jeff Potts for his answers about the CMIS implementation provided by Alfresco.

I recently gave a demo on this topic at the Japan Pentaho User Group; here are my slides (partly in Japanese).
Nicolas Raoul
