Friday, May 30, 2008

Methods for Evaluating CMS and ECM Platforms

I’ve been evaluating nearly a dozen different CMS and ECM platforms, both OSS and commercial, for a little over a month to support an upcoming project. The experience has been very interesting, especially with regard to requirements definition and the need to carefully test each platform programmatically.

A “Content Management System” (CMS) and an “Enterprise Content Management” (ECM) platform differ in a few important characteristics. Most CMS vendors focus on producing a platform that acts as both a web site framework and a secure document repository, with more emphasis on the website than on the document repository. ECM vendors focus more heavily on the document repository and the methods for accessing it, and may have some limited web platform support.

Security and search functionality are critical components in both platform types, though their goals are very different and their implementations profoundly impact performance. The goal of security is to filter the content that the user may access out of the general content. Search lets the user search that accessible pool of content for specific keywords, dates and document characteristics.

Dynamic content types are also a pretty common feature. This is merely the ability to create a database representation of the content item (aka file) and attach a series of metadata fields. You may want to create a “Header” document type that goes at the top of all content pages. You may also want to have several different header documents that vary in some definable way and you may want to rotate through them using some criteria that matches the user’s profile. You could accomplish this by creating a custom header document type and associating a series of keywords that would be used as search criteria for matching a header to a user’s profile.
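The header example can be sketched in a few lines of Python. This is only an illustration of the idea, not any platform's API: the `ContentItem` class, the `pick_header` function and the sample file names are all made up for the example, and the matching rule (most shared keywords wins) is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ContentItem:
    """A content item: a file reference plus free-form metadata fields."""
    name: str
    doc_type: str
    keywords: set[str] = field(default_factory=set)


def pick_header(headers, profile_keywords):
    """Return the header sharing the most keywords with the user's profile.

    Ties go to the earliest header in the list; None if nothing overlaps.
    """
    wanted = set(profile_keywords)
    best, best_score = None, 0
    for header in headers:
        if header.doc_type != "Header":
            continue
        score = len(header.keywords & wanted)
        if score > best_score:
            best, best_score = header, score
    return best


headers = [
    ContentItem("header-general.html", "Header", {"general"}),
    ContentItem("header-sales.html", "Header", {"sales", "emea"}),
    ContentItem("header-dev.html", "Header", {"engineering", "api"}),
]
chosen = pick_header(headers, {"sales", "emea", "manager"})
print(chosen.name)
```

A real platform would store the keywords as metadata fields in its database and run the match as a search, but the shape of the problem is the same.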

Full text search and document conversions are somewhat common too, especially in commercial applications. Full text search can search within Word, PowerPoint and a number of other document types. The document conversion feature converts a source document into a completely different format, such as converting a Word document to PDF or HTML.

Workflow is another common feature. It provides the automation necessary to ensure that all the people who must review a document have done so. This is similar to a bug tracking system, which changes the owner of a bug when the resolution state changes. For example, the bug is automatically reassigned to QA when a developer marks it as fixed.
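Here is a minimal sketch of that kind of workflow as a small state machine. The states, the role names and the `Document` class are hypothetical; real platforms usually let you define these through configuration rather than code.

```python
# Hypothetical review workflow: each state names the role that owns the
# document while it is in that state, plus the states it may move to next.
WORKFLOW = {
    "draft": {"owner": "author", "next": {"in_review"}},
    "in_review": {"owner": "reviewer", "next": {"draft", "approved"}},
    "approved": {"owner": "publisher", "next": {"published"}},
    "published": {"owner": None, "next": set()},
}


class Document:
    def __init__(self, title):
        self.title = title
        self.state = "draft"
        self.owner = WORKFLOW["draft"]["owner"]
        self.history = ["draft"]

    def transition(self, new_state):
        """Move to new_state and hand the document to that state's owner."""
        if new_state not in WORKFLOW[self.state]["next"]:
            raise ValueError(f"cannot go from {self.state} to {new_state}")
        self.state = new_state
        self.owner = WORKFLOW[new_state]["owner"]
        self.history.append(new_state)


doc = Document("Q3 pricing sheet")
doc.transition("in_review")
print(doc.state, doc.owner)
```

The point is that the owner changes as a side effect of the state change, so nobody has to remember to hand the document to the next reviewer.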

The origin of these two technologies is relatively interesting. ECM platforms predate the mass adoption of the web. There are 15-year-old ECM platforms still being sold, though their names have changed. The focus of the ECM is more toward document management at the desktop, while the CMS focuses on web delivery.

ECM is very similar to a source control system such as CVS, SVN or VSS, with rich search, workflow and security features added. Users check documents out of the system when editing and then perform a check-in. The desktop integration is often tight, to the point where it may directly integrate with Windows Explorer. Version, author and revision information are all tracked upon check-in too. The only noticeable difference from most typical source control is that ECM relies on file locking when editing.
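The lock-based check-out and check-in cycle looks roughly like this. It is a toy in-memory sketch; `Repository`, `LockedError` and the method names are invented for the example and do not correspond to any vendor's API.

```python
class LockedError(Exception):
    """Raised when a document is already checked out by someone else."""


class Repository:
    def __init__(self):
        self.versions = {}   # path -> list of (author, content) revisions
        self.locks = {}      # path -> user currently holding the lock

    def check_out(self, path, user):
        """Lock the document for `user` and return its latest content."""
        holder = self.locks.get(path)
        if holder is not None and holder != user:
            raise LockedError(f"{path} is checked out by {holder}")
        self.locks[path] = user
        revisions = self.versions.get(path, [])
        return revisions[-1][1] if revisions else ""

    def check_in(self, path, user, content):
        """Store a new revision, release the lock, return the revision number."""
        if self.locks.get(path) != user:
            raise LockedError(f"{user} does not hold the lock on {path}")
        self.versions.setdefault(path, []).append((user, content))
        del self.locks[path]
        return len(self.versions[path])   # revision numbers start at 1


repo = Repository()
repo.check_out("plan.doc", "alice")
try:
    repo.check_out("plan.doc", "bob")
except LockedError as error:
    print(error)
print(repo.check_in("plan.doc", "alice", "first draft"))
```

Most source control systems would let both users edit and then merge, which works for text but not for binary office documents. That is the usual argument for locking.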

ECM platforms often have a very extensive API for integrating it with a number of different languages and applications. Some have native language support and a web services interface. As a result, an ECM is much more of a back-end application similar to a database.

ECM also has extensive security that can map down to individuals within a corporation’s organizational structure. Sharing documents and restricting access to the right people is usually pretty important. The ECM platforms generally provide access control lists or similar mechanisms. However, the ECM implementation deals with very large volumes of files, from several hundred thousand to several million distinct documents with multiple revisions. Security can really kill a large system’s performance, yet many ECM vendors perform very well. They made security and performance a priority.

CMS platforms are more focused on document consumption and usually have a smaller scale. This is not to imply that a CMS can’t handle a million documents. Throw enough hardware at a problem and you can solve anything. However, the goals of most CMS platforms are not as grand as those of the ECM.

A CMS may still have some measure of workflow, security, back-end and API support but it is far more limited than an ECM. The big difference is that the ECM may include a mediocre customizable user interface that you supposedly would expose to external users while the CMS has a highly customizable front-end that actually looks good.

Security, though, is often an afterthought or highly limited. Drupal and Joomla! have very limited notions of public, private and registered-user content, but you have to install a plug-in to get a richer ACL structure. The problem is that search and security then become separated, such that the search results must later be filtered by security (or vice versa). Strategies that perform these as two distinct operations are often pretty slow. The performance hit is hardly noticeable if you have just a few dozen documents, but get a few thousand or a million and you are pretty screwed.

The ECM platforms unify search and security, sometimes resulting in a single query being passed to the DB, which requires less ECM code to execute and gives you faster results. This is a generalization, of course. There are more than a dozen ECM vendors out there and each has different performance characteristics, but all are geared toward large repositories with complex security schemes.
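To make the difference concrete, here is a small sketch using Python's built-in sqlite3 module. The schema (documents, keywords, acl, members) is invented for the illustration and is far simpler than what any real CMS or ECM uses. The first function searches and then checks security per hit; the second lets the database do both in one query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE keywords  (doc_id INTEGER, keyword TEXT);
    CREATE TABLE acl       (doc_id INTEGER, group_name TEXT);
    CREATE TABLE members   (user_name TEXT, group_name TEXT);
""")
conn.executemany("INSERT INTO documents VALUES (?, ?)",
                 [(1, "Budget 2008"), (2, "Budget draft"), (3, "Lunch menu")])
conn.executemany("INSERT INTO keywords VALUES (?, ?)",
                 [(1, "budget"), (2, "budget"), (3, "food")])
conn.executemany("INSERT INTO acl VALUES (?, ?)",
                 [(1, "finance"), (2, "executives"), (3, "everyone")])
conn.executemany("INSERT INTO members VALUES (?, ?)",
                 [("alice", "finance"), ("alice", "everyone"),
                  ("bob", "everyone")])


def search_then_filter(user, keyword):
    """Two-step approach: run the search, then check each hit separately."""
    hits = conn.execute(
        "SELECT d.id, d.title FROM documents d "
        "JOIN keywords k ON k.doc_id = d.id WHERE k.keyword = ?",
        (keyword,)).fetchall()
    visible = []
    for doc_id, title in hits:          # one extra query per search hit
        allowed = conn.execute(
            "SELECT 1 FROM acl a JOIN members m "
            "ON m.group_name = a.group_name "
            "WHERE a.doc_id = ? AND m.user_name = ? LIMIT 1",
            (doc_id, user)).fetchone()
        if allowed:
            visible.append(title)
    return sorted(visible)


def unified_search(user, keyword):
    """Single query: keyword match and ACL check are joined in the database."""
    rows = conn.execute(
        "SELECT DISTINCT d.title FROM documents d "
        "JOIN keywords k ON k.doc_id = d.id "
        "JOIN acl a      ON a.doc_id = d.id "
        "JOIN members m  ON m.group_name = a.group_name "
        "WHERE k.keyword = ? AND m.user_name = ? ORDER BY d.title",
        (keyword, user)).fetchall()
    return [title for (title,) in rows]


print(search_then_filter("alice", "budget"))
print(unified_search("alice", "budget"))
```

Both functions return the same titles. The difference is that the two-step version issues one security query per search hit, so its cost grows with the number of hits rather than with the number of documents the user can actually see.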

Again, the goal of the CMS is to present content while the ECM is focused on managing it. The CMS UI is often highly customizable through mechanisms like skins, themes, and in-application UI editor functions. There is also a notion of communities, so depending upon your login you get dropped into a different community which has a different look and feel than other communities on the same server. The CMS may expose deeper enhancements by letting the user modify the actual source of the CMS or through a plug-in model. ECMs sometimes expose a kind of cheesy custom scripting language that sort of works, but it is really limited and primitive when compared with the ability to alter the source of the CMS.

Something important to note is that while CMS and ECM platforms are competitive, they are also complementary. It is often easy to modify the CMS to work with the ECM. The effect is that document loading, search and some security are delegated from the CMS to the ECM. The work is non-trivial, especially if you must expose a security management UI to the web user. This UI would have to merge the CMS and ECM models and almost certainly must be custom written. Still, that’s a lot less work than writing an entire CMS or ECM. The resulting application has the UI and user management features of a CMS and the strong document management of an ECM.

The evaluation process of a CMS or ECM is obviously requirements driven. Specifically, you need to have a good idea of the number of documents and the security model you need to support. Failing to figure out these two requirements will lead you to select the much less expensive and more impressive looking CMS platforms. Why? The CMS platforms perform very well with small numbers of documents and simpler security models.

The evaluation process is pretty straightforward but your order of operations and methodology should vary according to your requirements. The model I followed was similar to this:

  • Preparation
    • Get the number of documents that the final implementation would support.
    • Understand the security model from a business perspective.
    • Download a variety of CMS platforms.
    • Download or get a trial version of the ECM platforms. Note: Most ECM platforms are expensive and are generally not worth the hassle of evaluation until you have eliminated the OSS CMS platforms.
    • Write a lot of code to test the API features.
  • Evaluation
    • Build a command line, parameter driven, application to upload a collection of content items. The application should track the time it takes to upload each document. Dump the timing data out in CSV form so that you can load it in Excel and verify that you do not see a significant linear performance degradation.
    • Upload thousands of documents in a flat folder and broken up into folders. For example, upload 10,000 files into a single folder and then create 10 folders and upload 1,000 files into each. You may see significant performance differences between the two schemes. It may also be worthwhile to create 1,000 folders and add 10 documents to each. Be sure to check when the file actually becomes available. Many systems perform an indexing process against the keywords and sometimes the content itself, which can delay the availability of the content item.
    • Add the ability to upload documents and associate one or more keywords from a list you specify. Test the time for the upload process and how long it takes before the CMS’s existing search functions can find the content item by keyword. You are also testing how long it takes for a search operation to finish.
    • Add security binding features to your document upload tool that follow the model you think you may have to implement. Be sure to create a truly representative model in terms of users and the number of security descriptors your final application would use. Generally, creating security groups, ACLs or whatever the security mechanism is called is pretty easy to do manually, though it is tedious. You are again testing the upload time, keyword time, security, indexing and user-specific search for content items that the user should and should not be able to see. Remember that the speed of search results in this system will be representative of the actual production system.
    • Do another upload of content using 2 or more machines. Not all CMS and ECM systems actually handle multiple clients performing updates. You are checking for performance degradation as a result of multiple uploading tools. You are also checking that the CMS neither crashes nor corrupts your document database. This sounds like it wouldn’t be a problem; however, some CMS platforms actually don’t handle this scenario.
    • Create a command line application that can perform document search by keywords and update and delete documents. This is a crucial component, as you want to measure search operations. You will also want to modify existing documents and metadata to verify that indexing and document availability work correctly. You will also need to check that deletions work, as some systems actually don’t handle large deletions all that well and can corrupt your database.
    • Assuming that you’ve done everything mentioned above, try uploading a very large number of documents, hundreds of thousands, over the course of a weekend or a vacation. Large document uploads and the resulting indexing process may take a very long time. This will help you to gauge system requirements and give you an idea of how long a large import on a production system may take.
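The core of the timed upload tool from the evaluation steps above might look something like the following. Treat it as a sketch under assumptions: the `upload` argument is a placeholder for whatever call the platform under test actually exposes, the helper names (`timed_uploads`, `slope`, `spread_over_folders`) are mine, and a dictionary stands in for the repository so the example runs on its own. The `slope` function is one simple way to check the CSV timings for degradation without opening Excel.

```python
import csv
import io
import time


def timed_uploads(documents, upload, out):
    """Upload each (name, data) pair and write one CSV row per document.

    `upload` is whatever callable wraps the platform's real API; it is a
    placeholder here, not a function from any particular CMS or ECM.
    """
    writer = csv.writer(out)
    writer.writerow(["sequence", "name", "bytes", "seconds"])
    timings = []
    for sequence, (name, data) in enumerate(documents, start=1):
        started = time.perf_counter()
        upload(name, data)
        elapsed = time.perf_counter() - started
        writer.writerow([sequence, name, len(data), f"{elapsed:.6f}"])
        timings.append(elapsed)
    return timings


def slope(timings):
    """Least-squares slope of upload time against upload sequence number.

    A slope near zero means upload time is flat; a clearly positive slope
    means uploads are getting slower as the repository grows.
    """
    n = len(timings)
    if n < 2:
        return 0.0
    mean_x = (n + 1) / 2
    mean_y = sum(timings) / n
    numerator = sum((i - mean_x) * (t - mean_y)
                    for i, t in enumerate(timings, start=1))
    denominator = sum((i - mean_x) ** 2 for i in range(1, n + 1))
    return numerator / denominator


def spread_over_folders(total, folders):
    """Yield (folder, filename) pairs spreading `total` files over `folders`."""
    for i in range(total):
        yield f"folder{i % folders:04d}", f"doc{i:06d}.txt"


# Stand-in repository so the sketch runs without a real platform.
fake_repository = {}
docs = [(f"{folder}/{name}", b"sample content")
        for folder, name in spread_over_folders(total=100, folders=10)]
report = io.StringIO()
timings = timed_uploads(docs, fake_repository.__setitem__, report)
print(len(report.getvalue().splitlines()), "CSV lines")
print("slope:", slope(timings))
```

In a real run you would write the CSV to a file, pass `folders=1` for the flat layout and a larger number for the nested layouts, and compare the slopes between runs. The same loop extends naturally to the keyword and security steps by adding more columns to each row.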


These tests sound incredibly tedious and lengthy, and arguably they are excessive too. It depends on how critical the CMS or ECM is to your company. A CMS or ECM can become a core component of your application and it should be able to scale up to your production level requirements. The challenge is that you may not find out about the platform’s limitations for months, or even a year, after it has gone into production. By that time it is too late to make major changes beyond getting more and more hardware to “solve” the scalability problem. However, early testing can eliminate a lot of these problems.

Addendum

I’ve included a list of various CMS, ECM and portal platforms. I’m including portal systems because they often include CMS or ECM functionality.

  1. Portals
    • Liferay
    • JBoss Application Server
    • Apache Portals
  2. CMS
    • See OpenSourceCMS.com for an exhaustive list.
    • Drupal
    • Joomla!
    • Mambo
    • PHP-Nuke
    • Plone
    • XOOPS
    • Zope
  3. ECM
    • Alfresco
    • Apache Jackrabbit
    • EMC Documentum
    • IBM FileNet
    • Interwoven TeamSite
    • Microsoft SharePoint
    • Oracle Universal Content Management
    • Nuxeo

2 comments:

Anonymous said...

Which one did you choose in the end and Why?

Anonymous said...

Did you produce an ROI based analysis as well?
