Friday, January 25, 2008

Data Modeling a Maze

A couple of weeks back, my friend took me to a maze. I was lost within a couple of minutes and getting really frustrated after a while. I wasn't sure what algorithm they had used to construct the maze.

The only algorithm I knew was the "wall follower". All you have to do is keep either your right or your left hand touching the wall, and you will reach either the exit or the entrance. I took the longest path, but eventually reached the EXIT.

This algorithm works only if all the walls are connected to the outer boundary (a "simply connected" maze). From that point on, I was quite fascinated by the algorithms associated with mazes. There are a few other efficient algorithms too, like Trémaux's algorithm. Visit Think Labyrinth for more fun.
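The wall-follower rule is simple enough to sketch in a few lines of code. Here is a minimal, hypothetical Python sketch (the grid, entrance and exit are my own invention): the walker keeps its "right hand" on the wall of a small simply connected maze by always preferring a right turn, then straight, then left, then back.

```python
# A toy simply connected maze: '#' is wall, 'S' entrance, 'E' exit.
MAZE = [
    "#########",
    "S   #   #",
    "# # # # #",
    "# #   # E",
    "#########",
]

def solve(maze):
    # Index every cell by (row, col) and locate the entrance and exit.
    cells = {(r, c): ch for r, row in enumerate(maze) for c, ch in enumerate(row)}
    start = next(p for p, ch in cells.items() if ch == "S")
    goal = next(p for p, ch in cells.items() if ch == "E")
    # Directions in clockwise order: right, down, left, up.
    dirs = [(0, 1), (1, 0), (0, -1), (-1, 0)]
    pos, facing = start, 0  # enter the maze facing right
    path = [pos]
    while pos != goal:
        # Right-hand rule: try a right turn first, then straight,
        # then left, and finally turn back the way we came.
        for turn in (1, 0, -1, 2):
            d = dirs[(facing + turn) % 4]
            nxt = (pos[0] + d[0], pos[1] + d[1])
            if cells.get(nxt, "#") != "#":
                facing = (facing + turn) % 4
                pos = nxt
                path.append(pos)
                break
    return path
```

Calling `solve(MAZE)` walks the walker from `S` at (1, 0) to `E` at (3, 8); in a simply connected maze like this one, the loop is guaranteed to terminate, though not by the shortest route.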

After that, I thought of simulating a maze. Unfortunately, I am quite inept at programming languages, so I decided to do what I know best: create a simple data model for a wall-follower maze. It turned out to be quite an interesting problem. It took me around 30 minutes to come up with a decent logical data model that would work for quite a few scenarios.

So, the first model that I came up with is shown below. (Click on the picture to enlarge)



Let me give a quick explanation of the model:

  1. Design Co-ordinate: Super-type entity for every co-ordinate in the maze.
  2. Entry, Exit and In-Maze Co-ordinate: Sub-type entities of DESIGN CO-ORDINATE, which contain the co-ordinates of the locations where a player has to make a decision.
  3. Decision: Entity which holds whether to turn LEFT, RIGHT, UP, DOWN or ABORT.
  4. Decision Map: Entity which maps a START CO-ORDINATE and the DECISION TAKEN (left, right, up, down or abort) to an END CO-ORDINATE (the co-ordinate where the player lands after taking the decision).
  5. Player: Entity which holds information about the player of the maze.
  6. Movement: Entity which tracks the movements of the player.
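The entities above can be rendered as a hypothetical relational schema. This is only a sketch of the logical model using SQLite; all table and column names are my own invention, not taken from the diagram.

```python
# Hypothetical SQLite rendering of the maze data model.
import sqlite3

DDL = """
CREATE TABLE design_coordinate (
    coord_id   INTEGER PRIMARY KEY,
    coord_type TEXT CHECK (coord_type IN ('ENTRY', 'EXIT', 'IN_MAZE'))
);
CREATE TABLE decision (
    decision_cd TEXT PRIMARY KEY      -- LEFT, RIGHT, UP, DOWN or ABORT
);
CREATE TABLE decision_map (           -- start + decision determine the landing cell
    start_coord_id INTEGER REFERENCES design_coordinate (coord_id),
    decision_cd    TEXT    REFERENCES decision (decision_cd),
    end_coord_id   INTEGER REFERENCES design_coordinate (coord_id),
    PRIMARY KEY (start_coord_id, decision_cd)
);
CREATE TABLE player (
    player_id INTEGER PRIMARY KEY,
    name      TEXT
);
-- Identifying relationship: the decision-map entry is part of MOVEMENT's
-- key, so one player cannot record the same move twice.
CREATE TABLE movement (
    player_id      INTEGER REFERENCES player (player_id),
    start_coord_id INTEGER,
    decision_cd    TEXT,
    FOREIGN KEY (start_coord_id, decision_cd)
        REFERENCES decision_map (start_coord_id, decision_cd),
    PRIMARY KEY (player_id, start_coord_id, decision_cd)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
```

Note how the identifying relationship surfaces here as a composite primary key on MOVEMENT rather than a surrogate id.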

There is one interesting phenomenon in this model. If an intelligent player plays this maze, the model works, because the association between MOVEMENT, PLAYER and DECISION MAP has been modeled as an identifying relationship. This means that if a player tries to navigate the same path twice, the system will spit out an error (simulating an INTELLIGENT player, who would never make the same mistake again).

But if a DUMB player had to play this maze, the model wouldn't work, because a DUMB player would make the mistake of traversing a fruitless path again and again. For him, the association should be modeled as a non-identifying relationship.
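The trade-off can be illustrated with a tiny self-contained sketch, assuming a simplified MOVEMENT table with invented column names: the identifying version makes the path part of the primary key and rejects a repeated traversal, while the non-identifying version uses a surrogate key and happily accepts it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Identifying relationship: the path taken is part of the primary key.
conn.execute("""
CREATE TABLE movement_intelligent (
    player_id      INTEGER,
    start_coord_id INTEGER,
    decision_cd    TEXT,
    PRIMARY KEY (player_id, start_coord_id, decision_cd)
)""")
# Non-identifying relationship: a surrogate key allows repeats.
conn.execute("""
CREATE TABLE movement_dumb (
    movement_id    INTEGER PRIMARY KEY,
    player_id      INTEGER,
    start_coord_id INTEGER,
    decision_cd    TEXT
)""")

row = (1, 42, "LEFT")
conn.execute("INSERT INTO movement_intelligent VALUES (?,?,?)", row)
try:
    conn.execute("INSERT INTO movement_intelligent VALUES (?,?,?)", row)
    repeated = True
except sqlite3.IntegrityError:
    repeated = False  # the 'intelligent' model rejects the repeated path

# The 'dumb' model records the same fruitless path twice without complaint.
for _ in range(2):
    conn.execute(
        "INSERT INTO movement_dumb (player_id, start_coord_id, decision_cd)"
        " VALUES (?,?,?)", row)
```

The constraint lives entirely in the key structure; no procedural code is needed for the yes/no case, which is exactly why the graded-intelligence case below is harder.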

How can I model both scenarios in one shot, without introducing redundancy in the entities or associations?

One way to model both scenarios is to track their movements as two different MOVEMENT entities and include the constraint only in the INTELLIGENT PLAYER MOVEMENT entity. But this introduces an extra associative entity, and it only handles a toggle situation: yes or no, DUMB or INTELLIGENT.

But if I were to model differing levels of intelligence, how would I do it in a data model without writing any procedural code? How can E-R data models be efficiently designed for fuzzy-logic systems?

I found this an interesting exercise; it shows that E-R data models are still a long way from being a truly self-sufficient tool.

We need a modern-day E. F. Codd.


Wednesday, January 23, 2008

Kalido's Business Information Modeler

Today, I received an update from Kalido on the Business Information Modeler Engine. This is what Kalido claims about the product.

"Kalido Business Information Modeler provides a graphical design interface that can be used to develop and refine business requirements for new and existing information. Instead of modeling data and their structures, the Kalido Business Information Modeler allows you to model the actual parts of your business; customers, products, assets, transactions, even people – and define how you want to see information in context. Even better, the Kalido Business Information Modeler can be used to change and update your model directly against the Kalido Dynamic Information Warehouse, allowing you ultimate flexibility in meeting the information needs of your business. The Kalido Business Information Modeler dramatically improves your ability to meet the needs of your business when it requires it – not when how it’s stored determines it".
The product is due in March 2008, and I am waiting to experiment with the features it claims. I will be evaluating the product on the following questions.
  1. Can an in-house data warehouse be easily migrated into Kalido?
  2. Will the business layer completely abstract the data layer?
  3. Is it just a visual aid for creating/maintaining your data model?
  4. Will the data in the warehouse be used by the tool to help the modeler provide real-time feedback on the errors and the inconsistencies of the new model that he plans to implement?
I will be writing more on this interesting product after I get a practical hands-on. Visit www.kalido.com for more details.

Sunday, January 20, 2008

Oracle snaps up BEA systems

Oracle has recently purchased BEA Systems for 8.5 billion USD. One of the motives I suspect behind this purchase is BEA's large customer base. The deal will also help Oracle compete more closely with IBM in the middleware space.

This move also helps Oracle's customers move to a subscription-based model. Oracle claims the acquisition is expected to accelerate innovation by bringing together two companies with a common vision of a modern service-oriented architecture (SOA) infrastructure, and that it will further increase the value Oracle delivers to its customers and partners.

Oracle has eliminated a strong commercial rival and forayed into the enterprise middleware market, edging toward becoming the market leader.

Friday, January 18, 2008

Styles of MDM framework

MDM frameworks essentially come in three styles:

1. Registry-based approach: The MDM contains references to the actual data stores and doesn't contain the data itself. It has pointers to the respective source systems for the attributes it hosts; data governance and integrity are left to the source systems to handle. It is a quick way to set up an MDM: the registry decides where to pick up the data from at run-time. I haven't seen many companies implement this model.

2. Centralized hub: The master data is integrated from different applications, cleansed, standardized, corporate-governed, secured, authorized and published to different subscribers from one central repository, which hosts the entire MDM data. It takes a long time to set up, but it's one of the most efficient ways of integrating master data.

3. De-centralized regional hub: Similar to the previous implementation, but the corporate data is maintained in a global MDM hub while regional/business MDM requirements are maintained in local/regional hubs. This clearly separates corporate needs from regional ones.
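As a toy illustration of the registry style in particular, here is a hypothetical Python sketch (the source systems, keys and attribute names are all invented): the hub stores only pointers, and attribute values are fetched from the owning systems at request time.

```python
# Invented source systems, each owning some attributes of customer C100.
CRM = {"C100": {"name": "Acme Corp"}}
ERP = {"C100": {"credit_limit": 50000}}
SOURCES = {"crm": CRM, "erp": ERP}

# The registry maps (master id, attribute) -> (owning system, local key).
# It holds no customer data itself, only pointers.
REGISTRY = {
    ("C100", "name"): ("crm", "C100"),
    ("C100", "credit_limit"): ("erp", "C100"),
}

def lookup(master_id, attribute):
    # Resolve the pointer at run-time and fetch from the source system.
    system, key = REGISTRY[(master_id, attribute)]
    return SOURCES[system][key][attribute]
```

Because governance stays with the sources, the registry never goes stale on values, but it also can never enforce quality on its own; that is the trade-off against the centralized hub.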

Choosing the right style is one of the key elements in the success of an MDM project.

Thursday, January 10, 2008

MDM - Part 3 - Kalido MDM

This is one of my favorite tools, having worked on it for quite some time now. Let me provide an unbiased opinion on Kalido based on the "key capabilities of MDM" post that I had written.

The first point on Data Governance can be omitted here, as it is more of a process oriented practice.

Does Kalido control the flow of good-quality data into the MDM repository? Yes. Kalido provides association rules (1:N, M:N, optional, mandatory), data-type verification, deletion-anomaly checks, data-length verification and custom validation formulas. Is this enough for an MDM tool to host clean master data independently? Probably not, but Kalido still wins in this sector, because it covers most of the important validation checkpoints.

Kalido is a truly flexible data modeling tool. It can model time-variant hierarchies, ragged hierarchies, depth-less hierarchies and super-type/sub-type relationships, and having done all this, it's quite easy to change from one model to another. This is because it has quite a generic modeling mechanism, which is why most companies heavily involved in acquisitions and mergers prefer it. Kalido completely wins here: I have rarely faced a scenario that I wasn't able to model in it. Kalido also provides features for moving models during the migration process.

The MDM component of Kalido isn't a master at integrating with heterogeneous sources. As of now, it can accept only text files; it expects the ETL tool to convert the data into CSV/XML format.

You can define 'sophisticated' work-flows to move a piece of data between states. One can define action items (like email notifications), events triggering the work-flow and the different states of transition. Editing data and raising an issue/change request are possible with this tool. So Kalido wins again.

Kalido implements security through Access Control Lists. ACLs dictate which sets of users can access which data, down to the level of an individual entity instance.

Probably the one area I am not thoroughly convinced about is Search & UI. It has a decent hierarchy browser and a neat search feature, but though it has .NET compatibility, certain basic UI features (like changing the font of the text if the data belongs to one particular market) are cumbersome.

Kalido truly lacks in the data enrichment area. It currently doesn't have pre-built vanilla models, which would be useful for certain master data like Product and Customer.

I haven't truly tested Kalido on a distributed network, hence I can't comment on it.

Overall, Kalido is an effective MDM solution.

Saturday, January 05, 2008

MDM - Part 2 - Key capabilities of an MDM framework

Last year, one of my colleagues was deployed to a leading FMCG company to understand their global reference data and consolidate it. She started off by interviewing the data management heads from various countries, and after 6 weeks of tough grind she came up with a very good logical data model. But when she started materializing her E-R model in the tool, she began facing problems. After further discussions with many of my colleagues who have worked on MDM implementations, and from my own experience, I have collated a few key points that are essential for the smooth running of an MDM engine.

Note: Broadly, these have been categorized as must-have and nice-to-have.

  1. Data Governance and Stewardship: Identifying the right people to own the right data. This team is responsible for setting up security access, correcting erroneous data, defining the work-flow, acting on notifications and reporting on the usage of the data.
  2. Data Quality Management: Bad data is as good as no data at all. Processes and frameworks that constantly apply the business rules to keep the master data sane are a must. This is one of the most complex points in the whole MDM cycle. Does the tool possess adequate data validation techniques, or does it rely on the ETL tool?
  3. Flexible Data Modeling Capability: The tool should be as adaptive as the business process. A flexible data model that lets you quickly prototype and develop is ideal for such an implementation.
  4. Integration Engine Maturity: The data integration drivers that ship with the tool play a key role in tool evaluation. Look for a tool with good integration capability. Some tools stop with a flat-file feature; though this might be enough to start developing your repository, there may be added ETL effort if your sources are completely heterogeneous in nature.
  5. Work-flow enabled Authorization Model: How do the authorization and publishing of data happen? Through mails, or through a sophisticated work-flow engine? From my acquaintance with the tools in the market, much of an MDM analyst's time is occupied in composing mails about the next action items to be taken on the data. This is where a tool with a 'cool' work-flow feature gains the upper hand.
  6. Security & Access Control: Can the users of the Indian market control Australian customers? Probably yes, maybe no. Security- and access-driven capability of the MDM system is a must for an organization trying to consolidate its worldwide master information.
  7. Search & UI Customization: In this search-driven world (thanks to Google), a tool without search capabilities has failure written all over it. The UI should be customizable, and the framework should have inherent APIs to achieve this.
  8. Data Enrichment: Some tools can integrate with market-research data vendors to enrich their data; a good example is enriching customer data with D&B-related fields. Though this is not a MUST feature, it certainly is a differentiator.
  9. Service-Oriented in Nature: SOA utilizes loosely coupled, reusable and interoperable software services to support business-process requirements. This is less a tool question than a framework question: can the MDM solution be easily positioned into an SOA architecture? If the tool can talk to different sources, integrate the data and present it as services, then yes, it can marry SOA.
  10. Distributed System: This is probably one of the last items to be evaluated. If your master data runs into terabytes, then this feature might be worth visiting.
These 10 points sum up the different capabilities/components of an MDM solution. There are a few other points, like cost and platform dependency, which I would leave to the discretion of the organization's policies.
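The security and access-control point can be sketched as a toy check; the user groups, market codes and grant structure here are all invented for illustration, not taken from any tool.

```python
# Each ACL entry grants a user group edit rights on one or more markets'
# master records, e.g. the Indian stewards cannot touch Australian customers.
ACL = {
    "india_stewards": {"IN"},
    "global_admins": {"IN", "AU", "US"},
}

def can_edit(group, customer_market):
    # A group may edit a customer record only if it was granted that market.
    return customer_market in ACL.get(group, set())
```

A real MDM tool would evaluate such grants per entity instance and per operation, but the shape of the question, "can this user touch this market's data?", is the same.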

Thursday, January 03, 2008

MDM - Part 1 - An Introduction

In continuation of my recent post on "MDM War", I would like to take you into the enchanting world of MDM with a brief introduction.



In my own words, MDM (Master Data Management) is the single place where any kind of reference data is maintained globally for an organization. All transactions and business processes look up to the services of MDM for their operations. Some of the important types of master data that an organization would maintain are



  • Product

  • Employee

  • Customer

  • Location

  • Supplier/Vendor

As you can see, these entities can stand alone and are independent of the business processes that an organization participates in. MDM allows companies to consolidate the master objects that might be residing in silos, then harmonize, enrich and federate one common view of the organization's data to the businesses seamlessly. MDM, contrary to a common misunderstanding, is not THE TOOL that will do this magic. It still relies on people and processes to solve the puzzle; it provides the framework to achieve it without much fuss.


MDM is one application for the whole organization, not one for each business unit, though some of its services might be business-unit specific. For example, the HR department wouldn't be interested in product data, and the sales department wouldn't be very keen on employees' salaries.

The supply chain in the picture gives a good example of how the different businesses in an organization would like to view the master data (click the picture for better clarity).



In a nutshell, the key capabilities of an MDM tool are:



  1. Master Data Integration

  2. Master Data Consolidation

  3. Master Data Quality Validation

  4. Master Data Enrichment (optional)

  5. Work flow based Data Maintenance

  6. Master Data Publishing


Any tool which doesn't provide these features fails to provide a complete MDM suite. And one important thing: MDM has nothing to do with data warehousing.

Wednesday, January 02, 2008

MDM War

A market research firm estimates that the market for MDM-related products will reach $1 billion by the end of 2008. Though MDM is a relatively new zone of investment for most organizations, the immense value behind such a venture has been proactively noticed. Some of the companies with their own MDM product suites are

  • SAP - SAP Netweaver MDM
  • Kalido - Kalido MDM
  • IBM - IBM MDM (WPC, WCC)
  • Microsoft- Stratature
  • Oracle - Universal Customer Master.
One common aspect of these product lines is that most of them were acquired. SAP launched its first MDM products in 2002 and had to quickly withdraw them because of operational issues; it then acquired A2i in 2004 and repackaged A2i's product as its own. IBM acquired DWL Inc. to get hold of PIM (Product Information Management) and CDI (Customer Data Integration). Microsoft acquired Stratature, and Oracle (after Customer Data Hub's not-so-good success) gained inroads into Siebel's UCM post-acquisition.

These acquisitions are also in line with the product suites the companies already own. Oracle's merger with Siebel will clearly put it on top in the CRM and customer data management space, while SAP can leverage its ERP customer base to bundle the MDM cake.

In the following weeks, I shall examine each of these tools in detail to find out which one has the killer technology and experience to be crowned the "MDM Maharaja". Or will they all be a bunch of Sultans aiming to just take a share of the $1 billion jewel?