Tuesday, June 21, 2011

As an architect...

I have been playing the role of a DW Architect for a large financial services organization over the past 6 months. When I reflect on my duties and responsibilities, one aspect stands out strongly, yet it never makes it into the job description.

"Leaves his Unique Selling Proposition trace in the project"

There is often a lot of other crap that organizations look for, like: coordination between the business and IT, owner of data models, provides strategic direction to the IT team, etc.

Do you think Emperor Shah Jahan would have given a big job description to his architect, Ustad Ahmad Lahauri, when he set out to build the Taj Mahal? The emperor described the Taj Mahal as:

Should guilty seek asylum here,
Like one pardoned, he becomes free from sin.
Should a sinner make his way to this mansion,
All his past sins are to be washed away.
The sight of this mansion creates sorrowing sighs;
And the sun and the moon shed tears from their eyes.
In this world this edifice has been made;
To display thereby the creator's glory.

If you read the last line, it stresses the need to display the creator's glory. An architect should leave behind a glory that remains long after he is gone from the project. That should be the true job description. It should be the one thing that people still talk about after he has resigned. The rest of the responsibilities are just enablers for that ultimate glory.

Personal Analytics - Excel & Qlikview

Every day, we make decisions, whether at work, at home or in transit. How do we make them? At work, we use "saved" data to help us out. But at home, we rely on our own memory. We also rely on reviews on the web. We rely on "Facebook" likes. But somehow, I am still not convinced that any of this helps us make a decision that is the closest to the best. So I decided to maintain my own personal analytics database. My first "personal" problem was finding the right health insurance for my dad. So I decided to build my own BI environment to help me make this decision.
  1. I used Excel for feeding & maintaining data. I profiled attributes of all kinds for this.
  2. I used Qlikview for BI analytics.
It is turning out to be a great exercise for me to find which insurance is the right one for my dad.

Tuesday, June 14, 2011

Metadata Management - Scratching your own itch

Meta-data management is always a complex problem, because it's about capturing "data about data". I don't think the DW industry is anywhere near making sure organizations are cleansed of bad data. We still have bad data, and when it comes to capturing finer details about this "bad data", it's even worse.

My current customer has this huge problem of not knowing how the data elements get mapped between different hops in the data warehouse (Staging, DW, Datamarts, Business Objects Universe and ultimately the "Requirements"). There were lots of discussions about what kind of tools to procure and what profile of meta-data architect to recruit. We were getting nowhere.

We decided to scratch our own itch. We started this exercise 3 months back, in our spare time, by documenting the data lineage in a simple "denormalized" spreadsheet. It took us time. Layer by layer, source by source, we did it. After 3 months, we had a full-blown spreadsheet which captured the complete data lineage and the business rules implemented in the DW layers.

When we reflected on it, we realized a couple of "eye-openers":

1. Don't invest upfront in meta-data. Ask your existing team to start documenting in the easiest and most flexible manner possible. Probably a spreadsheet. Take it one step at a time.

2. And start small. Get the data first and then think about tools. Check whether the data makes your job any easier or the business user's job any more productive. If not, the meta-data program is not for you.
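
To give a feel for how little tooling the "denormalized" spreadsheet needs, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the column names, the table and field names, and the simplification that each target element has exactly one source. Our actual spreadsheet is not reproduced here.

```python
import csv
import io

# Hypothetical lineage sheet: one denormalized row per hop.
# Column names and all element names are made up for illustration.
LINEAGE_CSV = """\
target_layer,target_element,source_layer,source_element,business_rule
Staging,stg_policy.prem_amt,Source,policy.premium,straight copy
DW,dw_policy.premium_amount,Staging,stg_policy.prem_amt,cast to decimal
Datamart,dm_sales.premium_amount,DW,dw_policy.premium_amount,sum by month
Universe,Premium Amount,Datamart,dm_sales.premium_amount,none
"""


def load_lineage(text):
    """Index the sheet by (target layer, target element)."""
    rows = csv.DictReader(io.StringIO(text))
    return {(r["target_layer"], r["target_element"]): r for r in rows}


def trace(lineage, layer, element):
    """Walk from an element back towards its origin, one hop per row.

    Assumes a single source per target; stops on an unknown element
    or if the sheet accidentally contains a cycle.
    """
    hops = []
    seen = set()
    key = (layer, element)
    while key in lineage and key not in seen:
        seen.add(key)
        row = lineage[key]
        hops.append(row)
        key = (row["source_layer"], row["source_element"])
    return hops


if __name__ == "__main__":
    lineage = load_lineage(LINEAGE_CSV)
    for hop in trace(lineage, "Universe", "Premium Amount"):
        print(
            f'{hop["target_layer"]}: {hop["target_element"]} <- '
            f'{hop["source_layer"]}: {hop["source_element"]} '
            f'[{hop["business_rule"]}]'
        )
```

A spreadsheet saved as CSV can be fed to the same `trace` function, which is usually enough to answer "where does this report field come from?" before anybody talks about buying a tool.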

Tuesday, June 07, 2011

Design Documentation - Batons in a relay race

Documentation is always like a "baton" in a relay race. You care about it only when you pass it over to somebody else; until then, you hardly notice that it exists. If you want to make a baton interesting to a running athlete, you need to stuff it with something that interests him, perhaps some sort of energy drink that he can consume whenever he gets exhausted. The beauty of the baton's design is that it is light enough not to be an extra burden to the athlete, and shaped so that it enables a quick passover between athletes.

I had always stayed away from "Documentation", because I never found a use for it except during an audit or a knowledge transition. And even in an audit, nobody cares about the quality of the document; the auditors just check that the document exists.

So to make some sense of this complex phase in an SDLC, I decided to apply the "Baton" analogy to documentation, because just as with a baton, without documentation I can never say that somebody completed the relay race. To make documentation competitive, I decided to make it:

1. Light - so that it's easy to carry.
2. Interesting - so that the consumer opens it and uses it frequently (and)
3. Do its job - so that the transition/passover is easy

Light
Why is the iPad 2 thinner and lighter than the iPad? Simple. They moved from 2 thicker batteries to 3 thinner batteries. Not that I understand the hardware well enough to comment on it, but if you reduce the content, it becomes lighter. So essentially, I will try to make my document as light as possible. "Fewer pages" will be my KPI. So I decided to budget a page count for every kind of document and focus on conveying whatever I wanted to convey within that page count. And I have a motherhood rule: "Don't cross 15 pages of content".

Interesting
For anything to be interesting, it should be useful. Information in the document should be useful and should in turn help the consumer/creator of the document. So to make it interesting, I decided to add 3 simple sub-sections to every section that I created: What, Why & How. So, if I have to design Change Data Capture (CDC, a data warehousing term) in my ETL process, then I add: What is CDC? Why should I use CDC? And how should I enable CDC?
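
To illustrate what a "How" sub-section for CDC could carry, here is a minimal sketch of one possible approach, comparing yesterday's snapshot of a table with today's. This is only an assumption for illustration, not the design used on my project; the function name and the sample rows are made up.

```python
def capture_changes(previous, current):
    """Compare two snapshots keyed by primary key.

    Returns three dicts: rows that are new, rows whose values
    changed (with their new values), and rows that disappeared.
    """
    inserts = {k: v for k, v in current.items() if k not in previous}
    deletes = {k: v for k, v in previous.items() if k not in current}
    updates = {
        k: v
        for k, v in current.items()
        if k in previous and previous[k] != v
    }
    return inserts, updates, deletes


if __name__ == "__main__":
    # Hypothetical customer rows keyed by customer id.
    yesterday = {
        1: {"name": "Asha", "city": "Pune"},
        2: {"name": "Ravi", "city": "Chennai"},
        3: {"name": "Meena", "city": "Delhi"},
    }
    today = {
        1: {"name": "Asha", "city": "Pune"},     # unchanged
        2: {"name": "Ravi", "city": "Mumbai"},   # changed city
        4: {"name": "Kiran", "city": "Kochi"},   # new row
    }
    inserts, updates, deletes = capture_changes(yesterday, today)
    print("inserts:", inserts)
    print("updates:", updates)
    print("deletes:", deletes)
```

A short sketch like this, sitting next to the "What" and the "Why", is the kind of content that makes a successor actually open the document.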

Do its job
It should do its job. It should enable transition. So, if the successor wants to understand what the predecessor did, the document should be able to convey it. Nothing should get lost in the handover.

So if a design document addresses the above 3 philosophies, it has met its purpose of existence.