The Data Modelling Trilogy: Data Modelling Agility or Anarchism?

As with most ‘new kids on the block’, Agile doesn’t provide the silver bullet solution to all one’s design and development ills. In essence it demands one still does the same sort of things albeit in a different way… oh, so that means we still have to do data modelling does it? Whether leaping off a Waterfall or navigating the eddies of Agile, one is still going to have to spirit (yes that’s ‘spirit’ not ‘sprint’) a data design out of somewhere.

It is worthy of note that the biggest data-related projects of all time, yes that’s data warehouse projects, were Agile before the word was invented. The word "sprint" has replaced "phase", and team members are standing rather than sitting, but let’s not kid ourselves, nothing has really changed. If a data warehouse project can be data-Agile, any project can.

Data Warehouse projects tend to treat data as rather important. Okay so the BI-evangelists still worship the graphs that the latest whizzo tool can deliver, but they still can’t evade the radioactivity being picked up on their gigo-counter (deliberately misspelt) which tells them that garbage-in means garbage-out. Even the best BI tools won’t solve fundamental data issues (although they will deliver the incorrect information in a far more interesting and insightful way!). Meanwhile, over in the non-BI world, a world still susceptible to fundamental data issues, data loses nothing of its value just because a project has gone Agile.

If one accepts the premise that data is just as important as it’s ever been, then someone had better start working out how to ‘do’ data in an Agile project (much more detail of this in my next post). ‘Doing data’ really boils down to sorting out what data is required, how it relates to other data, what it means and how it’s going to be stored. All of this tends to coalesce around a data model (there are ways of answering these questions without a data model but models tend to provide the necessary rigour).

As highlighted in my previous post, there are so many types of data model it is difficult to get the collaboration that underpins Agile because no one seems to be talking the same language. So let’s define a language – a challenge because, firstly, the murky depths of data modelling can sound like Klingon to many and, secondly, blogs are meant to be brief.

The deepest of the murky depths is the lowest level: the DDL (data definition language). At this level one needs either (a) a Database Administrator who understands the DBMS (database management system) and all its alchemy in great detail or (b) a package expert. It’s all about implementing a data model in the DBMS – hence the name Implementation Data Model. It’s all very specific to the DBMS or package being used.
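For the curious, here is roughly what that lowest level can look like. It is only a sketch: the `customer` and `customer_order` tables are invented for illustration (this post names no entities), and SQLite is used purely because it ships with Python. The DBMS-specific wrinkles flagged in the comments are exactly the sort of alchemy an Implementation Data Model has to deal with.

```python
import sqlite3

# Hypothetical tables, assumed for illustration only.
DDL = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    placed_on   TEXT NOT NULL  -- SQLite has no native DATE type, so ISO-8601 text
);
"""


def build_schema() -> sqlite3.Connection:
    """Create the illustrative schema in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    # SQLite-specific detail: foreign keys are not enforced unless switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(DDL)
    return conn


def table_names(conn: sqlite3.Connection) -> list:
    """List the tables the DDL created, in alphabetical order."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]
```

Calling `build_schema()` and then `table_names(...)` should list the two tables. Point the same DDL at a different DBMS and the type choices and the foreign key switch would probably need rethinking, which is rather the point of this level.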

The above represents about the only level on which there is universal agreement (even if the name might vary, the level still exists)! Most frameworks recognise the need to start (as in the highest level) with a business level view – a level that gets the appellation of ‘Business’, ‘Domain’, ‘Enterprise’, ‘Corporate’ or ‘Conceptual’. Of course these terms are a mixture of scope (‘Domain’, ‘Enterprise’, ‘Corporate’) and level (‘Business’, ‘Conceptual’). Whatever the names, the key benefit of this highest level model is to grasp the nature of the data within the scope of the project (‘Conceptual Data Model’) and how it relates to the rest of the business (‘Corporate Data Model’).

Squeezed between the highest and lowest level are ‘Logical’ and ‘Physical’ Data Models, the former adopting a purist approach that tries to expose the true meaning of the data, the latter starting to worry about what the data will be used for (without caring one jot what DBMS or package it might end up in). Mind numbing terms like Normalization and Abstraction are used to articulate the differences between these levels but the reader can relax since this blog is getting too lengthy.
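For those who prefer a picture to a definition, the distinction can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `Customer`, `Order` and `OrderReportRow` names are hypothetical, and flattening for a report is just one example of a model being shaped by how the data will be used. Note that nothing in it mentions a DBMS.

```python
from dataclasses import dataclass


# Logical level (hypothetical entities): each fact is held once,
# and the relationship is carried by a key.
@dataclass(frozen=True)
class Customer:
    customer_id: int
    name: str


@dataclass(frozen=True)
class Order:
    order_id: int
    customer_id: int  # relates the order to its Customer
    total: float


# Physical level, in the sense used above: shaped by usage. Here a report
# always wants the customer name alongside the order, so the name is repeated.
@dataclass(frozen=True)
class OrderReportRow:
    order_id: int
    customer_name: str
    total: float


def to_report_rows(customers, orders):
    """Flatten the logical entities into the usage-driven shape."""
    names = {c.customer_id: c.name for c in customers}
    return [OrderReportRow(o.order_id, names[o.customer_id], o.total) for o in orders]
```

The logical classes say what the data means; the report row says what someone wants to do with it. Neither cares one jot where it ends up being stored.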
