The previous article, The Challenge with Agile, Part 1, made the point that many Agile initiatives perceive data modeling and architecture compliance as time-consuming activities incompatible with agile development. The risk of disregarding modeling and architecture is a portfolio of disintegrated point solutions that proliferate redundant, semantically inconsistent data. Moreover, such projects are burdened with redundant data extraction and transformation.
Modeling and architecture discipline actually accelerates agile development, as this Part 2, Model-Driven Agile, illustrates.
Any BI program, Agile or otherwise, should begin with an inventory of the Systems-of-Record (SOR) that are pertinent to the organization’s scope of integration. An appropriate technical subject matter expert and business subject matter expert should be secured for each SOR.
How Agile Initiatives Benefit from the SOR Inventory: At the beginning of an agile project, when the vision and business value are being defined, this SOR inventory will ensure that the required data are available and that the data sources selected for the project are, indeed, the most reliable and not a ‘downstream’ pseudo-source laden with latencies and artifacts.
Baseline SOR Physical Data Models
For each SOR database, a baseline physical data model should be reverse engineered using a standard data modeling tool. Operational databases may have hundreds, even thousands, of tables, so the baseline model may initially appear as an incomprehensible ‘hairball.’ However, with the help of the technical and business SMEs, a sub model can be created that includes only the source-of-record tables. Excluding non-SOR tables may eliminate a significant percentage (e.g., 30-70%) of tables. Further segregation of SOR tables into functional subject areas can make the baseline models substantially easier to use.
How Agile Initiatives Benefit from the SOR Physical Data Models: As the agile team focuses on establishing the release roadmap, product backlog, and sprint backlog, knowing which source systems and which SOR tables are relevant is a major time saver. As the project metrics are defined, modeling tools allow one to select the metric-pertinent transaction (fact) tables and easily determine the related tables as candidate dimensions. Creating functional (subject area) sub models is simple and allows the agile team to triage the tables that are (1) critical, (2) fairly important, and (3) nice to have.
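The step of starting from a fact table and finding its related tables can also be approximated without a modeling tool, by reading foreign keys from the database catalog. The following is a minimal sketch using Python's built-in sqlite3 module. The table names (order_line, customer, product, audit_log) and the helper candidate_dimensions are hypothetical, and the sketch assumes the source database actually declares its foreign keys, which many operational systems do not.

```python
import sqlite3


def candidate_dimensions(conn, fact_table):
    """Return the tables that fact_table references through declared foreign keys."""
    rows = conn.execute(f'PRAGMA foreign_key_list("{fact_table}")').fetchall()
    # In each result row, index 2 holds the name of the referenced (parent) table.
    return sorted({row[2] for row in rows})


# Hypothetical miniature source system, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE audit_log (log_id INTEGER PRIMARY KEY, message TEXT);
    CREATE TABLE order_line (
        order_line_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),
        product_id INTEGER REFERENCES product(product_id),
        amount REAL
    );
""")

dims = candidate_dimensions(conn, "order_line")
print(dims)
```

Here audit_log is not referenced by the fact table and so does not appear among the candidates. Where foreign keys are not declared, the SMEs' knowledge of the source system remains the primary guide.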
The first layer of the shared data architecture is an Acquisition Layer (sometimes referred to as “Stage”). The structure and contents of Acquisition tables closely match those of the SOR tables (i.e., little or no data transformation). Periodic extractions capture changes (inserts, updates, and deletes), which are inserted as new Acquisition rows to preserve the history of changes. Since there is no data transformation from source-of-record to Acquisition, these tables require minimal design and relatively small development effort (cost) per table. Contrary to some Agile approaches (e.g., Scrum), it is sometimes beneficial to dedicate a sprint or two up front to pulling a breadth of data into Acquisition. The priority of acquired SOR databases and SOR tables should be driven by the pipeline of agile initiative needs.
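To make the change-capture idea concrete, here is a minimal sketch in Python that compares two periodic snapshots of a source table and produces the rows to append to an Acquisition table. It is an illustration under stated assumptions, not a prescribed design: snapshots are modeled as dictionaries keyed by primary key, and the names capture_changes, op, and load_id are hypothetical. Real implementations often rely on database logs or timestamps instead of full snapshot comparison.

```python
def capture_changes(previous, current, load_id):
    """Compare two snapshots keyed by primary key and return change rows to append."""
    changes = []
    for key, row in current.items():
        if key not in previous:
            changes.append({"key": key, "op": "I", "load_id": load_id, **row})
        elif previous[key] != row:
            changes.append({"key": key, "op": "U", "load_id": load_id, **row})
    for key, row in previous.items():
        if key not in current:
            # Deletes are recorded with the last known values of the row.
            changes.append({"key": key, "op": "D", "load_id": load_id, **row})
    return changes


# Hypothetical snapshots of one source table from two consecutive extractions.
previous = {
    1: {"name": "Acme", "city": "Boston"},
    2: {"name": "Beta", "city": "Austin"},
}
current = {
    1: {"name": "Acme", "city": "Chicago"},
    3: {"name": "Gamma", "city": "Denver"},
}

acquisition_rows = []  # append-only: existing rows are never updated in place
acquisition_rows.extend(capture_changes(previous, current, load_id=2))
for row in acquisition_rows:
    print(row)
```

Because every change is appended as a new row with its operation and load identifier, the Acquisition table retains history without any transformation of the source columns.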
How Agile Initiatives Benefit from the Data Acquisition Layer: The first sprint or two of an agile project should focus on coverage and comprehensiveness of required source data. User testing should focus on reconciling row counts and checksums to ensure all required source data have been acquired. Bringing in the breadth of data and focusing on a specific vertical from end-to-end will prevent issues from arising later.
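A reconciliation of row counts and checksums can be sketched as follows. This is a minimal illustration in Python, assuming rows from the source and from Acquisition can both be fetched as tuples with the same column order and the same value types. The function name table_fingerprint and the sample rows are hypothetical.

```python
import hashlib


def table_fingerprint(rows):
    """Return (row_count, checksum) for an iterable of row tuples, independent of row order."""
    digests = sorted(
        hashlib.sha256(repr(tuple(row)).encode("utf-8")).hexdigest() for row in rows
    )
    # Hash the sorted per-row digests so the checksum does not depend on fetch order.
    combined = hashlib.sha256("".join(digests).encode("utf-8")).hexdigest()
    return len(digests), combined


# Hypothetical rows: the same data fetched in a different order, and an incomplete load.
source_rows = [(1, "Acme", 100.0), (2, "Beta", 75.5), (3, "Gamma", 12.0)]
acquired_rows = [(3, "Gamma", 12.0), (1, "Acme", 100.0), (2, "Beta", 75.5)]
incomplete_rows = acquired_rows[:2]

reconciled = table_fingerprint(source_rows) == table_fingerprint(acquired_rows)
incomplete_ok = table_fingerprint(source_rows) == table_fingerprint(incomplete_rows)
print(reconciled, incomplete_ok)
```

The comparison is sensitive to value representation (for example, 100 versus 100.0), so in practice values would need to be normalized the same way on both sides before hashing.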
Leveraging shared Acquisition tables removes the burden of data extraction and change capture from individual agile sprints/releases. Moreover, because whole tables are extracted without transformation, multiple agile sprints/releases can leverage the work product without reconciling diverse requirements.
Whether the Acquisition layer is built as a shared service for agile initiatives or by individual initiatives that contribute to a shared Acquisition layer, the outcome is the same: compliance with an architecture standard of comprehensive, non-transformed acquisition tables as a first layer reduces the individual and collective effort required for projects to acquire their data.
Typically, Acquisition tables are fairly normalized and granular following the structure of their SOR tables. BI projects usually require denormalized (dimensional) and aggregated data. This can be rapidly rendered with relational views (and materialized views) as well as flexible, in-memory tools, and even spreadsheet pivot features. While these tools may not be appropriate as an end solution, they are ideal for early sprint prototypes without the burden of complex ETL. Providing business users multi-dimensional aggregate views of data in the early releases of an agile initiative accelerates the identification of data quality and data integration issues.
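As an illustration of prototyping with relational views, the sketch below defines a denormalized, aggregated view directly over two Acquisition tables, with no ETL in between. It uses Python's built-in sqlite3 module. The table, column, and view names (acq_customer, acq_order_line, v_sales_by_region) and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical Acquisition tables mirroring normalized source tables.
    CREATE TABLE acq_customer (customer_id INTEGER, region TEXT);
    CREATE TABLE acq_order_line (order_line_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO acq_customer VALUES (1, 'East'), (2, 'West');
    INSERT INTO acq_order_line VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 75.0);

    -- Prototype view: joins the fact to its dimension and aggregates by region.
    CREATE VIEW v_sales_by_region AS
        SELECT c.region AS region,
               COUNT(*) AS order_lines,
               SUM(o.amount) AS total_amount
        FROM acq_order_line AS o
        JOIN acq_customer AS c ON c.customer_id = o.customer_id
        GROUP BY c.region;
""")

sales_by_region = conn.execute(
    "SELECT region, order_lines, total_amount FROM v_sales_by_region ORDER BY region"
).fetchall()
print(sales_by_region)
```

A view like this can be redefined in minutes as business users react to the prototype, which is what makes it suitable for early sprints even if it would not perform adequately as an end solution.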
Subsequent layers of the shared architecture are dedicated to standardizing data content and unifying data structures across various source systems. This is tantamount to transforming source-centric data to comply with a unified data model. This unified data model is derived from top-down requirements gathered from user interviews and group workshops, as well as bottom-up profiling of source data. The Data Integration Layers represent enrichment of the Acquisition data that is built out incrementally and opportunistically.
How Agile Initiatives Benefit from Data Integration Layers: As agile teams discover data quality and data integrity issues, they face a choice: resolve those issues unilaterally with project-centric ETL, or collaborate with data modeling/architecture initiatives that provide integrated data as a shared service.
As with the shared Acquisition Layer, agile teams that collaborate in a model-driven shared architecture benefit by leveraging enriched data from previous sprints/releases.
The earlier suggestion that the first agile sprint be based on views of Acquisition tables presumed an initiative early in the BI program, before a substantial portion of integrated data was in place. As the BI program matures, there is increasing likelihood that data required for an agile initiative has already been modeled and architected by prior projects. Sprints of late-cycle initiatives can then focus on data delivery and opportunity-specific features rather than data sourcing and transformation.
Data modeling and architecture are not incompatible with agile development. On the contrary, data modeling techniques and a shared modular architecture can accelerate agile development by reducing the number and complexity of required sprints. Leveraging shared data and ETL services will dramatically shorten the delivery time of new BI capabilities and ensure that those solutions are based on enterprise-sanctioned data.
Reverse engineering baseline physical data models and segregating SOR and non-SOR tables for every SOR database may be a considerable effort. It is not suggested that this work be completed comprehensively before any agile initiatives begin. Rather, it should be done opportunistically, with priorities based on the pipeline of agile projects. Projects later in the pipeline will benefit from prior SOR modeling efforts, which accelerate their development.
About the author:
Dr. Robert Conway is the founder and principal consultant of Information Engineering Associates. He has built successful DW/BI programs for many organizations in diverse industries. He offers public and onsite workshops on RAPID® Architecture/Methodology and Data Modeling skills. www.InfoEngAssc.com