ds-blog-icon.png

THE DATA DOWNLOAD

As leaders in Enterprise Data Management and Business Intelligence, we pride ourselves on being able to deliver successfully on our projects as well as provide high-quality content to our readers.

No One Likes a Monday Morning Data Architect: 7 Steps to Avoid the Data Warehouse Hangover

Posted by DeVon Doman on Nov 21, 2016 9:18:36 AM

Many consultants have come across unexpected data conditions while deep into the development or testing phase of a data integration effort. Unfortunately, the need for a major shift in architecture or approach is sometimes identified soon after a system is implemented into production. Upon discovering this type of situation, you likely experience a sinking feeling, knowing the potential negative consequences. You imagine blown budgets, missed deadlines, loss of credibility with the business, and unpredictable results. You begin analyzing what caused the problem, how it could have been avoided, and who is at fault

Planning Ahead

At Datasource Consulting we live by the philosophy, “The best way to survive an accident is to avoid it in the first place.” In practice, this means rather than dealing with unexpected roadblocks as they arise, it is better to plan for and address areas of potential problems very early in the development process. This approach helps avoid common issues before they happen and establishes an understanding and agreement on actions to take should a problem occur. After all, no one wants to inform the project sponsor that the new system, recently purchased and implemented, needs an overhaul.

BI_meeting_social_1200x628.jpg

Team Discussions

Datasource Consulting recommends a discussion with technical and business teams very early in the development effort, typically within the first two to three weeks. The discussion should focus on areas proven to be “problem producers” for data integration efforts There are several additional topics to review and understand to avoid future surprises. These are not trivial and can take 2-3 hours to complete. However, the investment in time and effort up front can prevent hours of effort from both technical and business teams later in the development cycle.

The following are focus areas to discuss at the start of the development effort.

  • Data Formats: Consistent data formatting avoids problems with table joins, arithmetic, and data presentation issues.
    • How should leading or trailing zeroes and spaces be handled?
    • What are the rules for numeric lengths and decimal places?
    • Are there any data masking or encryption requirements
  • Default Values: Do not leave this decision up to developers, as they will likely repeat what they have previously done, leading to inconsistency and unpredictable results.
    • What happens if an element contains a null value or all spaces?
    • How are codes or hierarchies to be converted or used?
  • Standardization/Conformance: Standardizing data values on time zone, units of measure, and currency avoids an incorrect result when adding numbers. Non-standard product, customer, or vendor names, etc. produces results requiring users to perform arithmetic on their own in order to view consolidated, clean, reliable summaries.
    • Do names, addresses, customers, vendors, products, etc. need to be standardized, merged or augmented?
    • Should dates and times be converted to a specific time zone?
    • Should monetary values be stored as a single currency?
    • Should numeric values be standardized to a specific unit of measure?
  • Volumetrics: Understanding data volumes and volatility influences decisions associated with data architecture, data processing, and overall system performance.
    • How much data will be stored or transferred?
    • How often is data to be processed?
    • What are the change volatility, growth rate, retention and archival rules, and compression algorithms?
  • Configuration Management: Understanding the computing environment, migration paths and configuration management expectations avoids the chaos and increases the predictability and quality of data, code, and data structure migration.
    • What are the required computing environments?
    • What are the procedures for moving data or code from one environment to another?
    • What are the required approvals for migrating code or data from one environment to another?
    • How much data will be available in each computing environment?
  • User Base and Security: Understanding who will be accessing the data impacts the architecture of data, analytics, usage patterns, and design.
    • What is the process to grant users access to each environment?
    • How many users will access the data?
    • Are there any restrictions for data access?
    • What types of users or user groups will be accessing the data?
    • What are the regulatory requirements that must be met?
  • Miscellaneous Common Considerations
    • Are there any existing data standards that must be followed?
    • What are the documentation standards?
    • Is there any existing documentation?
    • What are the known data issues and “pain points”?

Reaching a Consensus

Your team should reach and document a mutual decision for most discussion points in the focus areas. Completing this step early helps solidify an architectural foundation and avoid most potential “uh-oh” moments later in the development cycle.

It’s also important to reach a common understanding of general approaches and recommendations with associated considerations. The goal is not to come to a final decision, but to ensure both technical and business-focused employees have a cohesive understanding of the project goals. Below are examples of discussion points that can enable more efficient and rational decision-making should situations arise regarding the discussed approaches later in the development cycle.

  • The need for and value of data profiling
  • The value of standardized codes and hierarchies
  • Recommended approach for handling data warehousing concepts such as Type 1 and Type 2 dimensions, “stubbed” dimensional data, and not found/not applicable dimension records
  • Specific approaches to take given different possible situations such as failing processes, no data, and unexpected data volumes

While eliminating all development cycle surprises is not a practical goal, especially in the world of data, Datasource Consulting has found that investing time and effort early in the development cycle to determine approaches and recommendations saves time, effort, embarrassment, and many “uh-oh” moments throughout the solution development process.

Want to have a PDF version of this blog to print or save? Click to download below.

Download Blog  

Topics: Data Warehousing, Blog, Data Architecture

Written by DeVon Doman