Enterprise Data Management - Key Definitions of Industry Terms
At Datasource Consulting, it is fairly common for our friends, and even some colleagues, to ask us about the Enterprise Data Management terms we use on a regular basis. It also seems that everyone is aware of how fast the Enterprise Data Management industry moves and how quickly new terms pop up.
Below, you will find definitions to some of the Enterprise Data Management terms that we come across frequently and have a meaningful tie-in to the Business Intelligence and Data Integration space.
Agile BI/Data Warehousing
A blend of agile project management principles practiced within the BI/Data Warehousing space. The approach addresses a need to increase program agility by reducing the time intervals necessary to deliver value to the business.
Attribute Definitions
As with entities, it is important to define all attributes clearly. The same rules apply: by comparing an attribute to a definition, you should be able to tell whether it fits. However, you should be aware of incomplete definitions. Consider, for example, "The date on which the ACCOUNT was opened." A further definition of what "opened" means is needed before the definition is clear and complete.
Attribute definitions generally should have the same basic structure as entity definitions, including a description, examples, and comments. The definitions should also contain, whenever possible, validation rules that specify which facts are accepted as valid values for that attribute.
Analytic Database
An analytic database is a database technology specifically designed to support business intelligence (BI) and analytic applications, typically as part of a data warehouse or data mart solution. Types of analytic databases include columnar databases, data warehouse appliances, in-memory databases, massively parallel processing (MPP) databases, and online analytical processing (OLAP) databases.
Analytics
Discovery and communication of meaningful patterns in data. Especially valuable in areas rich with recorded information, analytics relies on the simultaneous application of statistics, computer programming and operations research to quantify performance. Analytics often favors data visualization to communicate insight.
Big Data
A collection of data sets so large and complex that it becomes awkward to work with using on-hand database management tools. The terms Volume, Velocity, and Variety are often used to describe Big Data. Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, manage, and process within a tolerable elapsed time. Big data sizes range from a few dozen terabytes to many petabytes of data in a single data set. This difficulty has given rise to a new platform of “big data” tools such as the Apache Hadoop Big Data Platform.
BI - Business Intelligence
The ability for an organization to take all its capabilities and convert them into knowledge. The process, architecture, technologies and tools that help companies transform their data into accurate, actionable and timely information and disseminate that information across the organization. It includes, but is not limited to, data modeling, data warehousing, data marts, metadata, master data management, data cleansing, predictive analytics, reporting, analysis, alerts, dashboards and scorecards. BI evolved from the decision support systems which began in the 1960s and developed throughout the mid-1980s. In 1989, Howard Dresner (later a Gartner Group analyst) proposed “business intelligence” as an umbrella term to describe “concepts and methods to improve business decision making by using fact-based support systems.” It was not until the late 1990s that this usage was widespread. Some use the term Business Intelligence to describe the overall solution, while others use it to describe only the front-end reporting aspects of a solution; there is no consensus on which usage is correct.
BI In the Cloud
Business intelligence software or solutions architected for multitenant operation on a public or private cloud. Typically, such solutions are licensed on a subscription basis. Some of the goals or perceived benefits include lower initial investment, scalability, reduced IT footprint, and reliable recurring expense.
BPM - Business Performance Management
A set of management and analytic processes that enable the management of an organization’s performance to achieve one or more pre-selected goals. Synonyms for BPM include “corporate performance management (CPM)” and “enterprise performance management (EPM)”. BPM includes three main activities: selecting goals, identifying metrics for those goals, and taking action in response to those metrics.
Business Process
A business process is basically a set of related activities. Business processes are roughly classified by the topics of interest to the business. Extracting a candidate list of high-potential business processes requires prioritizing requirements. Examples of business processes are customers, profit, sales, organizations, and products.
When we refer to a business process, we are not simply referring to a business department. For example, consider a scenario where both the sales and marketing departments access the orders data. We build a single dimensional model to handle orders data rather than building separate dimensional models for the sales and marketing departments. Creating dimensional models based on departments would no doubt result in duplicate data. This duplication, or data redundancy, can result in many data quality and data consistency issues.
Business Rules
Business rules are an integral part of the data model. These rules take the form of relationships, role names, candidate keys, defaults, and other modeling structures, including generalization categories, referential integrity, and cardinality. Business rules are also captured in entity and attribute definitions and validation rules.
For example, a CURRENCY entity could be defined either as the set of all valid currencies recognized anywhere in the world, or as the subset of these which our company has decided to use in its day-to-day business operations. This is a subtle, but important distinction. In the latter case, there is a business rule, or policy statement, involved.
This rule manifests itself in the validation rules for "currency-code." It restricts the valid values for "currency-code" to those that are used by the business. Maintenance of the business rule becomes a task of maintaining the table of valid values for CURRENCY. To permit or prohibit trading of a CURRENCY, you simply create or delete instances in the table of valid values.
The attributes "bought-currency-code" and "sold-currency-code" are similarly restricted. Both are further restricted by a validation rule that says "bought-currency-code" and "sold-currency-code" cannot be equal. Therefore, each is dependent on the value of the other in its actual use. Validation rules can be addressed in the definitions of attributes, and can also be defined explicitly using validation rules, default values, and valid value lists.
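As a rough sketch, the rules above can be enforced with a table of valid values plus a trade-level check. The currency codes and the `validate_trade` function here are hypothetical illustrations, not from any particular system:

```python
# Hypothetical sketch: the table of valid values doubles as the business-rule
# store. Adding or removing a code permits or prohibits trading that currency.
VALID_CURRENCY_CODES = {"USD", "EUR", "GBP", "JPY"}  # assumed sample values

def validate_trade(bought_currency_code, sold_currency_code):
    """Apply the validation rules for a currency trade; return a list of errors."""
    errors = []
    if bought_currency_code not in VALID_CURRENCY_CODES:
        errors.append(f"bought-currency-code {bought_currency_code!r} is not a valid value")
    if sold_currency_code not in VALID_CURRENCY_CODES:
        errors.append(f"sold-currency-code {sold_currency_code!r} is not a valid value")
    # The two attributes are interdependent: they may never hold the same value.
    if bought_currency_code == sold_currency_code:
        errors.append("bought-currency-code and sold-currency-code cannot be equal")
    return errors
```

Maintaining the business rule then reduces to maintaining the set of valid values.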
Columnar Database
A database management system (DBMS) that stores data tables as sections of columns of data rather than as rows of data, as most relational DBMSs do. This has advantages for data warehouses, customer relationship management (CRM) systems, library card catalogs, and other ad-hoc inquiry systems where aggregates are computed over large numbers of similar data items. Examples of databases with column-oriented storage capabilities include: Aster Data Systems, Greenplum, Hive Intelligence, Infobright, Paraccel, SAP HANA, Teradata, and Vertica.
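To illustrate the idea (a toy sketch, not how any vendor actually implements storage), the same table can be laid out row-wise or column-wise; an aggregate such as a sum over one column only needs to touch that column in the columnar layout:

```python
# Row-oriented layout: one record per row, all fields together.
rows = [
    {"order_id": 1, "region": "East", "amount": 100.0},
    {"order_id": 2, "region": "West", "amount": 250.0},
    {"order_id": 3, "region": "East", "amount": 75.0},
]

# Column-oriented layout: one contiguous list per column.
columns = {
    "order_id": [r["order_id"] for r in rows],
    "region": [r["region"] for r in rows],
    "amount": [r["amount"] for r in rows],
}

# A row store must visit every row (all fields) to aggregate one column...
total_row_store = sum(r["amount"] for r in rows)
# ...while a column store scans only the "amount" column.
total_column_store = sum(columns["amount"])
assert total_row_store == total_column_store == 425.0
```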
Data Modeling
Data modeling is a process used to define and analyze data requirements needed to support the business processes within the scope of corresponding information systems in organizations. Therefore, the process of data modeling involves professional data modelers working closely with business stakeholders, as well as potential users of the information system.
There are three different types of data models produced while progressing from requirements to the actual database to be used for the information system. The data requirements are initially recorded as a conceptual data model which is essentially a set of technology independent specifications about the data and is used to discuss initial requirements with the business stakeholders. The conceptual model is then translated into a logical data model, which documents structures of the data that can be implemented in databases. Implementation of one conceptual data model may require multiple logical data models. The last step in data modeling is transforming the logical data model to a physical data model that organizes the data into tables, and accounts for access, performance and storage details. Data modeling defines not just data elements, but also their structures and the relationships between them.
Data modeling techniques and methodologies are used to model data in a standard, consistent, predictable manner in order to manage it as a resource. The use of data modeling standards is strongly recommended for all projects requiring a standard means of defining and analyzing data within an organization, e.g., using data modeling:
- to assist business analysts, programmers, testers, manual writers, IT package selectors, engineers, managers, related organizations and clients to understand and use an agreed semi-formal model of the concepts of the organization and how they relate to one another
- to manage data as a resource
- for the integration of information systems
- for designing databases/data warehouses (aka data repositories)
Data modeling may be performed during various types of projects and in multiple phases of projects. Data models are progressive; there is no such thing as the final data model for a business or application. Instead a data model should be considered a living document that will change in response to a changing business. The data models should ideally be stored in a repository so that they can be retrieved, expanded, and edited over time. Whitten (2004) determined two types of data modeling.
- Strategic data modeling: This is part of the creation of an information systems strategy, in which an overall vision and architecture for information systems is defined. Information engineering is a methodology that embraces this approach.
- Data modeling during systems analysis: In systems analysis logical data models are created as part of the development of new databases.
Data modeling is also used as a technique for detailing business requirements for specific databases. It is sometimes called database modeling because a data model is eventually implemented in a database.
Consolidation
Multi-dimensional databases generally have hierarchies or formula-based relationships of data within each dimension. Consolidation involves computing all of these data relationships for one or more dimensions, for example, adding up all Departments to get Total Division data. While such relationships are normally summations, any type of computational relationship or formula might be defined.
Synonyms: Roll-up, Aggregate
See: Formula, Hierarchical Relationships, Children, Parents
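A summation roll-up can be sketched in a few lines; the department and division names below are invented for illustration:

```python
# One-to-many hierarchy along a dimension: each child has exactly one parent.
hierarchy = {  # child -> parent
    "Sales Dept": "Division A",
    "Marketing Dept": "Division A",
    "Support Dept": "Division B",
}
department_totals = {"Sales Dept": 500, "Marketing Dept": 300, "Support Dept": 200}

def consolidate(values, hierarchy):
    """Sum each child's value into its parent (a simple summation roll-up)."""
    parents = {}
    for child, value in values.items():
        parent = hierarchy[child]
        parents[parent] = parents.get(parent, 0) + value
    return parents

division_totals = consolidate(department_totals, hierarchy)
# Adding up the Departments yields the Division totals.
```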
Cube
A group of data cells arranged by the dimensions of the data. For example, a spreadsheet exemplifies a two-dimensional array with the data cells arranged in rows and columns, each being a dimension. A three-dimensional array can be visualized as a cube with each dimension forming a side of the cube, including any slice parallel with that side. Higher-dimensional arrays have no physical metaphor, but they organize the data in the way users think of their enterprise. Typical enterprise dimensions are time, measures, products, geographical regions, sales channels, etc.
Synonyms: Array, Hypercube, Multi-dimensional Structure.
Dashboards and Scorecards
An easy-to-understand user interface that displays a graphical representation of current and historical trends of an organization’s Key Performance Indicators (KPIs). The goal is to facilitate quick, informed business decisions at a glance. Dashboards tend to monitor the performance of operational processes, whereas scorecards tend to chart the progress of tactical and strategic goals.
Data Architecture
Models, policies, rules or standards that govern which data is collected, and how it is stored, arranged, integrated, and put to use. A Data Architecture often defines the target state of data and the planning needed to achieve the target state.
Data Discovery
The process of locating critical data throughout an enterprise. This includes developing an understanding of the nuances of the data such as key structures, data cleanliness, data history, source of record, data format, protocols to obtain data, etc.
DI - Data Integration
A key component of data warehousing and business intelligence, data integration (DI) focuses on combining data residing in different sources and providing users with a unified view of these data. DI is often achieved by either data virtualization or extract, transform & load (ETL) technologies.
Data Mart
Whereas a data warehouse combines databases across an entire enterprise, data marts are usually smaller and focus on a particular subject or department. Some data marts, called dependent data marts, are subsets of larger data warehouses.
The data mart typically contains a subset of corporate data that is valuable to a specific business unit, department, or set of users. This subset consists of historical, summarized, and possibly detailed data captured from transaction processing systems (called independent data marts), or from an existing enterprise data warehouse (called dependent data marts). It is important to realize that the functional scope of the data mart's users defines the data mart, not the size of the data mart database.
Data Mining
Data mining (DMM), also called Knowledge-Discovery in Databases (KDD) or Knowledge-Discovery and Data Mining, is the process of automatically searching large volumes of data for patterns using tools such as classification, association rule mining, clustering, etc. Data mining is a complex topic with links to multiple core fields such as computer science, and it draws on computational techniques from statistics, information retrieval, machine learning and pattern recognition.
Data Model
In software engineering, the term data model is used in two related senses. In the first sense, it is a description of the objects represented by a computer system together with their properties and relationships; these are typically "real world" objects such as products, suppliers, customers, and orders. In the second sense, that of a database model, it means a collection of concepts and rules used in defining data models: for example, the relational model uses relations and tuples, while the network model uses records, sets, and fields.
A data model is based on data, data relationships, data semantics and data constraints. A data model provides the details of information to be stored, and is of primary use when the final product is the generation of computer software code for an application or the preparation of a functional specification to aid a computer software make-or-buy decision.
Data models are often used as an aid to communication between the business people defining the requirements for a computer system and the technical people defining the design in response to those requirements. They are used to show the data needed and created by business processes.
According to Hoberman (2009), "A data model is a wayfinding tool for both business and IT professionals, which uses a set of symbols and text to precisely explain a subset of real information to improve communication within the organization and thereby lead to a more flexible and stable application environment."
A data model explicitly determines the structure of data. Data models are specified in a data modeling notation, which is often graphical in form.
A data model can be sometimes referred to as a data structure, especially in the context of programming languages. Data models are often complemented by function models, especially in the context of enterprise models.
The ANSI/SPARC three-level architecture shows that a data model can be an external model (or view), a conceptual model, or a physical model. This is not the only way to look at data models, but it is a useful one, particularly when comparing models.
A data model instance may be one of three kinds according to ANSI in 1975.
- Conceptual schema: describes the semantics of a domain, being the scope of the model. For example, it may be a model of the interest area of an organization or industry. This consists of entity classes, representing kinds of things of significance in the domain, and relationship assertions about associations between pairs of entity classes. A conceptual schema specifies the kinds of facts or propositions that can be expressed using the model. In that sense, it defines the allowed expressions in an artificial 'language' with a scope that is limited by the scope of the model. The conceptual schema has evolved into a powerful communication tool for business users. Often called a subject area model (SAM) or high-level data model (HDM), this model is used to communicate core data concepts, rules, and definitions to a business user as part of an overall application development or enterprise initiative. The number of objects should be very small and focused on key concepts. Try to limit this model to one page, although for extremely large organizations or complex projects, the model might span two or more pages.
- Logical schema: describes the semantics, as represented by a particular data manipulation technology. This consists of descriptions of tables and columns, object oriented classes, and XML tags, among other things.
- Physical schema: describes the physical means by which data are stored. This is concerned with partitions, CPUs, table spaces, and the like.
The significance of this approach, according to ANSI, is that it allows the three perspectives to be relatively independent of each other. Storage technology can change without affecting either the logical or the conceptual model. The table/column structure can change without (necessarily) affecting the conceptual model. In each case, of course, the structures must remain consistent with the other model. The table/column structure may be different from a direct translation of the entity classes and attributes, but it must ultimately carry out the objectives of the conceptual entity class structure. Early phases of many software development projects emphasize the design of a conceptual data model. Such a design can be detailed into a logical data model. In later stages, this model may be translated into a physical data model. However, it is also possible to implement a conceptual model directly.
Data Modeling (dimensional)
A set of techniques and concepts used in data warehouse design. It is considered to be different from entity-relationship modeling (ER).
Data Quality
The accuracy of data stored, reported and analyzed by an organization in relation to the exactness, correctness and completeness of the data:
• Exactness refers to how well the stored data matches the original source of that data (e.g., a customer reports its name as ABC, Inc. and it is stored as ACB)
• Correctness refers to how closely the stored or reported data matches the business meaning of that data (e.g., a customer master field stores the customer type as P, and a report classifies it as “pertinent” instead of “public”)
• Completeness is the degree to which the necessary parts or elements of the data exist in the source system(s) (e.g., customer phone number – 800-921-103x)
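As an illustrative sketch (the field names, code table, and phone format below are assumptions, not a standard), each of the three dimensions can be expressed as a simple check:

```python
import re

def check_exactness(stored_name, source_name):
    # Exactness: the stored value matches the original source ("ABC" vs "ACB").
    return stored_name == source_name

VALID_CUSTOMER_TYPES = {"P": "public", "V": "private"}  # assumed code table

def check_correctness(type_code):
    # Correctness: the stored code maps to its intended business meaning.
    return type_code in VALID_CUSTOMER_TYPES

PHONE_PATTERN = re.compile(r"^\d{3}-\d{3}-\d{4}$")  # assumed NANP-style format

def check_completeness(phone):
    # Completeness: all necessary parts of the value are present.
    return bool(PHONE_PATTERN.match(phone))

assert not check_exactness("ACB, Inc.", "ABC, Inc.")  # transposed letters
assert check_correctness("P")
assert not check_completeness("800-921-103")  # a digit is missing
```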
Data Virtualization
Process of abstracting disparate systems (databases, applications, file repositories, websites, data services vendors, etc.) through a single data access layer (which may be any of several data access mechanisms). Companies like Composite, Informatica, Oracle, and IBM have solutions in the data virtualization space.
Data Visualization
The study of the visual representation of data, meaning “information that has been abstracted in some schematic form, including attributes or variables for the units of information”. There are a wide variety of technologies employed to achieve data visualization, such as Tableau, SAS, R, Google Fusion Tables, Many Eyes, Color Brewer, etc.
Data Warehouse Appliance
An integrated set of servers, storage, operating system(s), DBMS and software specifically pre-installed and pre-optimized for data warehousing (DW). Most DW appliance vendors use massively parallel processing (MPP) architectures to provide high query performance and platform scalability. Technology vendors include Netezza, DATAllegro, Greenplum, HP Neoview, and XtremeData.
Data Warehouse
Data warehouses support business decisions by collecting, consolidating, and organizing data for reporting and analysis with tools such as online analytical processing (OLAP) and data mining. Although data warehouses are built on relational database technology, the design of a data warehouse database differs substantially from the design of an online transaction processing system (OLTP) database.
Contrast with Data Mart.
Dimension
A dimension is sometimes referred to as an axis for business analysis. Time, Location and Product are the classic dimensions.
A dimension is a structural attribute of a cube that is a list of members, all of which are of a similar type in the user's perception of the data. For example, all months, quarters, years, etc., make up a time dimension; likewise all cities, regions, countries, etc., make up a geography dimension.
A dimension table is one of the set of companion tables to a fact table and normally contains attributes or (fields) used to constrain and group data when performing data warehousing queries.
Dimensions correspond to the "branches" of a star schema.
Data Warehouse Components
The DW components differ not only in the content of the data but also in the way they store the data and in who can access it.
- Staging area: For handling data extracted from source systems. There can be data transformations at this point and/or as the data is loaded into the data warehouse. The structure of the staging area depends on the approach and tools used for the extract, transform, and load (ETL) processes. The data model design affects not only performance, but also scalability and the ability to process new data without recreating the entire model.
- Data warehouse: This is the area, also called the system of record (SOR), that contains the history data in 3NF and is typically not accessed for query and analysis. Use it for populating the summary area, analytical areas, and the dependent data marts.
- Summary area: This area contains aggregations. Structures are usually derived from the data warehouse, with one or more attributes at a higher grain (less detail) than in the data warehouse. These are constructed for high-performance data analysis where low-level detail is not required.
- Analytical area: Contains multidimensional (MD) structures, such as star schemas, snowflakes, or multi-star schemas, constructed for high-performance data analysis.
E/R Modeling
E/R modeling is a design technique in which we store the data in highly normalized form inside a relational database.
The E/R model basically focuses on three things, entities, attributes, and relationships. An entity is any category of an object in which the business is interested. Each entity has a corresponding business definition, which is used to define the boundaries of the entity - allowing you to decide whether a particular object belongs to that category or entity.
A disadvantage of an E/R model is that it is not as efficient when performing very large queries involving multiple tables. In other words, an E/R model is good at INSERT, UPDATE, or DELETE processing, but not as good for SELECT processing.
EDW (Enterprise Data Warehouse)
An enterprise data warehouse is one that will support all, or a large part, of the business requirement for a more fully integrated data warehousing environment that has a high degree of data access and usage across departments or lines of business. That is, the data warehouse is designed and constructed based on the needs of the business as a whole. Consider it a common repository for decision-support data that is available across the entire organization, or a large subset of that data.
We use the term Enterprise here to reflect the scope of data access and usage, not the physical structure.
The enterprise data warehouse may also be called a hub-and-spoke data warehouse implementation if the control is logically centralized, even if the data is spread out and physically distributed.
EDM - Entity Data Model
Entity data model (EDM) refers to a set of concepts that describe data structure, regardless of its stored form. This model uses three key concepts to describe data structure: entity type, association type and property. EDM supports a set of primitive data types that define properties in a conceptual model.
ETL – Extract Transform & Load
A process in database usage and especially in data warehousing that involves: Extracting data from outside sources (E), Transforming it to fit operational needs (which can include quality levels) (T), Loading it into the end target (database, more specifically, operational data store, data mart or data warehouse) (L). It often involves converting the source data into a form required for the target, which may involve filtering, sorting, joining, translating, deriving, transposing, summarizing and/or denormalizing. With regard to business intelligence (BI), it is generally used to populate an operational data store (ODS), data warehouse (DW) and/or a data mart.
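A minimal, self-contained sketch of the E, T, and L steps might look like the following; the data shapes and cleansing rules are invented for illustration, and real pipelines use dedicated ETL tooling:

```python
def extract():
    # Extract: pull raw rows from a source system (hard-coded here for illustration).
    return [
        {"id": "1", "name": " alice ", "amount": "100.50"},
        {"id": "2", "name": "BOB", "amount": "bad-value"},
    ]

def transform(rows):
    # Transform: convert types, cleanse values, and filter rows that fail
    # quality checks, so the data fits the target's operational needs.
    out = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # reject rows that fail the quality check
        out.append({
            "id": int(row["id"]),
            "name": row["name"].strip().title(),
            "amount": amount,
        })
    return out

def load(rows, target):
    # Load: write the conformed rows into the end target (a list stands in
    # for an ODS, data mart, or data warehouse table).
    target.extend(rows)

warehouse = []
load(transform(extract()), warehouse)
```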
Fact Table
A fact table consists of the measurements, metrics or facts of a business process.
The Fact Table is located at the center of a star schema, surrounded by dimension tables.
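As a toy sketch of a star schema (the table contents and key values are invented), the fact table holds the measurements plus foreign keys into the dimension tables, which supply the attributes used to constrain and group:

```python
# Dimension tables: surrogate key -> descriptive attributes.
dim_product = {
    1: {"name": "Widget", "category": "Hardware"},
    2: {"name": "Gadget", "category": "Electronics"},
}
dim_date = {20240101: {"year": 2024, "month": 1}}

# Fact table: one row per measurement event, keyed into the dimensions.
fact_sales = [
    {"product_key": 1, "date_key": 20240101, "quantity": 3, "amount": 30.0},
    {"product_key": 2, "date_key": 20240101, "quantity": 1, "amount": 99.0},
    {"product_key": 1, "date_key": 20240101, "quantity": 2, "amount": 20.0},
]

# Group a fact measure by a dimension attribute, as a DW query would.
sales_by_category = {}
for row in fact_sales:
    category = dim_product[row["product_key"]]["category"]
    sales_by_category[category] = sales_by_category.get(category, 0.0) + row["amount"]
```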
Formula
A formula is a virtual hypercube calculated on the fly from other measures. It is generally not stored in the database. For example, the formula Average Price is dimensioned according to Time and Style and has a decimal type.
From the user's point of view there is no difference between a stored measure and a formula. Both are defined by dimensions and by type.
Hadoop
An open-source software framework that supports data-intensive distributed applications. It enables applications to work with thousands of computationally independent computers and petabytes of data. Hadoop was derived from Google’s MapReduce and Google File System (GFS) papers. Many organizations use Hadoop to run large distributed computations, including Facebook, Yahoo, Amazon.com, LinkedIn, Twitter, eBay, Apple, and Netflix.
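Hadoop's programming model can be illustrated with a single-process word-count sketch: map emits key/value pairs, a shuffle groups them by key, and reduce aggregates each group. Real Hadoop distributes these phases across a cluster, which this toy version does not attempt:

```python
def map_phase(document):
    # Map: emit a (word, 1) pair for every word in the document.
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: group all values by key.
    grouped = {}
    for key, value in pairs:
        grouped.setdefault(key, []).append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: aggregate each key's values into a single count.
    return {key: sum(values) for key, values in grouped.items()}

documents = ["big data big insight", "big platform"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle(pairs))
```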
In-Memory Analytics
An alternative to in-database or disk-based business intelligence, in-memory analytics increases speed, performance, and reliability when querying data. Queries and data reside in the server’s random access memory (RAM). The technology has grown along with the adoption of 64-bit architectures, which can support larger amounts of RAM, and consequently larger data sets. Technology vendors include Tableau, SAP HANA, IBM TM1, Oracle TimesTen, Spotfire, and QlikView.
Lean BI
The application of Lean manufacturing principles to the BI/DI/Data Warehouse space. A set of principles that can be used to become more efficient and effective while still focused on delivering value within a BI organization. The principles focus on generating additional value by accomplishing more with existing resources and eliminating waste. Waste in BI programs is defined as any activity, task, process, mapping, object, code, report or data that absorbs resources but creates no incremental value to the customer.
Hierarchical Relationships
The positions of a dimension organized according to a series of cascading one-to-many relationships. This way of organizing data is comparable to a logical tree, where each member has only one parent but a variable number of children.
For example, the positions of the Time dimension might be months, but also days, periods or years.
In a hierarchy, positions are classified into levels. All the positions for a level correspond to a unique classification. For example, in a "Time" dimension, level one stands for days, level two for months and level three for years.
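A hierarchy with levels can be sketched as a child-to-parent map plus a level assignment (the dates below are illustrative):

```python
# Each position has at most one parent; a parent may have many children.
parent_of = {
    "2024-01-15": "2024-01", "2024-01-20": "2024-01",  # days -> month
    "2024-01": "2024", "2024-02": "2024",              # months -> year
}
# Levels classify positions: 1 = days, 2 = months, 3 = years.
level_of = {"2024-01-15": 1, "2024-01-20": 1, "2024-01": 2, "2024-02": 2, "2024": 3}

def ancestors(position):
    """Walk up the logical tree from a position to the root."""
    path = []
    while position in parent_of:
        position = parent_of[position]
        path.append(position)
    return path
```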
Logical Data Model
In computing and more specifically systems engineering, a logical data model (LDM) is a representation of an organization's data, organized in terms of entities and relationships and is independent of any particular data management technology. It is a type of data model.
Logical data models represent the abstract structure of a domain of information. They are often diagrammatic in nature and are most typically used in business processes that seek to capture things of importance to an organization and how they relate to one another. Once validated and approved, the logical data model can become the basis of a physical data model and inform the design of a database.
Logical data models should be based on the structures identified in a preceding conceptual data model, since this describes the semantics of the information context, which the logical model should also reflect. Even so, since the logical data model anticipates implementation on a specific computing system, the content of the logical data model is adjusted to achieve certain efficiencies.
The term 'Logical Data Model' is sometimes used as a synonym of 'Domain Model' or as an alternative to the domain model. While the two concepts are closely related, and have overlapping goals, a domain model is more focused on capturing the concepts in the problem domain rather than the structure of the data associated with that domain.
Measure
A measure is a member with a numeric value. It is a business indicator or KPI (Key Performance Indicator) which is dimensioned by the axes of analysis. For example, a measure Quantity could have the dimensions Time and Brand.
Member
A dimension member is a discrete name or identifier used to identify a data item's position and description within a dimension. For example, January 1989 or 1Qtr93 are typical examples of members of a Time dimension. Wholesale, Retail, etc., are typical examples of members of a Distribution Channel dimension.
Mobile BI
A set of technologies and processes that focuses on the distribution of business data to mobile devices such as smartphones and tablet computers. The recent prevalence of Mobile BI has been driven by the shift from a wired world to a wireless one, in which the rise of smartphones has led to a new era of mobile computing, especially in the field of BI.
MDM – Master Data Management
It is the practice of defining and maintaining consistent definitions of business entities, such as customers and products, then sharing them via integration techniques across multiple IT systems within an enterprise and sometimes externally. (Similar term: Customer data integration or CDI)
MPP – Massively Parallel Processing
The use of a large number of processors (or separate computers) to perform a set of coordinated computations in parallel.
Multi-dimensional Modeling
The dimensional modeling approach provides a way to improve query performance for summary reports without affecting data integrity. A dimensional database generally requires much more space than its relational counterpart.
The technique relies in particular on the modeling of a data construct known as a star schema, or its derivative known as a snowflake schema.
A multi-star model is a dimensional model that consists of multiple fact tables, joined together through dimensions.
Navigation
Navigation is a term used to describe the processes employed by users to explore a cube interactively by drilling, rotating and screening, usually using a graphical OLAP client connected to an OLAP server.
ODS (Operational Data Store)
An operational data store (ODS) is a type of database often used as an interim area for a data warehouse (ODSs are commonly used to populate data warehouses and data marts).
It can also be used as a set of integrated, scrubbed data, without history or summarization, provided for tactical decision support.
Unlike a data warehouse, which contains static data, the contents of the ODS are updated through the course of business operations.
An ODS is designed to quickly perform relatively simple queries on small amounts of data (such as finding the status of a customer order), rather than the complex queries on large amounts of data typical of the data warehouse.
An ODS is similar to your short-term memory in that it stores only very recent information; in comparison, the data warehouse is more like long-term memory in that it stores relatively permanent information.
OLAP – Online Analytical Processing
Online analytical processing (OLAP) is a technology designed to provide superior performance for ad hoc business intelligence queries. OLAP is designed to operate efficiently with data organized in accordance with the common dimensional model used in data warehouses.
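The summary-then-drill-down style of query that OLAP serves can be mimicked in plain Python. This is a minimal sketch with made-up facts; the `rollup` helper and the (region, quarter, sales) layout are illustrative, not any particular OLAP API.

```python
from collections import defaultdict

# A toy fact set organized along the dimensional model: each row carries
# dimension members (region, quarter) plus a numeric measure (sales).
facts = [
    ("East", "Q1", 100), ("East", "Q2", 150),
    ("West", "Q1", 200), ("West", "Q2", 250),
]

def rollup(facts, key):
    """Aggregate the sales measure along the chosen dimension key(s)."""
    totals = defaultdict(int)
    for region, quarter, sales in facts:
        totals[key(region, quarter)] += sales
    return dict(totals)

by_region = rollup(facts, lambda r, q: r)               # summary level
by_region_quarter = rollup(facts, lambda r, q: (r, q))  # drill-down
print(by_region)            # {'East': 250, 'West': 450}
```

An OLAP server performs the same kind of aggregation, but pre-organizes the data so that such roll-ups and drill-downs return quickly even at scale.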
OLTP – Online Transaction Processing
OLTP systems are designed to meet the day-to-day operational needs of the business, and the database performance is tuned for those operational needs. Consequently, the database can retrieve a small number of records quickly, but it can be slow if you need to retrieve a large number of records and summarize data on the fly.
OLTP systems are almost exclusively associated with E/R data modeling.
Page Dimension
A page dimension is generally used to describe a dimension which is not one of the two dimensions of the page being displayed, but for which a member has been selected to define the specific page requested for display. All page dimensions must have a specific member chosen in order to define the appropriate page for display.
Predictive Analytics
A variety of statistical techniques from modeling, machine learning, data mining, and game theory that analyze current and historical facts to make predictions about future events. Notable predictive analytics vendors include SAS, Information Builders, SPSS, Spotfire, and Pervasive.
Prescriptive Analytics
Synthesizes big data, mathematical sciences, business rules, and machine learning to make predictions and then suggests decision options to take advantage of those predictions. Prescriptive analytics is the third phase of business analytics (BA), which comprises descriptive, predictive, and prescriptive analytics.
Physical Data Model
A physical data model (or database design) is a representation of a data design which takes into account the facilities and constraints of a given database management system. In the lifecycle of a project it typically derives from a logical data model, though it may be reverse-engineered from a given database implementation. A complete physical data model will include all the database artifacts required to create relationships between tables or to achieve performance goals, such as indexes, constraint definitions, linking tables, partitioned tables or clusters. Analysts can usually use a physical data model to calculate storage estimates; it may include specific storage allocation details for a given database system.
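A fragment of a physical data model can be expressed directly as DDL. The sketch below uses SQLite via Python's standard `sqlite3` module; the table names, the foreign key, and the index choice are illustrative assumptions, not part of any particular project's model.

```python
import sqlite3

# A tiny, hypothetical physical model: the logical "Customer has Orders"
# design becomes concrete tables, a foreign key, and an index chosen
# for query performance on this specific DBMS (SQLite).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        amount      REAL NOT NULL
    );
    -- Physical-design decision: index the join/filter column.
    CREATE INDEX ix_orders_customer ON orders(customer_id);
""")
tables = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)   # ['customer', 'orders']
```

On another RDBMS the same logical model would yield different DDL (storage clauses, tablespaces, clustering options), which is exactly the point of the physical layer.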
As of 2012, seven main databases dominate the commercial marketplace: Informix, Oracle, Postgres, SQL Server, Sybase, DB2, and MySQL. Other RDBMS systems tend either to be legacy databases or to be used within academia, such as universities or further education colleges. Physical data models for each implementation would differ significantly, not least because of the underlying operating-system requirements that may sit beneath them. For example, SQL Server runs only on Microsoft Windows operating systems, while Oracle and MySQL can run on Solaris, Linux, and other UNIX-based operating systems as well as on Windows. This means that the disk requirements, security requirements, and many other aspects of a physical data model will be influenced by the RDBMS that a database administrator (or an organization) chooses to use.
A value of a dimension. Also called a dimension member.
Scoping
Restricting the view of database objects to a specified subset. Further operations, such as update or retrieve, will affect only the cells in the specified subset. For example, scoping allows users to retrieve or update only the sales data values for the first quarter in the east region, if that is the only data they wish to receive.
Slice
A slice is a subset of a multi-dimensional array corresponding to a single value for one or more members of the dimensions not in the subset.
For example, if the member Actuals is selected from the Scenario dimension, then the sub-cube of all the remaining dimensions is the slice that is specified.
The data omitted from this slice would be any data associated with the non-selected members of the Scenario dimension, for example Budget, Variance, Forecast, etc. From an end-user perspective, the term slice most often refers to a two-dimensional page selected from the cube.
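The Actuals example above can be sketched with a toy cube. The dictionary layout and `slice_cube` helper are illustrative assumptions; real OLAP engines store cubes in specialized structures, but the slicing logic is the same.

```python
# A toy 3-D cube keyed by (scenario, region, quarter) -> value.
# Fixing one member of the Scenario dimension ("Actuals") yields a slice:
# the sub-cube over the remaining Region and Time dimensions.
cube = {
    ("Actuals", "East", "Q1"): 100, ("Actuals", "West", "Q1"): 200,
    ("Budget",  "East", "Q1"): 90,  ("Budget",  "West", "Q1"): 210,
}

def slice_cube(cube, scenario):
    # Keep only cells for the selected member; drop that dimension's key.
    return {(region, quarter): v
            for (s, region, quarter), v in cube.items() if s == scenario}

actuals = slice_cube(cube, "Actuals")
print(actuals)   # {('East', 'Q1'): 100, ('West', 'Q1'): 200}
```

Note that the Budget cells are omitted entirely, just as the definition describes: the slice carries no data for non-selected Scenario members.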
Snowflake Schema
In a star schema, each dimension is typically stored in one table; the snowflake design principle expands a dimension and creates tables for each level of a dimensional hierarchy.
For example, a Region dimension may contain the levels Street, City, State, and Country. In a star schema, all these attributes would be stored in one table; in a snowflake schema, a designer would expand the schema by adding secondary City and State tables.
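A snowflaked Region dimension can be shown in miniature with SQLite (via Python's standard `sqlite3` module). The table and column names, and the Colorado/Denver sample data, are made up for illustration.

```python
import sqlite3

# Illustrative snowflake of the Region dimension: the State level is
# split into its own table and referenced from the City level, instead
# of both being flattened into one star-style dimension table.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE state (state_id INTEGER PRIMARY KEY, state_name TEXT);
    CREATE TABLE city  (city_id  INTEGER PRIMARY KEY, city_name TEXT,
                        state_id INTEGER REFERENCES state(state_id));
    INSERT INTO state VALUES (1, 'Colorado');
    INSERT INTO city  VALUES (10, 'Denver', 1);
""")
row = con.execute("""
    SELECT c.city_name, s.state_name
    FROM city c JOIN state s ON s.state_id = c.state_id
""").fetchone()
print(row)   # ('Denver', 'Colorado')
```

The trade-off is visible even at this scale: the snowflake avoids repeating 'Colorado' on every city row, at the cost of an extra join in every query.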
Source Field
The database field from which data is extracted to be populated into a data warehouse or data mart system.
Subject-Oriented
A data warehouse is subject-oriented. It is oriented to specific selected subject areas in the organization, such as customer and product.
Star (Join) Schema
Arrangement of data in a relational database. In the middle is the fact table, whose columns constitute the multidimensional measures. The branches of the star, which radiate from the fact table, correspond to the dimensions. The star schema is a direct relational representation of the conceptual data model.
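A minimal star can be built and queried with SQLite through Python's standard `sqlite3` module. The `fact_sales`/`dim_product` names and sample rows are hypothetical; the shape, a central fact table joined to a dimension table, is the point.

```python
import sqlite3

# Minimal star schema: a sales fact table (measures) at the center,
# joined to a product dimension (descriptive attributes) on its branch.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales  (product_id INTEGER, amount REAL);
    INSERT INTO dim_product VALUES (1, 'Bikes'), (2, 'Helmets');
    INSERT INTO fact_sales  VALUES (1, 500), (1, 300), (2, 50);
""")
rows = con.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f JOIN dim_product p USING (product_id)
    GROUP BY p.category ORDER BY p.category
""").fetchall()
print(rows)   # [('Bikes', 800.0), ('Helmets', 50.0)]
```

This join-and-aggregate pattern is the typical summary-report query the star schema is designed to make fast.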
Staging Area
The staging area is the place where the extracted and transformed data is placed in preparation for being loaded into the data warehouse.
It contains a collection of data - extracted from OLTP systems - primarily in 3NF and represented by an E/R model. However, the staging area may also contain denormalized models.
In the case of an independent data warehouse architecture, there are separate staging areas for each data mart, and therefore no sharing of data between these disparate staging areas.
Target Field
A field in a data warehouse or data mart system that is to be populated with data from a source system.
Text analytics / Text Mining
In short, the process of deriving high-quality information from text. Typically employed as a set of linguistic, statistical, and machine learning techniques that model and structure the information content of textual sources for business intelligence.
A validation rule identifies a set of values that an attribute is allowed to take; it constrains or restricts the domain of values that are acceptable. These values have meanings in both an abstract and a business sense. For example, "person-name," if it is defined as the preferred form of address chosen by the PERSON, is constrained to the set of all character strings. You can define any validation rules or valid values for an attribute as a part of the attribute definition. You can assign these validation rules to an attribute using a domain. Supported domains include text, number, date time, and blob.
Definitions of attributes, such as codes, identifiers, or amounts, often do not lend themselves to good business examples. So, including a description of the attribute's validation rules or valid values is usually a good idea. When defining a validation rule, it is good practice to go beyond listing the values that an attribute can take. Suppose you define the attribute "customer-status" as follows:
Customer-status: A code that describes the relationship between the CUSTOMER and our business. Valid values: A, P, F, N.
Without explaining what A, P, F, and N stand for, this definition is still incomplete; a good validation rule also spells out the meaning of each value.
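Enforcing a rule like this in code is straightforward. The sketch below is a hypothetical check, assuming only the A/P/F/N codes listed above; in practice such rules often live in the database itself as CHECK constraints or domains.

```python
# Hypothetical validation rule for the customer-status attribute:
# the attribute may only take one of the listed codes.
VALID_CUSTOMER_STATUS = {"A", "P", "F", "N"}

def validate_customer_status(value):
    """Return the value unchanged if valid; raise ValueError otherwise."""
    if value not in VALID_CUSTOMER_STATUS:
        raise ValueError(f"invalid customer-status: {value!r}")
    return value

print(validate_customer_status("A"))   # A
```

Running the validator at load time (e.g., in an ETL step) keeps out-of-domain codes from ever reaching the warehouse.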
Conclusion of Enterprise Data Management Terms:
While this is a strong, representative list of Enterprise Data Management terms, we recognize it is not 100 percent comprehensive. As stated earlier, given the speed at which the industry is moving and the rate at which new terms and products are being developed, it would be virtually impossible to list every enterprise data management term on a single page.
Are you ready to discuss your project?
Let's chat about it; we look forward to helping you and becoming your data partner.