Demystifying the Basic Architecture Framework for Analytics
This article will simplify the architectural building blocks that unite data for decision making.
Understanding the technological architecture necessary to enable analytics is a daunting process for mid-sized companies. The marketing noise around “Business Intelligence” and “Big Data” has created confusion for managers. Having attended several conferences in the past few weeks, I found myself drawing diagrams to explain the structured and unstructured data architecture to frustrated executives. A company’s data architecture describes how data is collected, stored, transformed, distributed, and consumed. The purpose of this article is to share a simplified version of this architecture and explain how the components work together to support managerial decisions.
The architecture can be simplified into three essential pillars, each answering a different question:
- Descriptive/Diagnostic: What happened and why did it happen?
- Predictive: What could happen?
- Prescriptive: What should we do?
This article focuses on helping non-technical managers understand the technology rather than on the business benefits of each pillar. Having worked with both mid-sized and enterprise companies, I will also outline how differently the two groups approach analytics.
Descriptive/Diagnostic:
Measuring and iterating from data is the baseline for analytics and necessary for streamlining essential business functions; exploring why events happened is even more critical to improving future outcomes.
In order to gain better insight into “the what” and “the why”, companies implement data warehouses, which consolidate data from operational applications to a centralized repository or “warehouse”. Larger companies also have data marts (subsets of the data warehouse oriented to specific business departments) for subject-oriented structures and security.
Moving data from the operational systems into a data warehouse is a process called Extract, Transform, Load (ETL). While ETL has traditionally been handled by IT and integration specialists working with on-premises applications, a monumental shift is under way as applications migrate to the cloud. Vendors are creating more integration tools that let non-technical users move and transform the data. In fact, among the slew of recent initial public offerings (IPOs) this year, several companies such as MuleSoft are focused specifically on connecting applications and simplifying the integration process. As the growth and successful IPOs of such vendors indicate, integrating data sources is a serious pain point for many companies.
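To make the ETL step concrete, here is a minimal sketch in Python (using the pandas library, with SQLite standing in for the warehouse). The column, file, and table names are hypothetical; a real pipeline would read from the operational system and load into the company's actual warehouse.

```python
# Minimal ETL sketch (illustrative only): extract rows from an operational
# system, transform them, and load them into a warehouse table.
import sqlite3
import pandas as pd

# Extract: in practice this would be a pull from the ERP/CRM export;
# a small in-memory sample stands in here (hypothetical column names).
raw = pd.DataFrame({
    "invoice_id": [1001, 1002],
    "amount": ["1,250.00", "980.50"],      # stored as text in the source system
    "invoice_date": ["03/01/2017", "03/02/2017"],
})

# Transform: standardize types so the warehouse holds clean, consistent data
raw["amount"] = raw["amount"].str.replace(",", "").astype(float)
raw["invoice_date"] = pd.to_datetime(raw["invoice_date"], format="%m/%d/%Y")

# Load: write the cleaned rows into the warehouse (SQLite stands in here)
warehouse = sqlite3.connect("warehouse.db")
raw.to_sql("fact_invoices", warehouse, if_exists="append", index=False)
warehouse.close()
```

The specifics vary by tool, but the shape of the work is the same: pull data out of the operational systems, standardize it, and land it in one governed place.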
While the data warehouse serves a multitude of essential functions, maintaining the history of data sources and how they interact with each other is the most critical to answering the “what” and “why” of a company. Because warehousing provides a comprehensive look into the “single version of the truth”, it naturally becomes the most reliable record of historical data. Once that single version of the truth exists, companies can layer their favorite reporting, budgeting, and visualization products on top of it.
**How to evaluate these tools for usability and functionality will be explored in a future article. Remember that these front-end tools should only be used after the data is cleansed and secured.**
The idea that “the warehouse is dead” is finally being put to rest as more and more companies realize the necessity of consolidating and securing historical data. However, the inexpensive storage offered by technologies such as Hadoop has shifted the role of the traditional data warehouse: rather than storing all data for later use, it holds the dimensional models used for measurement. Once a concrete measurement or KPI is identified, it should be stored in the data warehouse for reports and budgets. To help users retrieve the data easily, data warehouses are typically built on a star schema approach, which is simply a method of organizing the data: a central fact table of measurements surrounded by dimension tables (dates, customers, products, and so on) that describe them.
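As a rough illustration of that organization, the sketch below builds a tiny star schema in SQLite from Python. All table and column names are invented for the example; a real dimensional model would be designed around the company's own KPIs.

```python
# Sketch of a star schema (hypothetical names): one fact table of measurements
# surrounded by dimension tables that describe each measurement.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, full_date TEXT, fiscal_period TEXT);
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT, segment TEXT);
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);

-- The fact table holds the numbers (the KPIs) plus keys pointing to each dimension.
CREATE TABLE fact_sales (
    date_key      INTEGER REFERENCES dim_date(date_key),
    customer_key  INTEGER REFERENCES dim_customer(customer_key),
    product_key   INTEGER REFERENCES dim_product(product_key),
    sales_amount  REAL,
    units_sold    INTEGER
);
""")
conn.close()
```

Reports then ask questions like “sales amount by fiscal period and customer segment” by joining the fact table to the relevant dimensions, which is why the layout makes retrieval easy for business users.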
Cloud computing has created a market for data warehouse vendors to sell “out-of-the-box” solutions. This significantly decreases the complexity and time to implement a traditional data warehouse. The notable players for cloud-based data warehouses are Amazon’s Redshift, Snowflake, Teradata DW, and IBM’s DB2. While these products are useful for storing and analyzing structured and unstructured data, a business data warehouse such as the one offered by Solver BI360 can help finance and accounting professionals access their specific data. Furthermore, many mid-sized companies will find the BI360 data warehouse well-suited for all of their requirements.
Predictive:
Gathering and analyzing data to determine the possible drivers of a desired outcome is the goal of predictive analytics. While the data warehouse functions as the “single version of the truth”, the sources feeding predictive analytics can have multiple versions of the truth. For example, an average customer acquisition cost can be calculated from the historical data in the data warehouse, but the likely customer acquisition cost in the future may draw on a diverse set of sources such as various social media outlets. It is important to collect this data in order to adapt quickly to a possible change in the customer profile and acquisition cost.
Data volumes continue to explode; in fact, more data has been created in the past two years than in all of prior human history. The marketing frenzy around “big data” has convinced companies that they need to capture this data, even if they decide what to do with it later. That buzz, coupled with the expense of traditional data processing applications, has created demand for technologies such as Hadoop that make data storage significantly cheaper. A store of raw and diverse data (free text, image files, etc.) is often referred to as a “data lake”; the term “lake” is used because the data has not been transformed or structured for business usage.
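The contrast with the warehouse can be shown in a small sketch: raw events are simply landed as files the moment they arrive, with no cleansing or modeling. The paths and fields below are hypothetical.

```python
# Sketch of "data lake" storage (hypothetical paths and fields): raw,
# untransformed files are landed as-is; structure is applied later, if ever.
import json
from pathlib import Path

lake = Path("data_lake/raw/social_media/2017-03-15")
lake.mkdir(parents=True, exist_ok=True)

# A raw event is stored exactly as received: no cleansing, no star schema.
event = {"source": "twitter", "text": "Loving the new product!", "followers": 5210}
(lake / "event_0001.json").write_text(json.dumps(event))
```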
The transformation and analysis of the statistical relevance of data is typically supervised by data scientists, who use programming languages such as R, Python, and SAS to analyze correlations. Raw data, such as customer retention rates, sales figures, and supply costs, is of limited value until it has been integrated with other data and transformed into information. Consequently, data blending tools such as Alteryx can be used by business analysts to combine correlations found in raw data with the operational data from transactional systems and work toward establishing causation.
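To give a flavor of that work, the sketch below shows the kind of exploratory correlation check a data scientist might run in Python with pandas. The figures are invented for the example.

```python
# Illustrative correlation check (made-up numbers): does marketing spend move
# with new-customer counts? An exploratory step before deeper modeling.
import pandas as pd

history = pd.DataFrame({
    "marketing_spend": [42000, 38000, 51000, 47000, 55000, 60000],
    "new_customers":   [310,   280,   370,   330,   400,   430],
})

# Pearson correlation between spend and acquisitions; a value close to 1.0
# suggests a strong linear relationship worth investigating
# (correlation, not yet causation).
print(history["marketing_spend"].corr(history["new_customers"]))

# Average customer acquisition cost from the same history
print((history["marketing_spend"] / history["new_customers"]).mean())
```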
Corporate Performance Management (CPM) suites enable companies to track performance and budget/forecast based on various metrics. While originally focused on the office of finance, many of these CPM tools are starting to focus across departments for company-wide data modeling.
Prescriptive:
Prescriptive analytics is often referred to as the final frontier of analytics.
Descriptive and predictive analytics endow data with relevancy and purpose, and companies can use those insights to set goals and facilitate strategy development. Prescriptive analytics goes further: it proactively alerts managers with recommendations for which strategy to implement for the optimal outcome. Returning to the customer acquisition example, prescriptive analytics can recommend reallocating marketing spend by analyzing the average acquisition cost in the past (descriptive) and evaluating the possible customer acquisition cost in the future (predictive). The architecture discussed in the previous sections enables business analysts to identify the levers driving certain outcomes; those levers become the inputs that prescriptive analytics uses to influence or recommend an action within a given time constraint.
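In its simplest form, such a recommendation can be a rule that compares the descriptive and predictive numbers and suggests an action. The sketch below is a toy example; the figures and the threshold are assumptions, not a real model.

```python
# Toy prescriptive rule (all numbers and the threshold are assumptions):
# compare the historical acquisition cost (descriptive) with the forecast
# (predictive) and recommend an action when the gap crosses a threshold.
historical_cac = 135.00   # average cost per customer from the data warehouse
forecast_cac   = 168.75   # projected cost from predictive models

ALERT_THRESHOLD = 0.15    # alert when the forecast runs 15% above history

change = (forecast_cac - historical_cac) / historical_cac
if change > ALERT_THRESHOLD:
    print(f"Acquisition cost projected to rise {change:.0%}: "
          "recommend reallocating marketing spend to lower-cost channels.")
else:
    print("Acquisition cost within tolerance: no action recommended.")
```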
Prescriptive is a broad term ranging from alerts and scorecards all the way to machine learning and artificial intelligence. Machine learning and AI can significantly improve analytics because the applications learn to make the best decisions without asking management. Machines can be programmed to gain insight into the potential impact of each action and learn which one has the highest probability of reaching the desired outcome. Machine learning will also help with the transformation of data itself. For example, the machine will learn that the code in one system (COMPUTER_) refers to the same object as (CoMputer) in another source. This will help companies use their business intelligence applications more effectively because “Computer” will be reported as one object.
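The sketch below illustrates the matching idea with plain string similarity from Python's standard library rather than a trained model; a production system would learn these mappings from many labeled examples.

```python
# Sketch of the matching described above: recognizing that "COMPUTER_" in one
# system and "CoMputer" in another refer to the same object. Simple
# normalization plus a similarity score stands in for a learned model.
from difflib import SequenceMatcher

def normalize(code: str) -> str:
    """Lowercase and strip punctuation/whitespace so cosmetic differences vanish."""
    return "".join(ch for ch in code.lower() if ch.isalnum())

def same_object(a: str, b: str, threshold: float = 0.9) -> bool:
    """Treat two source codes as the same object when their similarity is high."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

print(same_object("COMPUTER_", "CoMputer"))   # True: reported as one "Computer"
print(same_object("COMPUTER_", "MONITOR"))    # False: different objects
```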
While the conversations about artificial intelligence and machine learning are endless, they often downplay the importance of the descriptive data that measures what has already happened. That historical data should be incorporated into the prescriptive framework to capture the company’s competitive advantage. Unfortunately, the marketing and media attention around AI and machine learning has resulted in companies initiating projects without the fundamentals of data management or a carefully planned strategy.
**While machine learning has many benefits, no new technology will obviate the need for an effective, well-run data management function. See the descriptive section above.**
As discussed in the predictive section, mid-sized companies are currently at a disadvantage to enterprises with regard to infrastructure and personnel skills. Artificial intelligence can help close that gap by helping mid-sized companies reduce the technical overhead necessary to maintain an analytics platform.
Summary:
Cross-industry studies show that, on average, less than half of an organization’s structured data is actively used in making decisions, and less than 1% of unstructured data is analyzed. Furthermore, more than 70% of employees have access to data they should not. Understanding the architectural framework helps non-technical professionals avoid costly mistakes and integrate solutions that effectively support decision making.
It is difficult for non-technical managers to make sense of the highly fragmented market for analytics. This fragmentation occurs because vendors sell directly into business units, and research firms such as Gartner maintain countless definitions and sub-categories for each component of analytics. Vendors are competing fiercely to become leaders in their categories, and those leaders will have an opportunity to consolidate and create a platform (an app exchange) where other vendors can integrate their applications, which, in turn, will hopefully make the entire analytics landscape easier to understand.
The importance of not letting the technology and marketing terms obfuscate the business value cannot be stressed enough. The end goal should be to help managers turn data into information that helps drive profitability and market share.