The evolution of data warehouse solutions

Advances in data warehouse (DW) and data lake (DL) solutions from major players like Oracle, Teradata, Microsoft, IBM, Amazon Web Services, and Google Cloud have blurred the distinction between the two. Last year, Gartner recognized the shift and renamed their DW report to Data Management Solutions for Analytics (DMSA).

Dark data

We’ve experienced this transition first hand as our customers look for new ways to use what Gartner calls “dark data” with their traditional DW data. Dark data consists mainly of data that must be retained for various compliance reasons. But since the data is stored as log files, text files, and other semi-structured formats on file system archives, it’s basically not accessible to business analysts.

Dark Data

Serverless query against dark data

With Data Lake technologies like Azure Data Lake Analytics, Amazon Athena on top of S3 storage, and Google Cloud Dataproc, organizations can register dark data in the form of virtual tables. This allows traditional BI tools like Microsoft Excel, Power BI, Tableau, QlikView, IBM Cognos, and every other tool that has access to ODBC or JDBC drivers, to access dark data.

Here is an example of what the data processing pipeline looks like using Amazon Athena for serverless queries. For Microsoft Azure, the pipeline looks identical. You simply can replace:

Amazon S3 with Azure Blob Storage or Azure Data Lake Store
Amazon EMR with Azure HDInsight
Amazon Athena with Azure Data Lake Analytics
Amazon Redshift with Azure Synapse Analytics
Amazon QuickSight with Microsoft Power BI

Amazon Athena Sample Pipeline for Analyzing Dark Data

Here is where things get tricky. You now have business analysts processing petabytes of dark data and then joining the results with traditional DW data inside of the client BI tools. Needless to say, the Business Analysts will complain about performance inconsistencies.

Data warehouse external tables

The traditional DW vendors recognized this problem and have solutions to access “dark data” located on Hadoop Distributed File Systems (HDFS) or native cloud storage like Azure Blog Store/Data Lake, Amazon S3, Google Cloud Storage, etc. This means that features like PolyBase for SQL Server 2017 and Azure Data Warehouse, and Amazon Redshift Spectrum can join exabytes of cataloged dark data with DW tables with great performance.

Here is an example of a data processing pipeline using PolyBase for Azure Synapse Analytics (previously Azure SQL Data Warehouse). Like the prior pipeline above, the AWS pipeline is identical. You can simply replace:

Azure Data Lake Analytics with Amazon Athena
Azure Data Lake Store with Amazon S3
Azure Data Factory with Amazon Glue
Azure Data Warehouse with Amazon Redshift
PolyBase with Amazon Redshift Spectrum
Microsoft Power BI with Amazon QuickSight

How SQL Server Polybase simplifies access to Dark Data

More importantly, organizations can operationalize data for analytics using enterprise OLAP engines, data cleansing, and master data management solutions that already connect to the DW tables. This is because cataloged dark data looks like a regular SQL table or view.

Quite simply, it’s all about pushing the computations for aggregating and filtering data as close to the persisted data as possible. This will reduce network and storage latency, ensuring the best performance for your organization.

We are ready to help you make sense of how these technologies all fit together to support traditional BI and advanced analytics using machine learning for your organization.

Featured technologies for data warehouse solutions

Industry giants like Microsoft, Amazon, and Google have developed innovative solutions for accessing traditional data warehouse data, operational data, and dark data. They’ve created robust and customizable solutions to analyze, present, and take actions to transform your business into a data-driven business.

Check out the different services that we provide for our featured data warehouse technologies.

Microsoft Data Warehouse Solutions

See how ABCloudz can help you get the most out of the Microsoft Data Warehouse platform technologies like SQL Server, Azure Synapse Analytics, Azure Data Lake, and more.

Amazon Data Warehouse Solutions

ABCloudz is ready to help customers migrate aging on-premises data warehouse workloads to Amazon Redshift. We can also optimize your Redshift data warehouse and integrate it with other Amazon Web Services.

Google Data Warehouse Solutions

BigQuery is designed to enhance the productivity of your data analysts through a highly scalable and low-cost serverless data warehouse solution. Discover how ABCloudz can help you leverage this technology to benefit your businesses data warehouse needs.

Snowflake Cloud Data Platform

Discover how ABCloudz can help you get the most out of the Snowflake cross-cloud and cross-region data platform. Take advantage of the next-generation cloud data warehouse to address your analytics challenges.

Other supported data warehouse and big data solutions

With advances in big data analytics solutions, we see that the line between the data warehouse and big data technologies is becoming increasingly fuzzy. Here are some of the other data warehouse and big data solutions that we’ve helped customers with. Whether it was migrating to a newer platform, or optimizing what they were already using.

Traditional data warehouse platforms

Oracle Data Warehouse
Teradata
Pivotal Greenplum
IBM PureData System for Analytics (formerly Netezza)
Micro Focus Vertica
SAP HANA and SAP BW
SAP IQ (formerly Sybase IQ)

Big data solutions

Azure Data Lake Store with Azure Data Lake Analytics featuring U-SQL
Amazon Athena using standard SQL against S3 storage
Apache Hadoop and Spark including HortonWorks Data Platform, Cloudera, Azure HDInsight, AWS EMR, and Google Cloud Dataproc
Apache HBase including deployments using Azure HDInsight, AWS EMR, and Google BigTable.
Apache Cassandra including DataStax distribution on Azure, AWS, and Google Cloud.

What we do with data warehouse and big data technologies

We recommend starting with our Future-State Architectural Design Engagement to better understand your needs and then work with your team to develop a technology roadmap for your data warehouse and analytics solutions.

The process starts with understanding your current state and goals for your new solution either on-premises or on Azure, AWS, or GCP. This example shows how we looked at a customer’s current data warehouse solution to see how they could meet their scalability needs while controlling costs.

scheme_psd_1_y

We then work with your team to develop a high-level architecture to identify which services are needed to satisfy your objectives. Here is an example of a high-level solution architecture using AWS services that addresses the future state requirement. This includes maintaining the customer’s current investment with Pentaho for analytics.

scheme_psd_2_y

The process of understanding your on-premises and cloud-based data and then looking at the technologies used to ingest & collect, store, process & analyze, consume & visualize to deliver answers and insights provides your team with a greater understanding of the solution. We use this as a roadmap for developing a project plan and creating a return on investment model for a cost-effective solution which can grow as your business needs change.

Our core services with data warehouse technologies include the following:

New Solutions

We’ll take your initial app concept and turn it into a beautiful and functional web, mobile, desktop, or Internet of Things solution. Covering the full cycle of software development with optional 24/7 monitoring and support with our new solution services.

Consulting

Looking for expert help to optimize your application and data infrastructure? Our certified solution architects, IT operations team, and developers can provide the short-term help or augment your team with our managed services offers.

Cloud Migration

Increase your business agility and capabilities through cloud migration to Microsoft Azure, Amazon Web Services, or Google Cloud Platform.

Data Warehouse Migration

Discover how our unique approach helps perform Data Warehouse migrations to a new platform either on-premises or to a public cloud like Microsoft Azure, Amazon Web Services, or Google Cloud Platform.

Data Warehouses

Let ABCloudz experts improve your data warehouse solutions