Microsoft Data Integration

See how we can help you optimize and future-proof your Microsoft data integration solution on-premises and on Azure.

Additional tools we've used with Azure

Data integration platforms now support moving data between your data center and Azure. Our team has worked with the following data integration solutions alongside Azure and has a deep understanding of how to build hybrid solutions that optimize performance.

Special getting started offers

As a SQL Server customer, you likely rely on SQL Server Integration Services (SSIS) as your default solution, since it’s included with your SQL Server license. With the introduction of Azure Data Factory and Azure Stream Analytics, you may be trying to figure out how all the pieces fit together.

We recommend that you start with our Future-State Architectural Design Engagement to develop a roadmap for your solutions.

If you are looking to migrate from other database platforms such as Oracle, DB2, or Sybase, you can use our SQL Server Migration Assistant Support offer to work through any issues you might have with migrating data or converting your ETL packages to Microsoft-based solutions.

Using SSIS to create standardized Excel reports

ABCloudZ recently completed a complex development project for one of the largest banks in the Middle East. Faced with non-customizable and slow-to-generate reporting, the customer contacted ABCloudZ to develop a new and improved solution.

In this case study video, you can see how we used SSIS in a non-traditional way to extract data from SQL Server Analysis Services and push the data into standardized Excel-based reports.

Open-source data integration solutions for Azure

For organizations looking to use open-source data integration solutions, our team supports the following Apache solutions for performing data integration tasks and storage. We can also migrate solutions using the Apache technologies to Azure data integration tools like Azure Data Factory to take advantage of serverless computing, performance, security, and integration with other Azure solutions.

Data stores

  • Apache Hadoop is a distributed computing platform that includes the Hadoop Distributed File System (HDFS) and an implementation of MapReduce. Implemented on Azure as HDInsight.
  • Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google’s Bigtable. Azure Synapse Analytics (previously Azure SQL Data Warehouse) supplies similar capabilities.
  • Apache Hive is data warehouse software that facilitates querying and managing large datasets residing in distributed storage, with tools for easy extract/transform/load (ETL) of data into HDFS and other data stores such as HBase. Implemented as U-SQL on Azure.
  • Apache CouchDB is a database that completely embraces the web, storing your data as JSON documents. Implemented on Azure as Cosmos DB.
  • Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala, and Python as well as a rich set of libraries including stream processing, machine learning, and graph analytics.
  • Apache Cassandra is a database that provides high availability and fault tolerance with linear scalability on commodity hardware or cloud infrastructure.
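
The Hadoop entry above mentions MapReduce. The programming model behind it can be sketched in a few lines of plain Python; the function names here are purely illustrative and not part of any Hadoop API:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield word.lower(), 1

def shuffle_phase(pairs):
    """Shuffle: group intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts emitted for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["SQL Server on Azure", "Azure Data Factory moves data"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
```

Hadoop runs the same three phases across a cluster, with HDFS holding the input and intermediate data; the sketch only shows the contract between the phases.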

Complex event processing

  • Apache Storm is a distributed real-time computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing real-time computations. Implemented as Azure Stream Analytics.
  • Apache Beam is a unified programming model for both batch and streaming data processing, enabling efficient execution across diverse distributed execution engines and providing extensibility points for connecting to different technologies and user communities.

General data processing

  • Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured data stores such as relational databases. On Azure, Azure Data Factory provides this capability.
  • Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store.
  • Apache Kafka is a distributed, fault-tolerant publish-subscribe messaging system that can handle hundreds of megabytes of reads and writes per second from thousands of clients.
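
To make the publish-subscribe model behind Kafka concrete, here is a toy in-memory broker in plain Python. The class and method names are illustrative only; real Kafka adds partitioned, replicated, durable logs and consumer groups on top of this basic topic/producer/consumer contract:

```python
from collections import defaultdict

class Broker:
    """A toy in-memory publish-subscribe broker (not a Kafka client)."""
    def __init__(self):
        self.topics = defaultdict(list)   # topic -> ordered log of messages
        self.offsets = defaultdict(int)   # (topic, consumer) -> next offset

    def publish(self, topic, message):
        """Append a message to the topic's log."""
        self.topics[topic].append(message)

    def consume(self, topic, consumer):
        """Return all messages this consumer has not yet seen,
        then advance its offset to the end of the log."""
        log = self.topics[topic]
        start = self.offsets[(topic, consumer)]
        self.offsets[(topic, consumer)] = len(log)
        return log[start:]

broker = Broker()
broker.publish("logs", "app started")
broker.publish("logs", "user login")
first = broker.consume("logs", "audit")   # both messages
broker.publish("logs", "user logout")
second = broker.consume("logs", "audit")  # only the new one
```

Note that consumers track their own position in the log rather than having messages deleted on delivery, which is the key design idea that lets Kafka serve many independent consumers from one stream.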