According to a Gartner survey, 81% of public cloud users use more than one cloud provider. With multicloud and hybrid cloud deployments becoming the primary cloud migration strategies, there is a growing need to integrate disparate cloud systems. Cloud data integration can help multiple applications, whether they run in the public cloud, in a private cloud, or on-premises, consistently synchronize and share data.
In this post, we’ll discuss important considerations and challenges facing cloud data integration projects, outline how to evaluate the cloud integration platforms that can help, and show how NetApp Cloud Volumes ONTAP can share storage between clouds and on-premises systems.
Cloud data integration is the practice of integrating data used by disparate systems, within or between public and private clouds, or between cloud-based and on-premises systems. The goal is to create unified data stores that all relevant users and applications can access efficiently and transparently.
Mature tools exist for data integration within a single public cloud provider or private cloud platform, for example within AWS or an OpenStack data center. The challenges begin when organizations need to integrate multiple public clouds, set up hybrid cloud environments, integrate legacy on-premises systems with cloud workloads, or lift and shift legacy workloads into the cloud.
Without automation and central control, cloud data integration can be painful. Administrators need to set up multiple integrations manually, test them, and verify that data is being transferred correctly. They also need to convert or transform data to suit the file format, data structure, or data model expected by each cloud system. This is why many organizations use an Integration Platform as a Service (iPaaS), which provides pre-built adaptors or connectors for many IT systems, making cloud integration faster, easier, and less error-prone. Below, we review the key considerations for evaluating these platforms.
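To make the idea of a pre-built connector more concrete, here is a minimal sketch of what such an adaptor does: it converts each source system's format into one shared data model. It assumes two hypothetical sources, an on-premises CRM that exports CSV with cust_id and full_name columns and a cloud SaaS application that exports JSON with customerId and name fields. The Customer type, the field names, and the connector registry are illustrative only and do not reflect any particular iPaaS product's API.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Customer:
    """Shared data model that every connector converts into (hypothetical)."""
    customer_id: str
    name: str


def from_crm_csv(payload: str) -> List[Customer]:
    """Connector for a hypothetical on-premises CRM that exports CSV."""
    reader = csv.DictReader(io.StringIO(payload))
    return [Customer(row["cust_id"], row["full_name"]) for row in reader]


def from_saas_json(payload: str) -> List[Customer]:
    """Connector for a hypothetical cloud SaaS app that exports JSON."""
    return [Customer(str(item["customerId"]), item["name"]) for item in json.loads(payload)]


# Registry of connectors keyed by source system; supporting a new system
# means writing one more converter and registering it here.
CONNECTORS: Dict[str, Callable[[str], List[Customer]]] = {
    "crm_csv": from_crm_csv,
    "saas_json": from_saas_json,
}


def ingest(source: str, payload: str) -> List[Customer]:
    """Look up the connector for a source and normalize its payload."""
    try:
        connector = CONNECTORS[source]
    except KeyError:
        raise ValueError(f"no connector registered for {source!r}") from None
    return connector(payload)


if __name__ == "__main__":
    csv_payload = "cust_id,full_name\n17,Ada Lovelace\n"
    json_payload = '[{"customerId": 42, "name": "Alan Turing"}]'
    print(ingest("crm_csv", csv_payload) + ingest("saas_json", json_payload))
```

Downstream code only ever sees Customer records, which is the property that makes connector-based integration easier to maintain: when a source format changes, only its converter needs to be updated.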
Cloud Data Integration Benefits and Challenges
Integrating data between cloud systems, and between cloud and on-premises systems, can have several important advantages:
Synchronizing data: ensuring that IT systems and applications operating on the same data or entities have a consistent view of that data, with frequent or real-time updates.
Automating workflows: integration can automate organizational processes that involve manual copying of data or manual data entry, and standardize how data is treated on its way from one application to another.
Eliminating redundant data: IT systems commonly store the same data several times, for the benefit of different applications or organizational processes. Integration helps eliminate these duplicates in favor of a shared data store, reducing storage costs and synchronization effort.
Flexibility and scalability: once systems are integrated, operational staff have more opportunities to improve processes and to adopt new systems that provide more value to internal and external customers.
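One common way to give two systems a consistent view of shared records is a last-writer-wins merge: every record carries a last-updated timestamp, and the newer version overwrites the older one on both sides. The sketch below illustrates that technique under simplifying assumptions; the store layout (record key mapped to a timestamp and value pair) is hypothetical, and the sketch ignores deletions, clock skew between systems, and the change-capture mechanisms that real integration platforms typically use.

```python
from typing import Dict, Tuple

# Hypothetical store layout: record key -> (last-updated timestamp in epoch seconds, value).
Store = Dict[str, Tuple[float, str]]


def sync_lww(a: Store, b: Store) -> None:
    """Two-way last-writer-wins sync; afterwards a and b hold identical data.

    For each key, the version with the later timestamp wins. Keys present in
    only one store are copied to the other. On a timestamp tie, a's version wins.
    """
    # The key set is computed once up front, so mutating a and b in the loop is safe.
    for key in set(a) | set(b):
        va, vb = a.get(key), b.get(key)
        if va is None:
            winner = vb
        elif vb is None or va[0] >= vb[0]:
            winner = va
        else:
            winner = vb
        a[key] = winner
        b[key] = winner


if __name__ == "__main__":
    on_prem: Store = {"order-1": (100.0, "pending"), "order-2": (105.0, "shipped")}
    cloud: Store = {"order-1": (110.0, "paid"), "order-3": (120.0, "new")}
    sync_lww(on_prem, cloud)
    print(on_prem == cloud)    # True
    print(on_prem["order-1"])  # (110.0, 'paid')
```

Last-writer-wins is simple but silently discards the losing update, so it suits data where the most recent value is the correct one; data that needs conflict review calls for a different strategy.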
However, cloud data integration projects may face several important challenges:
Data movement: moving data between clouds, and between cloud and on-premises systems, can be time-consuming, error-prone, or even infeasible, depending on data volumes and the required transfer frequency. For example, copying 100 TB over a dedicated 1 Gbps link takes more than nine days even at full line rate. Cloud data integration won’t work without a solid strategy for transferring data in a timely manner.
No standardization: there is no standard approach or protocol for integrating data between cloud systems, let alone between cloud and on-premises systems. Each cloud platform, service, or resource tends to use its own data schemas and formats, so data connectors or adaptors must be updated continually as new cloud services are introduced or applications change.
Architectural issues: cloud systems are often architected for scalability or performance rather than for data integration. For example, in a system that rapidly scales up or down, with data spread across dozens or hundreds of cloud instances, synchronizing with external systems can be challenging.
ETL: traditional data integration projects relied on complex extract, transform, load (ETL) workflows to clean data and transform it into the precise format required by target systems. Many cloud systems work with unstructured data or offer a flexible data model for structured data, but they still need data to be cleaned, processed, and converted into the desired format. Integration strategies must consider how to perform ETL without slowing down the integration or adding excessive complexity.
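The extract, transform, load sequence described in the ETL item above can be sketched as three small functions. This is a minimal example under assumed conditions: the source is a CSV export with hypothetical email and amount columns, the cleaning rules are illustrative, and an in-memory SQLite database stands in for the target system. A real pipeline would stream larger volumes and log rejected rows rather than silently dropping them.

```python
import csv
import io
import sqlite3
from decimal import Decimal, InvalidOperation
from typing import Iterable, Iterator, List, Tuple


def extract(csv_text: str) -> Iterator[dict]:
    """Extract: read raw rows from a CSV export (hypothetical source format)."""
    return csv.DictReader(io.StringIO(csv_text))


def transform(rows: Iterable[dict]) -> List[Tuple[str, int]]:
    """Transform: clean each row and convert it to the target's format.

    Emails are trimmed and lower-cased; amounts such as "12.50" become integer
    cents (fractions of a cent are truncated). Rows with a missing email or an
    unparseable or non-finite amount are dropped.
    """
    cleaned = []
    for row in rows:
        email = (row.get("email") or "").strip().lower()
        try:
            amount = Decimal((row.get("amount") or "").strip())
        except InvalidOperation:
            continue  # unparseable amount, e.g. "n/a"
        if not email or not amount.is_finite():
            continue
        cleaned.append((email, int(amount * 100)))
    return cleaned


def load(conn: sqlite3.Connection, records: List[Tuple[str, int]]) -> None:
    """Load: write the cleaned records into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (email TEXT, amount_cents INTEGER)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", records)
    conn.commit()


if __name__ == "__main__":
    raw = "email,amount\n Alice@Example.com ,12.50\n,3.00\nbob@example.com,n/a\n"
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(raw)))
    print(conn.execute("SELECT * FROM payments").fetchall())
```

Keeping the three stages separate makes it easier to move the transform step, for example into the integration platform or into the target system itself, without rewriting extraction or loading.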
Evaluating Cloud Integration Platforms for Data Integration
While simple data integration projects can be performed ad hoc, with scripting or home-grown automation, enterprise-scale projects will almost always require an integration platform.
Use the following considerations to evaluate cloud integration platforms for your project:
Specific application support: does the platform support the applications you run today? Is it extensible enough to support applications or data formats you may adopt in the future, and how much effort does it take to build a custom connector or integrate additional data sources via API?
How mapping works: most integration platforms offer a visual interface for mapping data fields between source and target systems. Check that the interface is easy to use for the people who will perform the mapping, who may not have a technical background, and that it is powerful enough to handle your organization’s data formats, rules, and exceptions.
Security and compliance: consider the security requirements and industry standards that apply to each of your datasets, and check whether the integration platform can meet them for both one-time data transfers and ongoing data synchronization.
Data cleaning, preparation, and integrity: check whether the integration platform takes responsibility for preparing data, converting it to the target format, and verifying its integrity. If not, you will need other tools or strategies to produce the data stream your target application expects.
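Two of the considerations above, field mapping and integrity verification, can be illustrated with a short sketch. It assumes a hypothetical mapping table in which each source field maps to a target field plus a conversion function, and it verifies a transfer by comparing the record count and a SHA-256 digest computed over a canonical JSON serialization of the records on each side. The field names and FIELD_MAP structure are illustrative; integration platforms define mappings in their own formats, often through the visual interfaces described above.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical mapping: source field -> (target field, conversion function).
FIELD_MAP: Dict[str, Tuple[str, Callable[[Any], Any]]] = {
    "CustNo": ("customer_id", str),
    "CustName": ("name", lambda v: v.strip()),
    "Country": ("country_code", lambda v: v.strip().upper()),
}


def map_record(source: Dict[str, Any]) -> Dict[str, Any]:
    """Rename and convert fields; fail loudly if a mapped field is missing."""
    missing = [f for f in FIELD_MAP if f not in source]
    if missing:
        raise KeyError(f"source record is missing fields: {missing}")
    return {target: convert(source[src]) for src, (target, convert) in FIELD_MAP.items()}


def dataset_digest(records: List[Dict[str, Any]]) -> Tuple[int, str]:
    """Return (record count, SHA-256 over sorted canonical JSON lines).

    Sorting makes the digest independent of record order, so the same data
    read back from the target in a different order still matches.
    """
    lines = sorted(json.dumps(r, sort_keys=True, separators=(",", ":")) for r in records)
    digest = hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()
    return len(records), digest


def verify_transfer(expected: List[Dict[str, Any]], landed: List[Dict[str, Any]]) -> bool:
    """True if the target holds exactly the expected records, in any order."""
    return dataset_digest(expected) == dataset_digest(landed)


if __name__ == "__main__":
    source_rows = [
        {"CustNo": 17, "CustName": " Ada Lovelace ", "Country": "gb"},
        {"CustNo": 42, "CustName": "Alan Turing", "Country": "GB"},
    ]
    mapped = [map_record(r) for r in source_rows]
    landed = list(reversed(mapped))  # stand-in for rows read back from the target
    print(verify_transfer(mapped, landed))      # True
    print(verify_transfer(mapped, landed[:1]))  # False
```

Comparing digests rather than full datasets means each side only has to exchange a count and a short hash, which matters when source and target sit in different clouds.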
Cloud Data Integration and Cloud Volumes ONTAP
NetApp Cloud Volumes ONTAP can help integrate your data as part of your cloud migration strategy. Cloud Volumes ONTAP is built on the native compute and storage resources of AWS, Azure, and Google Cloud. Through Cloud Manager, it provides a single-pane UI from which you can control, automate, and orchestrate all of your storage resources wherever they are located, whether in a hybrid or a multicloud architecture.
In addition, NetApp SnapMirror® helps to replicate, migrate, and synchronize data across different data sources, platforms and clouds, making it easier for you to integrate your data across every storage environment you use.