Together these exceptional technologies unite to provide a highly scalable service that solves the omnipresent data problems all organisations are afflicted by. Multiple data warehouses, multiple data marts, inconsistent data models and inconsistent data content all contribute to increases in data maintenance, confused data modelling, lack of trust in the datasets and, inevitably, an increase in Total Cost of Ownership. So what are these products and how do they work together?
Let’s say that as an organization, you have data in various different source systems and different data marts with the exact issue highlighted above. By using Qlik Replicate to connect to source systems and pulling data in real time, then passing that data directly to snowflake for warehousing and governance, you create a solution that unifies the data, integrates and allows you to analyse data that previously was not available. The process would look like the below:
Working for an organisation which boasts Elite Snowflake and Qlik Partner status I decided to see if the hype was justified and tested it out for myself. I registered for SaaS trial accounts with Snowflake and Qlik QDI and was up and running.
Qlik provides a free trial of its QDI products – it is really simple to register, just go here and you’re ready to go. Qlik QDI comprises Replicate, Compose & Catalog, but I’m only delving into Replicate here. For info: Qlik Compose performs rapid ETL functions on the replicated data and in the spirit of democratisation, both your tecchie and non-tecchie users can go shopping for that data by browsing Qlik Catalog. For more details on Qlik QDI (Replicate, Catalog and Compose) please click here
With Qlik Replicate, I was looking to collect and replicate data from a source system (MySQL) then load it to a Snowflake Data Warehouse, although this might easily have been a Data Lake.
To understand Replicate all you need to know is that there are data endpoints, one is the source system and one target destination. A bunch of tasks sit in between the source and target, they do the work of ingesting data from the source system and copying it to the target. In my case, Snowflake.
Like Qlik, Snowflake allows users to signup for trial accounts and also has an amazing learning portal (Snowflake University) which takes you through the setup, configuration and usage of Snowflake. This can be found here
Snowflake is designed for the cloud, it lives on the cloud and takes advantage of what the cloud has to offer, it is SaaS product after all. The database issues I have been subject to in the past such as inconsistent data, poor performing queries (both read and write) and a diverging data model are all addressed by Snowflake using various techniques:
- Performance and speed – Using the elastic nature of the cloud allows you to scale the resources to your need. To load data faster or run multiple concurrent queries you can easily boost processing power by adding or “scaling up” virtual warehouses during heavy processing cycles and back down again when finished.
- Optimised storage – Snowflake automatically optimises how structured and semi-structured data is stored and can combine the two styles for loading into the cloud.
- Concurrency and accessibility – Snowflake addresses concurrency issues with its unique multi-cluster architecture: Queries from one virtual warehouse never affect the queries from another, and each virtual warehouse can scale up or down as required.
- Seamless data sharing – Snowflake’s architecture enables data sharing among Snowflake users. It also allows organizations to seamlessly share data with any data consumer — whether they are a Snowflake customer or not — through reader accounts that can be created directly from the user interface. This functionality allows the provider to create and manage a Snowflake account for a consumer
- Availability and security – Snowflake is distributed across availability zones of the platform on which it runs — either AWS or Azure — and is designed to operate continuously and tolerate component and network failures with minimal impact to customers. It is SOC 2 Type II certified, and additional levels of security — such as support for PHI data for HIPAA customers, and encryption across all network communications — are available.
With a trial activated, I jumped straight into Snowflake. As Qlik is a technology partner of Snowflake, straight away there is an option to be able to use the “Partner Connect” to setup a connection to Replicate and a host of other solutions. If you have not got replicate, this also activates a 14 day trial and gives you all the steps to set up the integration.
Using the partner connect, Snowflake helps the user setup a connection to Replicate and even sets up a database for the new data to be stored into, however you could also specify this yourself.
Now that I have the location for my Replicate endpoint, I can now build my task in Replicate. This is really simple once the endpoints have been identified. To create a source endpoint, I had to add in user access credentials and some other basic information relating to the data, columns etc for the MySQL database. Creating the target endpoint was equally as simple.
This screen shot of the GUI doesn’t really do Replicate justice – it really is drag and drop and you can (almost) build a rudimentary replication task with mouse clicks. It only took a few minutes to select the tables fields I wanted to be replicated and to add a small amount of data transformation. Incidentally, the data transformation I added was a simple concatenation of two text fields, also done with only a few mouse clicks.
There are other elements to add when building the task, such as choosing whether the task is to be Full Load (replicates the entire source) or whether it is intended to Apply Changes only (aka CDC – Change Data Capture). You can multi select these options and also select an option to save the history of each change that occurs.
After that it was a case of clicking on the Run button and watching progress via the intuitive Process Monitor. Once the monitor showed the task as having completed, I double checked the data in Snowflake and happy to say it matched the source data in MySQL, plus the additional concatenated column I created.
With the data replicated across into Snowflake, I now have a database in Snowflake with all data from MySQL replicated into the cloud. All this data is now being continually refreshed via CDC in Replicate. From this I can use Snowflake to create views of information, import different sources to enrich the data, transform and create new columns and much more. This is now continually refreshed, tracked, governed and secure. It is also available to any process that then needs to use this data.
To finish this example, I now want to connect to Snowflake and bring the data to a visualisation platform. Snowflake can be utilised be a multitude of visualisation software, such as Tableau, PowerBI and Qlik. To connect to the Snowflake instance in Qlik Sense SaaS, we simply use the snowflake connector to setup which Server, Database and Schema we want to connect to.
Once this is done, Qlik will pull the data from snowflake and you can visualise the results.
This would then complete the full end to end solution. Qlik Replicate would continually update the Snowflake instance, with ETL built in from Replicate or Snowflake, utilising the power of the cloud to leverage large queries and datasets on an elastic capacity, then updating into the visualisation platform of your choice!
For more details on Qlik QDI or Snowflake, please get in touch with Analytics8 (https://analytics8.co.uk or https://analytics8.com) and we will be happy to answer any questions.