Data what?

There is a new data guy (read: term) in town, and before I introduce him to the crowd, let’s say hi to the old folks. Ever since the phrase ‘Data is the new oil’ became popular, the business world has been on its feet, chasing every form of data in every way possible. Along the way, new vocabulary kept arriving: Data Warehouse (not a new guy in town, probably a veteran), Data Lake (been in town a while), and Data Mart (again, been around for a while). But the new entrant to the town is, *DRUMROLL PLEASE*, Data Mesh.

We have probably heard about the other 3 terms quite a lot, but here we will briefly look at what these terms are and how a company can use them.

Data Warehouse

The purpose of a data warehouse is to store data that’s already modelled or structured. It is general purpose: it does not take into account the nuances of requirements from a specific business unit or function. A data warehouse usually consists of data that has been extracted from transactional systems and is made up of quantitative metrics and the characteristics that describe them. In terms of users, it is ideal for people who want to evaluate their reports, analyze their key performance metrics or manage a data set in a spreadsheet every day. Hence, a data warehouse is ideal for “operational” users, as it is simple and built to meet their needs.

A data warehouse can also support users who do more analysis on data. They use a data warehouse as a go-to source for data integration, data preparation and data analytics. Users may also use a data warehouse to do deep analysis, which may create totally new data sources based on research. These users are mainly ‘Data Scientists’ and use advanced analytical tools like predictive modelling and statistical analysis.

When to build a data warehouse?

First, if you need to analyse data from different sources. For instance, you might want to track your most valuable customers on a weekly basis — which requires you to combine payment information from your credit card processor, financial information from your accounting system, and the activity data your customers generate within your product. This is a lot easier to do if your data is located in one central location than if you were to go to three separate places for analysis.
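To make the "one central location" point concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a warehouse. The table names (payments, accounting_refunds, product_activity), the columns, and the sample rows are all hypothetical and only illustrate the idea; a real warehouse would be loaded from the actual source systems.

```python
import sqlite3


def build_warehouse() -> sqlite3.Connection:
    """Load three hypothetical source extracts into one in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE payments (customer_id TEXT, week TEXT, amount REAL);
        CREATE TABLE accounting_refunds (customer_id TEXT, week TEXT, amount REAL);
        CREATE TABLE product_activity (customer_id TEXT, week TEXT, sessions INTEGER);
        """
    )
    # Made-up rows standing in for the card processor, accounting system and product logs.
    conn.executemany(
        "INSERT INTO payments VALUES (?, ?, ?)",
        [("alice", "2024-W01", 120.0), ("bob", "2024-W01", 300.0), ("alice", "2024-W01", 80.0)],
    )
    conn.executemany(
        "INSERT INTO accounting_refunds VALUES (?, ?, ?)",
        [("bob", "2024-W01", 150.0)],
    )
    conn.executemany(
        "INSERT INTO product_activity VALUES (?, ?, ?)",
        [("alice", "2024-W01", 14), ("bob", "2024-W01", 3)],
    )
    conn.commit()
    return conn


def top_customers(conn: sqlite3.Connection, week: str) -> list:
    """Rank customers for one week by payments minus refunds, with their activity."""
    query = """
        SELECT p.customer_id,
               p.paid - COALESCE(r.refunded, 0) AS net_revenue,
               COALESCE(a.sessions, 0) AS sessions
        FROM (SELECT customer_id, SUM(amount) AS paid
              FROM payments WHERE week = ? GROUP BY customer_id) AS p
        LEFT JOIN (SELECT customer_id, SUM(amount) AS refunded
                   FROM accounting_refunds WHERE week = ? GROUP BY customer_id) AS r
               ON r.customer_id = p.customer_id
        LEFT JOIN (SELECT customer_id, SUM(sessions) AS sessions
                   FROM product_activity WHERE week = ? GROUP BY customer_id) AS a
               ON a.customer_id = p.customer_id
        ORDER BY net_revenue DESC
    """
    return conn.execute(query, (week, week, week)).fetchall()


if __name__ == "__main__":
    for row in top_customers(build_warehouse(), "2024-W01"):
        print(row)
```

Because all three extracts sit in the same place, the weekly ranking is a single query rather than three exports stitched together by hand.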

Second, if you need to separate your analytical data from your transactional data. If you collect activity logs or other potentially useful pieces of information in your app or website, it’s probably not a good idea to store this data in your app’s database. A much better idea is to purchase or build a data warehouse, one that’s designed for complex querying, and transfer the analytical data there instead.

The third reason you should get a data warehouse is if your original data source is not suitable for querying. Another compelling reason to go for a data warehouse is if you want to increase the performance of your most-used analytical queries.

Now you have a set of criteria to tick off before you decide whether to go for a data warehouse. Sweet! But how different is it from a data lake? Let’s see.

Data Lake

A data lake is a place where you dump all forms of data generated in various parts of your business: structured data feeds, chat logs, emails, images (of invoices, receipts, checks etc.), and videos. The data collection routines do not filter any information out; data related to cancelled, returned, and invalidated transactions will also be captured.
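As a rough illustration of "dump everything, filter nothing", the sketch below lands raw payloads untouched in a folder tree. The layout (source name, then a dt=YYYY-MM-DD folder), the function names, and the sample files are assumptions made for this example; real data lakes usually sit on object storage rather than a local directory.

```python
import tempfile
from datetime import date
from pathlib import Path


def land_raw(lake_root: Path, source: str, filename: str, payload: bytes, day: date) -> Path:
    """Write the payload exactly as received; no parsing, no filtering."""
    target_dir = lake_root / source / f"dt={day.isoformat()}"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    target.write_bytes(payload)
    return target


def demo_landing() -> list:
    """Land a structured feed, a chat log and an image placeholder, then list the lake."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        day = date(2024, 1, 15)
        # Cancelled orders are kept too: nothing is filtered at collection time.
        land_raw(root, "orders", "feed.csv", b"order_id,status\n1,completed\n2,cancelled\n", day)
        land_raw(root, "chat", "session-1.log", b"customer: where is my refund?\n", day)
        land_raw(root, "invoices", "scan-001.png", b"placeholder image bytes", day)
        return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    for path in demo_landing():
        print(path)
```

The point of the design is that structure is applied later, when someone reads the data, not when it is collected.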

When to build a data lake?

The first scenario is when your company is too big to fit all of its data into a warehouse. Let’s say your company has a lot of products and functions, and there are many possible ways to analyze data to improve the business. In such cases, you might need a cheap way to store different types of data in large quantities.

The other scenario is when you do not yet have a plan for how to use the data but feel that you will need it at some point as you scale your business. In those cases, you can collect the data first, store it in a data lake, and analyze it later.

We have now seen 2 important terms and concepts that are widely used in real-world applications and software. Let’s now see what a data mart is and how it differs from the above.

Data Mart

While a data warehouse is multi-purpose storage for different use cases, a data mart is a subsection of the data warehouse, designed and built specifically for a particular department or business function. Let’s say you are the head of a marketing division and you want a separate space to store data related to all things marketing. That would mean building a data mart which is only for your team. The benefits include:

  • Isolated performance: since each data mart is only used by a particular department, the performance load is managed and communicated within that department and does not affect other analytical workloads.
  • Isolated security: since the data mart only contains data specific to that department, unintended access to other data (finance data, revenue data) is not physically possible.

It is also important to note that a data mart tends to have a shorter lifespan, as it can be discarded once the use case or project it serves ends.
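One way to picture the isolation described above is to copy only one department's rows into a separate database. This is a minimal sketch with sqlite3; the facts table, its columns, and the sample values are hypothetical, and a real data mart would typically be built with the warehouse's own tooling.

```python
import sqlite3


def build_warehouse() -> sqlite3.Connection:
    """A tiny stand-in warehouse holding metrics for several departments."""
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE facts (department TEXT, metric TEXT, value REAL)")
    warehouse.executemany(
        "INSERT INTO facts VALUES (?, ?, ?)",
        [
            ("marketing", "campaign_clicks", 1200.0),
            ("marketing", "leads", 85.0),
            ("finance", "revenue", 50000.0),
        ],
    )
    warehouse.commit()
    return warehouse


def build_mart(warehouse: sqlite3.Connection, department: str) -> sqlite3.Connection:
    """Copy one department's rows into a separate database (the data mart)."""
    mart = sqlite3.connect(":memory:")
    mart.execute("CREATE TABLE facts (metric TEXT, value REAL)")
    rows = warehouse.execute(
        "SELECT metric, value FROM facts WHERE department = ?", (department,)
    ).fetchall()
    mart.executemany("INSERT INTO facts VALUES (?, ?)", rows)
    mart.commit()
    return mart


if __name__ == "__main__":
    marketing_mart = build_mart(build_warehouse(), "marketing")
    # Finance rows were never copied, so they cannot be queried from the mart.
    print(marketing_mart.execute("SELECT metric, value FROM facts ORDER BY metric").fetchall())
```

Since the mart is its own database, heavy marketing queries do not touch the warehouse, and the mart can simply be dropped when the project ends.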

We saw the top 3 terms widely used in the data engineering space but let’s now see the new guy in town – Data Mesh!

Data Mesh

A data mesh is a decentralized approach to data management built on a distributed architecture. The idea is to make data more accessible and available to business users by directly connecting data owners, data producers, and data consumers. Data is treated as a product and owned by the teams that most intimately know and consume it. Data mesh aims to improve the business outcomes of data-centric solutions as well as drive the adoption of modern data architectures. Centralized data platform architectures often fail to deliver insights with the speed and flexibility that scaling organizations need, and data mesh is proposed as a solution to these problems.
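The "data as a product, owned by a team" idea can be sketched in a few lines. Data mesh does not prescribe any particular API, so the DataProduct and Catalog classes below, along with the team and product names, are purely hypothetical and only show the shape of the idea: each team publishes its own product, and consumers discover it through a shared catalog.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class DataProduct:
    """A dataset published and maintained by the team that knows it best."""
    name: str
    owner_team: str
    description: str
    read: Callable[[], List[dict]]  # how consumers fetch the data


class Catalog:
    """A shared index so consumers can discover products across teams."""

    def __init__(self) -> None:
        self._products: Dict[str, DataProduct] = {}

    def publish(self, product: DataProduct) -> None:
        if product.name in self._products:
            raise ValueError(f"product already published: {product.name}")
        self._products[product.name] = product

    def find(self, name: str) -> DataProduct:
        return self._products[name]

    def owned_by(self, team: str) -> List[str]:
        return sorted(p.name for p in self._products.values() if p.owner_team == team)


def build_catalog() -> Catalog:
    """Two teams each publish their own product; no central team sits in between."""
    catalog = Catalog()
    catalog.publish(DataProduct(
        name="campaign_results",
        owner_team="marketing",
        description="Clicks per campaign",
        read=lambda: [{"campaign": "spring_sale", "clicks": 1200}],
    ))
    catalog.publish(DataProduct(
        name="weekly_orders",
        owner_team="sales",
        description="Orders per week",
        read=lambda: [{"week": "2024-W01", "orders": 42}],
    ))
    return catalog


if __name__ == "__main__":
    product = build_catalog().find("campaign_results")
    print(product.owner_team, product.read())
```

The catalog only handles discovery; the data itself stays with the owning team, which is the main contrast with a centralized warehouse or lake.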

Benefits of a data mesh

  • 10X faster innovation cycles, shifting away from manual, batch-oriented ETL to continuous transformation and loading (CTL).
  • More than 70% reduction in data engineering effort, along with gains in CI/CD, no-code and self-serve data pipeline tooling, and agile development.

So, there you have it: all 4 important data terms that are widely used in the industry. But a picture speaks 1000 words, doesn’t it?

[Image: data terms. Source: LinkedIn]
