Data warehousing is part of the broader data mining area. It is a very interesting subject to learn about, and if you want to become a data scientist, it is a must. In this article, I will give you an introduction to the data warehouse: its characteristics, approaches, architectures, and issues, along with a short introduction to the data mart.
If you find these areas interesting, or if you are a student looking for the basics of data warehousing, please keep reading. I will explain all of these with suitable examples too.
What is Data Warehousing?
Data warehousing is a broad term. The basic idea is that a data warehouse is a huge set of data built to support the decision-making process. This data pool can consist of both internal and external data, and it can include historical as well as present data. The data needs to be of potential interest to the upper levels of the company, who make the decisions.
Characteristics of Data in Data Warehousing
The data in a data warehouse should fulfill these characteristics.
- Subject-oriented: Operational data is usually transaction-oriented. In a data warehouse, the data should instead be stored around subjects (for example, customers or sales).
- Integrated: The data is collected into the warehouse from different sources, but it should all be stored in a consistent format.
- Time-variant: The warehouse contains historical data. Past data is very important for making decisions, identifying trends, forecasting, and prediction.
- Non-volatile: Once data enters the warehouse, it is not changed in place. If a value is updated at the source, the change is loaded as a new entry instead of overwriting the existing one.
- Data granularity: Granularity is the level of detail in the data. The more detail the data holds, the higher the granularity. Decision making is more precise when it uses high-granularity data.
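To make the time-variant and non-volatile characteristics more concrete, here is a minimal sketch in Python. The names `PriceFact` and `AppendOnlyStore` are made up for illustration and do not come from any real warehouse product; the point is only that an update becomes a new dated entry, so the history stays available.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: a loaded row cannot be modified afterwards
class PriceFact:
    product: str
    price: float
    valid_from: date


class AppendOnlyStore:
    """Hypothetical store that only supports loading and reading rows."""

    def __init__(self):
        self._rows = []

    def load(self, row):
        # Non-volatile: new rows are appended, existing rows are never edited.
        self._rows.append(row)

    def history(self, product):
        # Time-variant: every past value is still there, ordered by date.
        rows = [r for r in self._rows if r.product == product]
        return sorted(rows, key=lambda r: r.valid_from)

    def current(self, product):
        rows = self.history(product)
        return rows[-1] if rows else None


store = AppendOnlyStore()
store.load(PriceFact("widget", 9.99, date(2023, 1, 1)))
# The price changed at the source: load a new entry instead of overwriting.
store.load(PriceFact("widget", 11.49, date(2024, 1, 1)))
```

After both loads, `store.history("widget")` still returns the old price as well as the new one, which is what makes trend analysis possible.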
The goal of Data Warehousing
The main goal of data warehousing is to give unified access to all data. You might wonder why we need this unified access, and who the potential users are. The data is valuable to everyone who uses the system. It is collected and combined to form information. After that, the system provides an integrated view and an interface through which users can interact with it. Finally, it supports sharing the necessary data among the users. This is the goal of the data warehouse.
Approaches of Data Warehousing
There are two main approaches to data warehousing:
- Query driven approach – also called an on-demand approach
- The warehousing approach
The query-driven approach is the traditional research approach. The clients connect to a system that consults the metadata and reaches the sources through a number of wrappers, one per source. This approach has a number of disadvantages. Every query has to go out to the sources, so query processing is delayed, and the method is inefficient and expensive too. For these reasons, it has not been a star in the industry.
The second approach, the warehousing approach, is suitable for the industry. It is also known as the update-driven approach. Here, the information is integrated in advance. Furthermore, each wrapper is replaced by an extractor/monitor, so the data warehouse can be queried and analyzed directly.
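The difference between the two approaches can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the two sources, their formats, and the function names are hypothetical. The query-driven function translates the sources on every query, while the warehousing approach integrates them once and then answers queries from the stored copy.

```python
# Two hypothetical sources with different native formats.
crm_source = [{"cust": "Ann", "total_usd": "120.50"},
              {"cust": "Bob", "total_usd": "80.00"}]
shop_source = [("Ann", 3000), ("Cara", 4550)]  # (customer, amount in cents)


def wrap_crm(rows):
    # Wrapper: convert the CRM format into the common model.
    return [{"customer": r["cust"], "amount": float(r["total_usd"])} for r in rows]


def wrap_shop(rows):
    # Wrapper: convert the shop format into the common model.
    return [{"customer": name, "amount": cents / 100} for name, cents in rows]


def total_by_customer(rows):
    totals = {}
    for row in rows:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals


def query_driven_total():
    # Query-driven: the sources are contacted and translated on every query.
    return total_by_customer(wrap_crm(crm_source) + wrap_shop(shop_source))


# Warehousing approach: integrate in advance, then query the stored copy.
integrated_rows = wrap_crm(crm_source) + wrap_shop(shop_source)


def warehouse_total():
    return total_by_customer(integrated_rows)
```

Both functions return the same totals here. The practical difference is where the work happens: `query_driven_total` pays the translation cost on each call, while `warehouse_total` reads data that was prepared ahead of time and only reflects source changes after the next load.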
Data warehouse Architectures
Let us look at the data warehouse architectures in detail.
- Single layer Data Warehousing architecture
- In this architecture, each data element is stored once and only once. This is also a virtual warehouse, so the interactions with the data are very direct.
- Two layer
- Real-time data and derived data are combined in this architecture. It is a commonly used data warehousing architecture in the industry. The real-time data is the first layer, and the derived data connects to it as the second layer.
- Three layer
- This is a further step from the two-layer architecture. A reconciled data layer is inserted between the real-time data layer and the derived data layer. It is a physical implementation too.
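As a rough sketch of the three-layer idea, the Python below passes a few sample rows through the layers in order. It assumes that the reconciled layer holds a cleaned, consistent copy of the real-time data and that the derived layer holds summaries built from it; the field names and the functions `reconcile` and `derive` are hypothetical.

```python
# Layer 1: real-time (operational) data, as the source systems record it.
real_time = [
    {"order_id": 1, "region": " north ", "amount": "100.0"},
    {"order_id": 2, "region": "North", "amount": "50.0"},
    {"order_id": 3, "region": "south", "amount": "70.0"},
]


def reconcile(rows):
    # Layer 2: reconciled data, cleaned and put into one consistent format.
    return [
        {
            "order_id": r["order_id"],
            "region": r["region"].strip().lower(),
            "amount": float(r["amount"]),
        }
        for r in rows
    ]


def derive(rows):
    # Layer 3: derived data, here a per-region total for decision makers.
    totals = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]
    return totals


reconciled = reconcile(real_time)
derived = derive(reconciled)
```

In a two-layer setup, `derive` would have to deal with the inconsistent region names itself. Keeping `reconciled` as its own stored layer means every derived summary starts from the same cleaned data.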
Issues of data warehousing
There are some issues in data warehousing. The main one is warehouse design, which is a complex task to complete. Data extraction is also a challenge. To overcome it, we can use wrappers and monitors: wrappers convert data from one model to another, whereas the goal of monitors is to detect changes in the sources.
To continue with the issues, data integration is also a challenge. The purpose of integration is to receive data from multiple wrappers or monitors and load it into the warehouse, and cleaning and merging the data is a required part of that process. Furthermore, warehouse specification, maintenance, and optimization are also big challenges to overcome.
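A monitor can be pictured as a comparison between two snapshots of a source. The sketch below is one simple way to do this, assuming each record has a unique id and that full snapshots are cheap to take; the function `detect_changes` and the sample data are made up for illustration. Real systems may rely on logs or triggers instead.

```python
def detect_changes(old, new):
    """Compare two snapshots (dicts keyed by record id) and report the changes."""
    inserted = {k: new[k] for k in new.keys() - old.keys()}
    deleted = {k: old[k] for k in old.keys() - new.keys()}
    updated = {k: new[k] for k in new.keys() & old.keys() if new[k] != old[k]}
    return inserted, updated, deleted


# Hypothetical snapshots of one source table, taken at two points in time.
snapshot_before = {1: "Ann, Colombo", 2: "Bob, Kandy", 3: "Cara, Galle"}
snapshot_after = {1: "Ann, Colombo", 2: "Bob, Jaffna", 4: "Dan, Matara"}

inserted, updated, deleted = detect_changes(snapshot_before, snapshot_after)
```

Only the three reported sets need to be passed on to the integration step, so the warehouse does not have to reload the unchanged record 1.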
What is a data mart?
A data mart is a logical concept: it is a logical subset of a complete data warehouse. A data mart is organized around a particular subject or a single business process. There are two types of data marts:
- Dependent data mart: The subset is created directly from the data warehouse, so it is dependent on it. The data quality is high, and the model is consistent too. The drawback is that the data warehouse needs to be created in advance.
- Independent data mart: This is small in size and is created for a single business unit or department. It is not enterprise-wide, so it is affordable for smaller companies as well.
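A dependent data mart can be sketched as a filter over the warehouse. The rows, the `subject` field, and the function `build_dependent_mart` below are hypothetical and only illustrate the idea of a subject-specific subset; a real mart would usually be a set of tables or views.

```python
# A tiny stand-in for an enterprise-wide warehouse covering several subjects.
warehouse_rows = [
    {"subject": "sales", "region": "north", "amount": 150.0},
    {"subject": "sales", "region": "south", "amount": 70.0},
    {"subject": "hr", "employee": "Ann", "salary_band": "B"},
]


def build_dependent_mart(rows, subject):
    # Copy only the rows for one subject; the warehouse itself is left untouched.
    return [dict(row) for row in rows if row["subject"] == subject]


sales_mart = build_dependent_mart(warehouse_rows, "sales")
```

Because `sales_mart` is built from `warehouse_rows`, it inherits the warehouse's cleaned and consistent data, but it also cannot exist until the warehouse does. An independent data mart would load its rows straight from the department's own sources instead.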
In this article, we learned about data warehousing in detail: what a data warehouse is, its purpose, and the characteristics of its data. We also learned about the goal of the data warehouse and the approaches to it. I explained the three architecture types, and we discussed the issues as well. Finally, I described the data mart and its types.
If you need more information about data mining and warehousing, you can read my previous article. I hope you gained a good understanding of the data warehouse and related topics. See you in another article. Happy reading.