Data Mining and Warehousing Introduction
Data Mining and Warehousing is a very useful area for students who are completing research for their degree. Data mining techniques are important and widely used in almost every research area. Data warehousing is a separate area, but it is closely related to data mining, because a warehouse often stores the data that is later mined. In this article, I will give you an introduction to data mining: how knowledge discovery in databases works, how these techniques are used in business intelligence, the main data mining tasks, the challenges we face in data mining, and a brief look at data science.
If you are interested in knowing more about Data Mining and Warehousing, keep reading. All these topics will be explained with relevant examples.
What is Data in Data Mining and Warehousing?
The term data describes a set of values of qualitative or quantitative variables. Data is measured, collected, and analyzed to turn it into information. Furthermore, it is visualized using images, graphs, charts, and other analysis tools.
Why do we mine data?
To answer this question for Data Mining and Warehousing, we can look at it from three viewpoints.
- From a commercial viewpoint, we mine data for many reasons. Data is gathered from e-commerce activity, purchase records, and bank and card transactions. This data is used in the decision-making process in the commercial field.
- From a scientific viewpoint, data is gathered at an enormous speed, for example from satellites and scientific simulations. Data mining helps scientists classify and segment this data and form hypotheses.
- From a societal viewpoint, data mining is used, for example, to help candidates win elections. Data of this kind may be useful only for a certain period of time.
What is data mining in Data Mining and Warehousing?
In simple terms, we can describe data mining as discovering knowledge from data. More precisely, it is the extraction of implicit, previously unknown, and potentially useful patterns from huge amounts of data. Data mining goes by other names as well: knowledge discovery in databases (KDD), knowledge extraction, data and pattern analysis, data archaeology, information harvesting, and so on.
So is everything we do with collected data a data mining process? No, it is not. For example, running a simple query that returns a few records is not data mining. Looking up a number in the phone book is not data mining either. Data mining needs large amounts of data, and it searches for patterns rather than for individual records.
How does the knowledge discovery process work in Data Mining and Warehousing?
The knowledge discovery process in Data Mining and Warehousing is straightforward, but there is a lot to research and study in each step.
As the above picture depicts, the process flows step by step. First, the data is gathered in a database. The data is then cleaned to keep only valid and useful records, and it is sent to the data warehouse. Next, the task-relevant data is selected from the warehouse. Data mining methods are applied to this data to identify patterns. The patterns are then evaluated so that only the most useful ones are kept. Finally, the selected patterns are presented as knowledge. The knowledge gained can also be fed back into the earlier steps, such as data integration, data selection, and the choice of data mining algorithms.
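The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a real KDD system: the records, the field names (`customer`, `item`, `amount`), the function names, and the `min_count` threshold are all assumptions made for this example, and the data warehouse step is left out to keep it short.

```python
from collections import Counter

# Hypothetical raw records, standing in for rows gathered in a database.
RAW_RECORDS = [
    {"customer": "a", "item": "bread", "amount": 2.5},
    {"customer": "b", "item": "milk", "amount": 1.2},
    {"customer": "a", "item": "bread", "amount": 2.5},
    {"customer": "c", "item": None, "amount": 3.0},      # missing item
    {"customer": "b", "item": "bread", "amount": -1.0},  # invalid amount
    {"customer": "c", "item": "milk", "amount": 1.2},
    {"customer": "d", "item": "eggs", "amount": 2.0},
]


def clean(records):
    """Data cleaning: keep only records with an item and a positive amount."""
    return [r for r in records if r["item"] is not None and r["amount"] > 0]


def select(records):
    """Data selection: keep only the task-relevant attribute (the item)."""
    return [r["item"] for r in records]


def mine(items):
    """Data mining: count how often each item occurs."""
    return Counter(items)


def evaluate(counts, min_count=2):
    """Pattern evaluation: keep items seen at least min_count times."""
    return {item: n for item, n in counts.items() if n >= min_count}


def discover(records, min_count=2):
    """Run the steps in order and return the surviving patterns."""
    return evaluate(mine(select(clean(records))), min_count)


if __name__ == "__main__":
    print(discover(RAW_RECORDS))
```

Each function stands for one step of the process, so a step can be swapped out (for example, a stricter cleaning rule) without touching the others.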
Data Mining and Warehousing in business intelligence
Data mining is also used in business intelligence. The first step is identifying the data sources. After that, the data is pre-processed and integrated into data warehouses. The data is then explored using statistical summaries, querying, and reporting. Next, data mining is applied to discover information. Finally, the results are presented using visualization techniques. All of these steps support the decision-making process.
Interdisciplinary subfields of CS in Data Mining and Warehousing
- Algorithms for Data Mining and Warehousing
- Machine learning
- Pattern Recognition
- Distributed Computing
Data Mining Task Types
There are four data mining task types you need to be aware of. They are listed below.
- Descriptive and visual
Summary statistics such as the average, median, minimum, and maximum are used to describe and visualize the data.
- Diagnostic analytics
Used to test a hypothesis or to find human-interpretable patterns.
- Predictive and forecasting
Used to apply the observed patterns to predict and forecast future patterns.
- Prescriptive and optimization
Uses mathematical, stochastic, and statistical modeling to work out the best course of action.
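As a small illustration of the descriptive task type, the sketch below computes the summary statistics mentioned above with Python's standard `statistics` module. The `daily_sales` figures and the `describe` helper are made up for this example.

```python
import statistics

# Hypothetical daily sales figures, used only for illustration.
daily_sales = [12, 15, 11, 30, 15, 14, 13]


def describe(values):
    """Return the basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    for name, value in describe(daily_sales).items():
        print(f"{name}: {value}")
```

Here the median (14) sits below the mean (about 15.7) because one unusually high day (30) pulls the average up, which is the kind of detail a descriptive summary is meant to reveal.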
Data Mining Tasks
- Classification
Given a training set of records, each with a class attribute, a model for the class attribute is learned from the other attributes. The goal is to assign previously unseen records to a class as accurately as possible.
- Clustering
Records are grouped according to the similarity of their attributes, so that records in the same group are more alike than records in different groups.
- Association rule discovery
Given a set of records, each containing a number of items, dependency rules are found that predict the occurrence of an item based on the occurrences of other items.
- Sequential pattern discovery
Given a set of objects, each associated with its own timeline of events, rules are found that predict strong sequential dependencies among the events.
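To make association rule discovery more concrete, here is a minimal sketch that scores one rule using support and confidence, two commonly used measures. The transactions and the helper names (`count`, `support`, `confidence`) are assumptions for this example. It only evaluates a given rule and does not search for rules the way a full algorithm such as Apriori does.

```python
# Hypothetical market-basket transactions, one set of items per purchase.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    base = count(antecedent, transactions)
    if base == 0:
        return 0.0
    both = count(set(antecedent) | set(consequent), transactions)
    return both / base


if __name__ == "__main__":
    rule_items = {"diapers", "beer"}
    print("support:", support(rule_items, TRANSACTIONS))
    print("confidence:", confidence({"diapers"}, {"beer"}, TRANSACTIONS))
```

For the rule "diapers implies beer", three of the five transactions contain both items, so the support is 0.6. Four transactions contain diapers and three of those also contain beer, so the confidence is 0.75.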
Challenges of Data Mining and Warehousing
There are several challenges in Data Mining and Warehousing. The main ones are the scalability and the high dimensionality of the data. Complex and heterogeneous data is a further issue. Data quality, data ownership, and data distribution are challenges too. Moreover, streaming data and privacy preservation also have to be dealt with in this field.
A brief introduction to data science
The term data science covers a collection of interesting research topics such as data mining, big data, statistics, data analytics, and machine learning. More broadly, it covers every area that works with data.
In this article, we learned the basics of Data Mining and Warehousing: the definitions, how KDD works, and the task types, along with a brief introduction to data science. Read my next article to learn more.