Data Mining

I want to go through the Wikipedia series on machine learning and data mining. Data mining is the process of extracting and discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems.


References

Data Mining Wikipedia Article

Notes

Data mining is the process of extracting and discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems.

Aside from the raw analysis step, data mining involves:

- database and data management aspects
- data pre-processing
- model and inference considerations
- interestingness metrics
- complexity considerations
- post-processing of discovered structures
- visualization
- online updating

The term data mining is a misnomer: the goal is to extract patterns and knowledge from large amounts of data, not to extract (mine) the data itself. The actual data mining task is the semi-automatic or automatic analysis of large quantities of data to extract previously unknown, interesting patterns, such as groups of data records (clustering), unusual records (anomaly detection), and dependencies (association rule mining, sequential pattern mining). In practice, this means using machine learning and statistical models to uncover hidden patterns in large volumes of data.
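To make "unusual records (anomaly detection)" concrete, here is a minimal sketch that flags values lying far from the mean. The z-score rule, the `find_anomalies` name, the threshold of 2.0, and the sample readings are all my own assumptions for illustration; real anomaly detection methods are usually more robust than this.

```python
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical sensor readings with one obviously unusual record.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0]
anomalies = find_anomalies(readings)
print(anomalies)
```

Note that a single extreme value also inflates the mean and standard deviation it is compared against, which is one reason this simple rule is only a starting point.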

Today, data mining is the process of applying automated data processing tools, such as artificial neural networks, cluster analysis, genetic algorithms, decision trees, and support vector machines, with the intention of uncovering hidden patterns in large data sets.
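As a small illustration of one of these tools, cluster analysis, here is a sketch of k-means (Lloyd's algorithm) on one-dimensional data. The `kmeans_1d` name, the fixed starting centers, and the sample values are assumptions chosen to keep the example deterministic; they do not come from the article.

```python
def kmeans_1d(values, centers, iterations=10):
    """Run Lloyd's algorithm on 1-D data, starting from the given centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its previous center.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical data with two visible groups.
values = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centers, clusters = kmeans_1d(values, centers=[1.0, 10.0])
print(centers, clusters)
```

The result depends on the starting centers, so in practice the algorithm is often run several times with different initializations.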

The knowledge discovery in databases (KDD) process is commonly defined with the stages:

1. Selection
2. Pre-processing
   - A target data set is assembled first; a common source for data is a data warehouse. Pre-processing is essential to analyze the multivariate data sets before data mining. The target set is then cleaned. Data cleaning removes the observations containing noise and those with missing data.
3. Transformation
4. Data mining
   - Data mining involves six common classes of tasks:
     1. Anomaly detection
     2. Association rule learning
     3. Clustering
     4. Classification
     5. Regression
     6. Summarization
5. Interpretation/evaluation
   - The final step in knowledge discovery from data is to verify that the patterns produced by the data mining algorithms occur in the wider data set.
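The stages above can be sketched end to end on a toy data set. This is only an illustration under my own assumptions: the records, the function names, the "drop rows with missing values" cleaning rule, the midpoint-threshold classifier, and the every-third-row hold-out split are all hypothetical, and the transformation stage is skipped because the single feature is already numeric.

```python
from statistics import mean

# Hypothetical records: (hours_studied, passed). None marks a missing value.
records = [
    (1.0, False), (2.0, False), (None, True), (3.0, False),
    (4.0, False), (6.0, True), (7.0, True), (8.0, True),
    (9.0, True), (2.5, False), (7.5, True), (5.5, None),
]


def clean(rows):
    """Pre-processing: drop rows that contain a missing value."""
    return [r for r in rows if None not in r]


def split(rows):
    """Hold out every third row for evaluation; mine the rest."""
    train = [r for i, r in enumerate(rows) if i % 3 != 2]
    held_out = [r for i, r in enumerate(rows) if i % 3 == 2]
    return train, held_out


def learn_threshold(rows):
    """Data mining (classification): midpoint between the two class means."""
    pos = mean(x for x, y in rows if y)
    neg = mean(x for x, y in rows if not y)
    return (pos + neg) / 2


def accuracy(rows, threshold):
    """Evaluation: fraction of rows where (x > threshold) matches the label."""
    return sum((x > threshold) == y for x, y in rows) / len(rows)


cleaned = clean(records)
train, held_out = split(cleaned)
threshold = learn_threshold(train)
print("threshold:", threshold)
print("accuracy on mined rows:", accuracy(train, threshold))
print("accuracy on held-out rows:", accuracy(held_out, threshold))
```

Comparing the two accuracy figures is the point of the last stage: a pattern that only holds on the rows it was mined from would not count as verified in the wider data set.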
