Data Mining

I want to work through the Wikipedia series on machine learning and data mining. Data mining is the process of extracting and discovering patterns in large data sets, using methods at the intersection of machine learning, statistics, and database systems.

Date Created:

References



Notes


Aside from the raw analysis step, data mining involves:

  • database and data management aspects
  • data pre-processing
  • model and inference considerations
  • interestingness metrics
  • complexity considerations
  • post-processing of discovered structures
  • visualization
  • online updating
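To make "interestingness metrics" concrete, here is a minimal sketch of three measures often used to rank association rules: support, confidence, and lift. The basket data and the function names are invented for illustration; they are not part of the notes above, and other interestingness measures exist.

```python
# Hypothetical toy transactions, invented for illustration only.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence relative to the consequent's baseline frequency.

    Values above 1 suggest a positive association, below 1 a negative one.
    """
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

bread = frozenset({"bread"})
milk = frozenset({"milk"})
print(support(bread | milk, transactions))    # share of baskets with both items
print(confidence(bread, milk, transactions))  # how often milk appears given bread
print(lift(bread, milk, transactions))        # below 1 here: bread slightly lowers the odds of milk
```

In this toy data the rule bread -> milk has a lift below 1, so by this metric it would not count as an interesting pattern even though its confidence looks moderately high.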

The term data mining is a misnomer because the goal is the extraction of patterns and knowledge from large amounts of data, not the extraction (mining) of data itself. The actual data mining task is the semi-automatic or automatic analysis of large quantities of data to extract previously unknown, interesting patterns such as groups of data records (clustering), unusual records (anomaly detection), and dependencies (association rule mining, sequential pattern mining). Data mining uses machine learning and statistical models to uncover hidden patterns in large volumes of data.
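As one illustration of finding "unusual records", the sketch below flags values that lie far from the mean, measured in standard deviations (a z-score). This is only one simple approach to anomaly detection, chosen here for brevity; the readings, the function name `find_anomalies`, and the threshold of 2.0 are assumptions made for the example.

```python
from statistics import mean, pstdev

def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs whose z-score magnitude exceeds `threshold`."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical sensor readings; the last one is deliberately out of line.
readings = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 25.0]
print(find_anomalies(readings))  # [(7, 25.0)]
```

A z-score test assumes the bulk of the data is roughly bell-shaped and is itself distorted by extreme values, so it is a starting point rather than a general-purpose detector.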

Today, data mining is the process of applying automated data processing tools, such as artificial neural networks, cluster analysis, genetic algorithms, decision trees, and support vector machines, with the intention of uncovering hidden patterns in large data sets.

The knowledge discovery in databases (KDD) process is commonly defined with the stages:

  1. Selection
  2. Pre-processing
    • A common source for data is a data warehouse. Pre-processing is essential to analyze the multivariate data sets before data mining. The target set is then cleaned. Data cleaning removes the observations containing noise and those with missing data.
  3. Transformation
  4. Data mining
    • Data mining involves six common classes of tasks:
      1. Anomaly detection
      2. Association rule learning
      3. Clustering
      4. Classification
      5. Regression
      6. Summarization
  5. Interpretation/evaluation
    • The final step in knowledge discovery from data is to verify that the patterns produced by the data mining algorithms occur in the wider data set.
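The KDD stages above can be sketched end to end on a toy data set. This is a minimal illustration under stated assumptions, not a definitive pipeline: the records, field names, and helper functions (`select`, `clean`, `transform`, `mine`, `evaluate`) are all hypothetical, the mining step is a plain least-squares line fit (a regression task), and evaluation checks the fitted pattern against records that were held out from fitting.

```python
from statistics import mean

# Hypothetical raw records; None marks a missing value.
raw = [
    {"id": 1, "x": 1.0, "y": 2.1},
    {"id": 2, "x": 2.0, "y": 3.9},
    {"id": 3, "x": 3.0, "y": None},
    {"id": 4, "x": 4.0, "y": 8.2},
    {"id": 5, "x": 5.0, "y": 9.8},
    {"id": 6, "x": 6.0, "y": 12.1},
]

def select(records, fields):
    """1. Selection: keep only the fields relevant to the analysis."""
    return [{f: r[f] for f in fields} for r in records]

def clean(rows):
    """2. Pre-processing: drop observations with missing data."""
    return [r for r in rows if all(v is not None for v in r.values())]

def transform(rows):
    """3. Transformation: reshape rows into (x, y) pairs for the miner."""
    return [(r["x"], r["y"]) for r in rows]

def mine(pairs):
    """4. Data mining: fit y = slope * x + intercept by least squares."""
    mx = mean(x for x, _ in pairs)
    my = mean(y for _, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    slope = sxy / sxx
    return slope, my - slope * mx

def evaluate(model, pairs):
    """5. Interpretation/evaluation: mean absolute error on held-out pairs."""
    slope, intercept = model
    return mean(abs(y - (slope * x + intercept)) for x, y in pairs)

def run_pipeline(records):
    pairs = transform(clean(select(records, ["x", "y"])))
    train, held_out = pairs[:3], pairs[3:]  # simple split, for illustration only
    model = mine(train)
    return {"rows_kept": len(pairs), "model": model, "mae": evaluate(model, held_out)}

result = run_pipeline(raw)
print(result)
```

A small error on the held-out pairs suggests that the pattern found in the training rows also occurs in data the algorithm did not see, which is the point of the final stage. A real project would use a randomized split and far more data.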
