Anomaly Detection

I want to go through the Wikipedia series on machine learning and data mining. Data mining is the process of extracting and discovering patterns in large data sets, using methods at the intersection of machine learning, statistics, and database systems.


Notes


In data analysis, anomaly detection (also referred to as outlier detection and sometimes as novelty detection) is generally understood to be the identification of rare items, events, or observations that deviate significantly from the majority of the data and do not conform to a well-defined notion of normal behavior. Such examples may arouse suspicion of having been generated by a different mechanism, or may appear inconsistent with the remainder of the data set.

Three broad categories of anomaly detection techniques exist:

  1. Supervised anomaly detection techniques require a data set that has been labeled as normal and abnormal, and involve training a classifier. However, this approach is rarely used in anomaly detection due to the general unavailability of labeled data and the inherently unbalanced nature of the classes.
  2. Semi-supervised anomaly detection techniques assume that some portion of the data is labeled. The techniques construct a model representing normal behavior from a given normal training data set, and then test the likelihood of a test instance being generated by the model.
  3. Unsupervised anomaly detection techniques assume the data is unlabeled and are by far the most commonly used, due to their wider and more relevant application.
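
To make the semi-supervised idea concrete, here is a minimal sketch, not a reference implementation. It assumes one-dimensional data, a Gaussian model of normal behavior, and a threshold set to the lowest log-likelihood seen in the normal training set; the function names and the sample values are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist

def fit_normal_model(normal_samples):
    """Fit a 1-D Gaussian to samples assumed to show only normal behavior."""
    return NormalDist.from_samples(normal_samples)

def log_likelihood(model, x):
    """Log density of x under the fitted Gaussian (computed in log space)."""
    z = (x - model.mean) / model.stdev
    return -0.5 * z * z - math.log(model.stdev * math.sqrt(2 * math.pi))

def is_anomaly(model, x, threshold):
    """Flag x when the model finds it less likely than the threshold allows."""
    return log_likelihood(model, x) < threshold

# Hypothetical training data containing only normal readings.
normal_training = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
model = fit_normal_model(normal_training)

# Assumption: anything less likely than the least likely training point is anomalous.
threshold = min(log_likelihood(model, x) for x in normal_training)

for reading in (10.05, 12.0):
    print(reading, is_anomaly(model, reading, threshold))
```

The threshold rule is a design choice of this sketch; in practice it would be tuned against whatever tolerance for false alarms the application has.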

There is no agreed-upon definition of what qualifies as an anomaly, but here are some example definitions:

  • An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism
  • Anomalies are instances or collections of data that occur very rarely in the data set and whose features differ significantly from most of the data
  • An outlier is an observation which appears to be inconsistent with the remainder of that set of data
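
One way to turn "deviates so much from the other observations" into a testable rule, for unlabeled one-dimensional data, is a modified z-score based on the median and the median absolute deviation (MAD). This is only a sketch of one possible unsupervised approach: the function name and sample readings are hypothetical, and the constant 0.6745 and cutoff 3.5 are a common convention rather than anything stated in these notes.

```python
from statistics import median

def mad_outliers(values, cutoff=3.5):
    """Return the values whose modified z-score exceeds the cutoff."""
    med = median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # At least half the values equal the median, so the score is undefined.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [v for v in values if 0.6745 * abs(v - med) / mad > cutoff]

# Hypothetical unlabeled readings with one value far from the rest.
readings = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 25.0, 10.0]
print(mad_outliers(readings))
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value can inflate the latter enough to hide itself.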

Anomaly detection is applicable in a very large number and variety of domains, and is an important subarea of unsupervised machine learning.
