Classification

I want to go through the Wikipedia series on machine learning and data mining. Data mining is the process of extracting and discovering patterns in large data sets, using methods at the intersection of machine learning, statistics, and database systems.

Date Created:
1 21

Notes


Statistical Classification: When classification is performed by a computer, statistical methods are normally used to develop the algorithm.

  • Classification is the activity of assigning objects to some pre-existing classes or categories. This is distinct from the task of establishing the classes themselves.

Often, the individual observations are analyzed into a set of quantifiable properties, known as explanatory variables or features. These properties may be categorical, ordinal, integer-valued, or real-valued. Other classifiers work by comparing observations to previous observations by means of a similarity or distance function.
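To make the distance-based idea concrete, here is a minimal sketch of a nearest-neighbor classifier. It is one possible instance of "comparing observations to previous observations", not a method named in the notes above; the Euclidean distance, the function names, and the toy data are all assumptions chosen for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor_classify(query, labeled_examples, distance=euclidean):
    """Return the label of the stored observation closest to `query`."""
    _, best_label = min(labeled_examples, key=lambda ex: distance(query, ex[0]))
    return best_label

# Hypothetical toy data: (feature vector, class label) pairs.
training = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((8.0, 9.0), "large"),
    ((9.0, 8.5), "large"),
]

label = nearest_neighbor_classify((2.0, 1.5), training)
print(label)
```

Any other similarity or distance function can be swapped in through the `distance` argument.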

An algorithm that implements classification is known as a classifier.

Classification and clustering are examples of the more general problem of pattern recognition, which is the assignment of some sort of output value to a given input value. A common subclass of classification is probabilistic classification. Algorithms of this nature use statistical inference to find the best class for a given instance. Unlike other algorithms, which simply output the best class, probabilistic algorithms output a probability of the instance being a member of each of the possible classes. Such an algorithm has numerous advantages over non-probabilistic classifiers:

  • It can output a confidence value associated with its choice (in general, a classifier that can do this is known as a confidence-weighted classifier).
  • Correspondingly, it can abstain when its confidence in choosing any particular output is too low.
  • Because probabilities are generated, probabilistic classifiers can be more effectively incorporated into larger machine-learning tasks, in a way that partially or completely avoids the problem of error propagation.
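The first two advantages can be sketched in a few lines. This example assumes the classifier already produces one real-valued score per class and turns those scores into probabilities with a softmax; the softmax, the 0.6 threshold, and the function names are illustrative assumptions, not something the notes above prescribe.

```python
import math

def softmax(scores):
    """Convert real-valued per-class scores into probabilities that sum to 1."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {k: math.exp(s - m) for k, s in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

def predict_or_abstain(scores, threshold=0.6):
    """Return (label, probabilities); label is None when confidence is too low."""
    probs = softmax(scores)
    best = max(probs, key=probs.get)
    if probs[best] < threshold:
        return None, probs  # abstain: no class is confident enough
    return best, probs

# Hypothetical scores for two instances.
confident_label, confident_probs = predict_or_abstain({"cat": 3.0, "dog": 0.5, "bird": 0.1})
unsure_label, unsure_probs = predict_or_abstain({"cat": 1.0, "dog": 0.9, "bird": 0.8})
print(confident_label, unsure_label)
```

Returning the full probability dictionary alongside the label is what lets a downstream component weigh the prediction instead of trusting a single hard decision.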

Classification can be thought of as two separate problems - binary classification and multiclass classification. In binary classification, a better understood task, only two classes are involved, whereas multiclass classification involves assigning an object to one of several classes. Multiclass classification often requires the combined use of multiple binary classifiers, since many classification methods were developed specifically for binary classification.
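One common way to combine binary classifiers into a multiclass one is the one-vs-rest scheme: train one binary scorer per class (that class against everything else) and pick the class whose scorer is most confident. The sketch below uses a deliberately simple centroid-based binary scorer as a stand-in; that scorer, the function names, and the toy data are assumptions for illustration only.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def train_binary(positives, negatives):
    """Return a scorer; a higher score means 'more likely the positive class'."""
    c_pos, c_neg = centroid(positives), centroid(negatives)
    def score(x):
        return math.dist(x, c_neg) - math.dist(x, c_pos)
    return score

def train_one_vs_rest(examples):
    """Train one binary scorer per class: that class versus all the others."""
    labels = sorted({label for _, label in examples})
    scorers = {}
    for k in labels:
        pos = [x for x, label in examples if label == k]
        neg = [x for x, label in examples if label != k]
        scorers[k] = train_binary(pos, neg)
    return scorers

def predict_one_vs_rest(scorers, x):
    """Pick the class whose binary scorer gives the highest score."""
    return max(scorers, key=lambda k: scorers[k](x))

# Hypothetical toy data with three classes.
examples = [
    ((0.0, 0.0), "A"), ((1.0, 0.0), "A"),
    ((10.0, 0.0), "B"), ((9.0, 1.0), "B"),
    ((5.0, 10.0), "C"), ((5.0, 9.0), "C"),
]
scorers = train_one_vs_rest(examples)
print(predict_one_vs_rest(scorers, (1.0, 1.0)))
```

Any binary classifier that returns a comparable confidence score could replace `train_binary` without changing the wrapper.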

Most algorithms describe an individual instance whose category is to be predicted using a feature vector of individual, measurable properties of the instance. Each property is termed a feature, also known in statistics as an explanatory variable (or independent variable, although features may or may not be statistically independent).

A large number of algorithms for classification can be phrased in terms of a linear function that assigns a score to each possible category by combining the feature vector of an instance with a vector of weights, using a dot product. The predicted category is the one with the highest score. This type of score function is known as a linear predictor function and has the following general form:

$$\operatorname{score}(\mathbf{X}_i, k) = \boldsymbol{\beta}_k \cdot \mathbf{X}_i$$

where $\mathbf{X}_i$ is the feature vector for instance $i$, $\boldsymbol{\beta}_k$ is the vector of weights corresponding to category $k$, and $\operatorname{score}(\mathbf{X}_i, k)$ is the score associated with assigning instance $i$ to category $k$. In discrete choice theory, where instances represent people and categories represent choices, the score is considered the utility associated with person $i$ choosing category $k$. Algorithms with this basic setup are known as linear classifiers. What distinguishes them is the procedure for determining (training) the optimal weights/coefficients and the way that the score is interpreted.
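The linear predictor function can be sketched directly: each category has its own weight vector, the score is the dot product of that vector with the instance's feature vector, and the prediction is the category with the highest score. The weights below are made-up numbers rather than trained values, and the category names and function names are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def linear_scores(x, weights):
    """Score of feature vector `x` under each category's weight vector."""
    return {k: dot(beta_k, x) for k, beta_k in weights.items()}

def predict_linear(x, weights):
    """Return the category with the highest linear score."""
    scores = linear_scores(x, weights)
    return max(scores, key=scores.get)

# Hypothetical weight vectors, one per category (not learned from data).
weights = {
    "spam": (2.0, -1.0, 0.5),
    "ham": (-1.0, 1.5, 0.0),
}
x = (1.0, 0.5, 2.0)  # feature vector for a single instance

scores = linear_scores(x, weights)
print(scores, predict_linear(x, weights))
```

Training procedures differ only in how they arrive at the entries of `weights`; the scoring and argmax steps stay the same.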

