Regression

I want to go through the Wikipedia series on Machine Learning and Data mining. Data mining is the process of extracting and discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems.

Date Created:
1 19

Notes


In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the outcome or response variable, or a label in machine learning parlance) and one or more error-free independent variables (often called regressors, predictors, covariates, explanatory variables or features). The most common form of regression analysis is linear regression, in which one finds the line (or a more complex linear combination) that most closely fits the data according to a specific mathematical criterion. For example, ordinary least squares computes the unique line (or hyperplane) that minimizes the sum of squared differences between the observed data and that line (or hyperplane).
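For a single independent variable, the least-squares line has a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means. The following is a minimal Python sketch of that one-variable case; the function name `fit_line` and the toy data are illustrative assumptions, not part of any library.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/(n-1) factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


if __name__ == "__main__":
    # Hypothetical points lying exactly on y = 1 + 2x.
    print(fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]))  # (1.0, 2.0)
```

With noisy data the same formula returns the line with the smallest sum of squared vertical distances rather than an exact fit.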

For specific mathematical reasons (the conditional mean is the quantity that minimizes expected squared error), this allows the researcher to estimate the conditional expectation (or population average value) of the dependent variable when the independent variables take on a given set of values. Less common forms of regression use slightly different procedures to estimate alternative location parameters (e.g., quantile regression) or to estimate the conditional expectation across a broader collection of non-linear models (e.g., nonparametric regression).
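One concrete way to see the link to conditional expectation: if the only regressor is a 0/1 group indicator, the least-squares line passes exactly through the sample mean of the dependent variable in each group, so its fitted values are sample estimates of $E[Y \mid X = 0]$ and $E[Y \mid X = 1]$. A minimal sketch, assuming a hypothetical helper `ols_line` and made-up data:

```python
from statistics import mean


def ols_line(xs, ys):
    """Least-squares intercept and slope for y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


# Hypothetical data: x is a 0/1 group indicator, y is the outcome.
xs = [0, 0, 0, 1, 1, 1]
ys = [2.0, 4.0, 6.0, 10.0, 12.0, 14.0]

a, b = ols_line(xs, ys)
# Fitted value of the regression line at x = 0 and x = 1.
fitted = {x: a + b * x for x in (0, 1)}
# Sample mean of y within each group, i.e. the sample estimate of E[Y | X = x].
group_means = {x: mean(y for xi, y in zip(xs, ys) if xi == x) for x in (0, 1)}
print(fitted, group_means)
```

With a continuous regressor the fitted line only approximates the conditional mean, and only to the extent that the true conditional mean is close to linear.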

Regression is used for prediction and forecasting and, in some situations, to infer causal relationships between the independent and dependent variables. The earliest form of regression appears in Isaac Newton's work of 1700 on the equinoxes. The method of least squares was published by Legendre in 1805 and by Gauss in 1809.

In practice, researchers first select a model they would like to estimate and then use their chosen method (e.g., ordinary least squares) to estimate the parameters of that model. Regression models involve the following components:

  • The unknown parameters, often denoted as a scalar or vector $\beta$
  • The independent variables, which are observed in data and are often denoted as a vector $X_i$ (where $i$ indexes a row of data)
  • The dependent variable, which is observed in data and often denoted using the scalar $Y_i$
  • The error terms, which are not directly observed in data and are often denoted using the scalar $e_i$

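Most regression models propose that the dependent variable $Y_i$ is a function $f$ of the independent variables $X_i$ and the parameters $\beta$, with the error term $e_i$ representing an additive error that may stand in for un-modeled determinants of $Y_i$ or random statistical noise:

$$Y_i = f(X_i, \beta) + e_i$$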
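To make the roles of $X_i$, $Y_i$, $\beta$ and $e_i$ concrete, the sketch below generates synthetic data from $Y_i = f(X_i, \beta) + e_i$, assuming (for illustration only) a straight-line $f$, uniformly drawn $X_i$ and Gaussian errors. The names `f` and `simulate` are hypothetical. The errors are visible here only because the data are simulated; in real data they stay unobserved.

```python
import random


def f(x: float, beta: tuple[float, float]) -> float:
    """Hypothetical regression function: the line beta[0] + beta[1] * x."""
    return beta[0] + beta[1] * x


def simulate(n: int, beta: tuple[float, float], noise_sd: float, seed: int = 0):
    """Draw n pairs (X_i, Y_i) from Y_i = f(X_i, beta) + e_i with Gaussian e_i."""
    rng = random.Random(seed)  # seeded so the draw is reproducible
    xs, ys, errors = [], [], []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)    # independent variable X_i (observed)
        e = rng.gauss(0.0, noise_sd)  # error term e_i (unobserved in real data)
        xs.append(x)
        ys.append(f(x, beta) + e)     # dependent variable Y_i (observed)
        errors.append(e)
    return xs, ys, errors


xs, ys, errors = simulate(n=5, beta=(1.0, 2.0), noise_sd=0.5)
# A real analysis sees only xs and ys; beta must be estimated from them.
for x, y in zip(xs, ys):
    print(f"X={x:.3f}  Y={y:.3f}")
```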

The researcher's goal is to estimate the function $f(X_i, \beta)$ that most closely fits the data. The form of this function must be specified in advance. Once researchers have chosen their statistical model, different forms of regression analysis provide tools to estimate the parameters $\beta$. For example, least squares (including its most common variant, ordinary least squares) finds the value of $\beta$ that minimizes the sum of squared errors $\sum_i (Y_i - f(X_i, \beta))^2$. There must also be sufficient data to estimate a regression model: to estimate a least squares model with $k$ distinct parameters, one must have $N \geq k$ distinct data points.
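As a sketch of the least-squares step for a model that is linear in its $k$ parameters, the code below solves the normal equations $(X^\top X)\beta = X^\top y$ with Gaussian elimination in plain Python and refuses to fit when $N < k$. It assumes each row of `X` already includes a leading 1 for the intercept; `least_squares` and `sse` are hypothetical helper names.

```python
def least_squares(X: list[list[float]], y: list[float]) -> list[float]:
    """Return beta minimizing sum_i (y[i] - X[i] . beta)**2, via the normal equations."""
    n, k = len(X), len(X[0])
    if n < k:
        raise ValueError(f"need N >= k data points (N={n}, k={k})")
    # Build the k x k system (X^T X) beta = X^T y.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)] for i in range(k)]
    b = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    # Forward elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        # Duplicate rows or collinear columns leave X^T X singular.
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular: not enough distinct, independent data")
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution on the upper-triangular system.
    beta = [0.0] * k
    for i in range(k - 1, -1, -1):
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, k))) / A[i][i]
    return beta


def sse(X: list[list[float]], y: list[float], beta: list[float]) -> float:
    """Sum of squared errors of the linear predictor X[i] . beta."""
    return sum((yi - sum(x * bj for x, bj in zip(row, beta))) ** 2 for row, yi in zip(X, y))


if __name__ == "__main__":
    # Toy data generated exactly from y = 1 + 2*x1 - 3*x2 (first column is the intercept).
    X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
    y = [1, 3, -2, 0, 2]
    beta = least_squares(X, y)
    print([round(v, 6) for v in beta], round(sse(X, y, beta), 9))
```

Solving the normal equations directly can lose precision when the columns of X are nearly collinear; in practice a library routine such as NumPy's `numpy.linalg.lstsq`, which uses an SVD-based solver, is usually the safer choice.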

By itself, regression is simply a calculation using the data. In order to interpret the output of regression as a meaningful statistical quantity that measures real-world relationships, researchers often rely on a number of classical assumptions:

  • The sample is representative of the population at large
  • The independent variables are measured with no error
  • Deviations from the model have an expected value of zero, conditional on covariates
  • The variance of the residuals is constant across observations
  • The residuals are uncorrelated with one another. Mathematically, the variance-covariance matrix of the errors is diagonal.
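The last three assumptions concern the errors, which are unobserved, so in practice they are probed through the residuals. The sketch below computes a few rough, informal indicators: the mean residual for low-x and high-x observations (zero conditional mean), the ratio of residual variances between those halves (constant variance), and the lag-1 correlation of consecutive residuals (uncorrelated errors). The function name and any thresholds one might apply to its output are assumptions; formal tests such as Breusch-Pagan (non-constant variance) or Durbin-Watson (serial correlation) are the standard tools.

```python
from statistics import mean, pvariance


def residual_diagnostics(xs: list[float], residuals: list[float]) -> dict[str, float]:
    """Rough, informal checks of the error assumptions using regression residuals."""
    n = len(residuals)
    if n != len(xs) or n < 4:
        raise ValueError("need at least 4 residuals, one per x value")
    # Sort residuals by x and split them into a low-x half and a high-x half.
    ordered = [r for _, r in sorted(zip(xs, residuals))]
    low, high = ordered[: n // 2], ordered[n // 2 :]
    var_low, var_high = pvariance(low), pvariance(high)
    # Lag-1 correlation of consecutive residuals, in the original observation order.
    m = mean(residuals)
    num = sum((residuals[i] - m) * (residuals[i + 1] - m) for i in range(n - 1))
    den = sum((r - m) ** 2 for r in residuals)
    return {
        "mean_low_x": mean(low),    # roughly 0 if E[e | X] = 0
        "mean_high_x": mean(high),  # likewise for the high-x half
        # Roughly 1 if the residual variance is constant across x.
        "variance_ratio": var_high / var_low if var_low > 0 else float("inf"),
        # Roughly 0 if consecutive residuals are uncorrelated.
        "lag1_corr": num / den if den > 0 else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical residuals whose spread grows with x and whose signs alternate.
    xs = [1, 2, 3, 4, 5, 6, 7, 8]
    residuals = [0.1, -0.1, 0.1, -0.1, 0.4, -0.4, 0.4, -0.4]
    print(residual_diagnostics(xs, residuals))
```

On these made-up residuals the variance ratio comes out far above 1 and the lag-1 correlation strongly negative, hinting that both the constant-variance and the uncorrelated-errors assumptions fail.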


