MIT OpenCourseWare, MIT 6.034 Notes Artificial Intelligence

I got tired of reading texts about the math of machine learning / artificial intelligence, so I am going to watch this video series by MIT OpenCourseWare on AI.




Introduction and Scope


  • What is AI about?
    • Algorithms/procedures/methods enabled by constraints exposed by representations that support the models targeted at:
      • Thinking
      • Perception
      • Action
    • Loops that tie all three things above together.
  • If you get the representation right, you are almost done
  • Simple ≠ Trivial
    • Simple ideas are often the most powerful
  • Rumpelstiltskin Principle: once you name something you have power over it
  • History of AI
    • Ada Lovelace
    • Alan Turing (Turing Test, 1950) and Marvin Minsky (symbolic integration, 1960)
    • Dawn of AI - 70s
    • Bulldozer Age - 80s
  • Language separates us from chimpanzees.


Reasoning: Goal Trees and Problem Solving


  • If a program can solve the above (a symbolic integration problem), is it intelligent? Yes.
  • Problem Reduction:
    • To solve the above, you simplify it to problems that are more likely to be found in a reference table of common integrals
    • Simplify the problem
  • To have a skill, you have to have understood it and witnessed it
  • Integral: Safe Transformations → Heuristic Transformations → Safe Transformations → ... until done

And-Or Tree

An and–or tree is a graphical representation of the reduction of problems (or goals) to conjunctions and disjunctions of subproblems (or subgoals).
- And–or tree Wikipedia
  • Knowledge about knowledge is power
  • Catechism - questions you should ask yourself all the time
    • What kind of knowledge is involved?
    • How is the knowledge represented?
    • How is the knowledge used?
    • How much knowledge is required?


Reasoning: Goal Trees and Rule-Based Problem Solving


Goal Trees

  • Blocks World
  • Put box B1 on B2
    • Find Space
    • Grasp
      • Clear Top
        • Get Rid of Bx
          • Put Bx on Table ...
    • B1
    • Place
    • Ungrasp
  • Goal Trees, and-or trees
  • Whenever you build goal trees you can answer some questions about your own behavior.
  • Simon's Ant - Complexity of the behavior is the maximum of the complexity of the program and the complexity of the environment
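
The Blocks World goal tree above can be sketched as a small data structure, to show how a program can answer questions about its own behavior: going up one level answers "why?", going down one level answers "how?". This is only an illustration; the `Goal` class and its `why` and `how` methods are hypothetical names, and the goal labels are paraphrased from the list above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Goal:
    """One node of a goal tree (hypothetical helper, not from the course)."""
    name: str
    parent: Optional["Goal"] = None
    children: List["Goal"] = field(default_factory=list)

    def add(self, name: str) -> "Goal":
        # Attach a subgoal and return it so deeper subgoals can be chained.
        child = Goal(name, parent=self)
        self.children.append(child)
        return child

    def why(self) -> str:
        # "Why did you do that?" is answered by the goal one level up.
        return self.parent.name if self.parent else "because you told me to"

    def how(self) -> List[str]:
        # "How did you do that?" is answered by the goals one level down.
        return [child.name for child in self.children]


root = Goal("put B1 on B2")
root.add("find space")
grasp = root.add("grasp B1")
clear_top = grasp.add("clear top of B1")
get_rid = clear_top.add("get rid of Bx")
get_rid.add("put Bx on table")
root.add("place B1")
root.add("ungrasp B1")

print(clear_top.why())  # the parent goal
print(root.how())       # the immediate subgoals
```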

Rule Based Expert Systems

  • Can you account for useful aspects of human intelligence by writing all forms of knowledge as simple rules (if this is true, then something else is true)?
  • The knowledge in these systems tends to be a veneer.


Search: Depth-First, Hill Climbing, Beam


  • British Museum Algorithm
    • General approach to finding a solution by checking all possibilities one by one, beginning with the smallest
  • Depth First Search
    • Going all the way down the tree, then backtracking
    • Break ties lexically
  • Breadth First
    • Explore the tree level by level: going across each level before going down to the next.
  • Hill Climbing
    • Like Depth first search
    • Informed Heuristic of Depth First search
    • break ties according to which node is closest to goal
    • Can get stuck on local maxima
    • Telephone Pole Problem
  • Beam Search
    • Informed Heuristic of breadth first search
    • Limit the number of paths you are going to consider at any level

Beam Search Width 3

Techniques

  • Use Enqueued List
    • If you can't get to the target from a node, then don't check any more paths that go to that node
    • Keep track of paths that extend from node
  • Backtracking
  • Informed
    • Taking advantage of distance to the goal


Search: Optimal, Branch and Bound, A*


  • Oracle
    • Extend shortest path every time until you reach goal.
    • When you reach goal, if there are shorter paths still remaining to be checked, then check those paths (to see if their path length to goal is less) before confirming that you have found the shortest path
  • Branch and Bound
    • Extend shortest path every time until you reach goal.
    • When you reach goal, if there are shorter paths still remaining to be checked, then check those paths (to see if their path length to goal is less) before confirming that you have found the shortest path
  • A*
    • Branch and Bound + extended list + admissible heuristic
Branch and bound (BB, B&B, or BnB) is a method for solving optimization problems by breaking them down into smaller sub-problems and using a bounding function to eliminate sub-problems that cannot contain the optimal solution.

Techniques

  • Extended List
    • Don't extend a node if you already have been to that node through a shorter path
  • Admissible Heuristic
    • An admissible heuristic is used to estimate the cost of reaching the goal state in an informed search algorithm
    • $H(x, G) \leq D(x, G)$, where $H$ is the estimated distance from node $x$ to the goal $G$ and $D$ is the actual distance from $x$ to the goal
  • Consistency
    • $|H(x, G) - H(y, G)| \leq D(x, y)$: the absolute value of the difference between the estimated distance from node $x$ to the goal and the estimated distance from node $y$ to the goal has to be less than or equal to the actual distance from $x$ to $y$.
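
As a rough sketch of how these pieces fit together, the function below extends the shortest path first (branch and bound), skips nodes that were already extended (extended list), and optionally adds a heuristic estimate H to the accumulated distance (A*). The graph, the edge lengths, and the heuristic values are made up for illustration; the heuristic was chosen to be admissible and consistent on this graph.

```python
import heapq
from typing import Dict, List, Optional, Tuple

WeightedGraph = Dict[str, Dict[str, float]]


def a_star(graph: WeightedGraph, start: str, goal: str,
           heuristic: Optional[Dict[str, float]] = None
           ) -> Optional[Tuple[float, List[str]]]:
    """Branch and bound with an extended list; becomes A* when a heuristic is given."""
    h = heuristic or {}
    # Each entry: (accumulated distance + estimate, accumulated distance, path).
    frontier = [(h.get(start, 0), 0, [start])]
    extended = set()
    while frontier:
        _, cost, path = heapq.heappop(frontier)
        node = path[-1]
        # The goal is only accepted when its path is the shortest one left,
        # so every shorter partial path has already been checked.
        if node == goal:
            return cost, path
        if node in extended:
            continue  # already reached this node through a shorter path
        extended.add(node)
        for neighbor, length in graph.get(node, {}).items():
            if neighbor not in extended:
                new_cost = cost + length
                heapq.heappush(
                    frontier,
                    (new_cost + h.get(neighbor, 0), new_cost, path + [neighbor]),
                )
    return None


# Made-up weighted graph and heuristic (H never overestimates the true distance).
roads = {"S": {"A": 1, "B": 4}, "A": {"B": 2, "G": 6}, "B": {"G": 2}, "G": {}}
H = {"S": 4, "A": 3, "B": 2, "G": 0}

print(a_star(roads, "S", "G"))      # branch and bound + extended list
print(a_star(roads, "S", "G", H))   # A*
```

Both calls find the same shortest path here; the heuristic only changes how much of the graph is explored on the way.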


Search: Games, Minimax, and Alpha-Beta


  • Adversarial Games
  • How to teach computer to play chess, strategies:
    1. Analyze, strategy, tactics
    2. If-Then Rules
    3. Look ahead and evaluate
    4. British Museum
    5. Look ahead as far as possible
  • How To Evaluate
    • $S = g(f_1, f_2, \ldots, f_n)$, where $S$ = static value, $g$ = linear polynomial function, $f_i$ = feature
    • $S = c_1 f_1 + c_2 f_2 + \cdots + c_n f_n$, where $c_i$ = constant. This is the Linear Scoring Polynomial
  • The branching factor, $b$, is the number of children at each node, the outdegree.

  • Leaf nodes for a tree with constant branching factor $b$ and depth $d$ = branching factor to the power of the depth, $b^d$

Minimax

  • You go to the bottom of the tree, you compute static values, you back them up level by level, and then you decide where to go

MinMaxWithAlphaBetaPruning

Alpha Beta

  • A layering on top of minimax that cuts off large sections of the search tree
  • You do this so you don't have to check as many options
  • The minimizer is trying to minimize the score and the maximizer is trying to maximize it
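
A minimal sketch of minimax with alpha-beta pruning, assuming the game tree is given as nested lists whose leaves are static values. The tree below is made up, and the `visited` list exists only to show which static values were actually computed.

```python
import math
from typing import List, Optional, Union

Tree = Union[int, float, list]


def alphabeta(node: Tree, maximizing: bool = True,
              alpha: float = -math.inf, beta: float = math.inf,
              visited: Optional[List[float]] = None) -> float:
    """Back up static values level by level, pruning branches that cannot matter."""
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)  # record each static evaluation
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # the minimizer above will never allow this branch
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, best)
        if alpha >= beta:
            break  # the maximizer above already has something better
    return best


# Maximizer moves at the root, minimizer at the next level.
tree = [[3, 5], [2, 9]]
evaluated: List[float] = []
value = alphabeta(tree, visited=evaluated)
print(value, evaluated)
```

In this example the leaf 9 is never evaluated: once the minimizer finds 2 in the second branch, the maximizer already knows the first branch (worth 3) is better.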

Alpha Beta Pruning GIF

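  • Progressive Deepening (or Iterative Deepening) runs depth-limited depth first search repeatedly, increasing the depth limit each time, so that an answer is always available when time runs out. This can be used with alpha beta pruning and minimax to speed up search. Progressive Deepening can actually improve the performance of alpha beta pruning, because the results of shallower searches can be used to order the moves tried at the next depth.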
  • Deep Blue (Chess computer) was minimax + alpha beta pruning + progressive deepening + parallel computing + opening book + endgame evaluation + uneven tree development


Constraints: Interpreting Line Drawings, Search, Domain Reduction and Visual Object Recognition


Interpreting Line Drawings

  • Interpreting Line Drawings
  • How can we recognize the number of objects in a line drawing? We consider how Guzman, Huffman, and Waltz approached this problem. We then solve an example using a method based on constraint propagation, with a limited set of junction and line labels.
    • The problem is usually divided into two steps, labeling and realization.
      • Labeling is meant to provide a qualitative description of the scene, by classifying the segments of a line drawing as the projection of concave, convex, or contour edges.
      • Realization involves the physical legitimacy of the interpreted scene, and tries to recover the underlying 3D structure.
A convex edge is an edge along which the visible angle between the two faces forming the edge is greater than $\pi$. A concave edge is an edge along which the visible angle between the two faces forming the edge is less than $\pi$. An occluding edge is a convex edge along which only one of the two faces adjoining the edge in space is visible in the line drawing. The label $\rightarrow$ along an occluding edge is so directed that upon looking along the edge in that direction, the body of the object is on your right. Limbs receive their own label. Clearly, labeling is not unique, as is evidenced by optical illusions such as the Necker cube.
  • Every local consistency condition can be enforced by a transformation, called constraint propagation, that changes the problem without changing its solutions. Constraint propagation works by reducing domains of variables, strengthening constraints, or creating new constraints. This leads to a reduction of the search space, making the problem easier to solve by some algorithms. Constraint propagation can also be used as a satisfiability checker, incomplete in general but complete in some particular cases.


Search, Domain Reduction

  • Variable - something that can have an assignment
  • Value - something that can be an assignment
  • Domain - a bag of values
  • Constraint - a limit on variable values
Domain Reduction Pseudocode

For each depth first search assignment:

  • For each variable $v_i$ considered
    • For each $x_i$ in $D_i$
      • For each constraint $C(x_i, x_j)$ where $x_j \in D_j$
        • If there does not exist $x_j$ such that the $C(x_i, x_j)$ is satisfied
          • Remove $x_i$ from $D_i$
Domain Reduction Heuristics
  1. Check most constraints first
  2. Propagate through domains reduced to a single value
  3. To figure out the minimum number of resources needed, use an over/under approach: try amounts that are clearly too many and clearly too few, and converge on a narrow range where the search takes a long time. You can then be fairly sure that the minimum lies within that narrow range.
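
The domain reduction procedure above can be sketched on a map-coloring problem, where the only constraint is that neighboring regions must differ. This is one possible reading of the pseudocode, assuming that the "variables considered" are the neighbors of the variable just assigned; the region names and the functions `reduce_domains` and `solve` are hypothetical.

```python
from typing import Dict, List, Optional, Set

Domains = Dict[str, Set[str]]
Neighbors = Dict[str, List[str]]


def reduce_domains(domains: Domains, neighbors: Neighbors,
                   considered: List[str]) -> bool:
    """Remove unsupported values; return False if some domain becomes empty."""
    for var in considered:
        for value in list(domains[var]):
            for other in neighbors[var]:
                # Constraint: neighbors must take different values.
                if not any(value != other_value for other_value in domains[other]):
                    domains[var].discard(value)
                    break
        if not domains[var]:
            return False
    return True


def solve(domains: Domains, neighbors: Neighbors,
          assignment: Optional[Dict[str, str]] = None) -> Optional[Dict[str, str]]:
    """Depth first search over assignments with domain reduction after each one."""
    assignment = assignment or {}
    if len(assignment) == len(domains):
        return assignment
    var = next(v for v in domains if v not in assignment)
    for value in sorted(domains[var]):
        trial = {v: set(d) for v, d in domains.items()}  # copy before reducing
        trial[var] = {value}
        if reduce_domains(trial, neighbors, neighbors[var]):
            result = solve(trial, neighbors, {**assignment, var: value})
            if result is not None:
                return result
    return None  # every value failed: back up


# Made-up map: A, B, C border each other; D borders only C.
borders = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
three_colors = {region: {"red", "green", "blue"} for region in borders}
two_colors = {region: {"red", "green"} for region in borders}

coloring = solve(three_colors, borders)
print(coloring)
print(solve(two_colors, borders))
```

With only two colors the mutually bordering regions A, B, and C cannot be colored, and domain reduction discovers this as soon as one domain becomes empty.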

Visual Object Recognition

  • Alignment Theory of Recognition
  • The Goldilocks principle = take just the right amount of features into account
  • Correlation principle


Introduction to Learning, Nearest Neighbors


  • Two kinds of learning
  1. Learning based on observations of regularity:
  • Nearest neighbors
    • Field of pattern recognition
  • Neural nets
    • Attempt to mimic biology
  • Boosting
    • Theory
  2. Learning ideas based on constraints (human-like)
  • One shot learning
  • explanation based learning

Nearest Neighbor

  • Mechanism: a feature detector generates a vector of values, which goes into a comparator that consults a library of possibilities and then outputs a recognition

Decision Boundaries

  • If two things are similar in some respects, they are likely to be similar in other respects
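
A minimal sketch of the comparator step, assuming feature vectors are plain tuples of numbers and similarity is Euclidean distance. The library entries and labels are made up for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], str]


def nearest_neighbors(library: List[Example], unknown: Sequence[float],
                      k: int = 1) -> str:
    """Label an unknown feature vector by a vote among its k closest library entries."""
    ranked = sorted(library, key=lambda item: math.dist(item[0], unknown))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up library: (feature vector, label).
library = [
    ((1.0, 1.0), "small"),
    ((1.2, 0.8), "small"),
    ((5.0, 5.5), "large"),
    ((6.0, 5.0), "large"),
]

print(nearest_neighbors(library, (1.1, 1.1)))
print(nearest_neighbors(library, (5.5, 5.2), k=3))
```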


Learning: Identification Trees, Disorder


Identification Tree

Entropy of a Set

Decision Tree


Learning: Neural nets, Back Propagation


Neuron

Neural Net

  • Training a neural net is adjusting the weights and thresholds so that what we get out is what we want
  • A neural net is a function approximator
  • By taking partial derivatives of the performance with respect to the weights, you can change the weights using gradient descent to get better performance
  • Get rid of the threshold, add a constant weight
  • Computation required is linear with respect to the number of layers (depth)
  • Computation required is quadratic with respect to the number of neurons per layer (width)
  • Convolutional Neural Nets
  • Boltzmann Machines
  • Backpropagation
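
To make the weight-adjustment idea concrete, here is a minimal sketch of a single sigmoid neuron trained by gradient descent. It assumes the performance function $P = -\frac{1}{2}(d - o)^2$ and replaces the threshold with an extra weight on a constant input of $-1$; the learning rate, epoch count, and the OR training data are arbitrary choices for illustration.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: List[float], inputs: Sequence[float]) -> float:
    # The last weight acts as the threshold: it multiplies a constant -1 input.
    x = list(inputs) + [-1.0]
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def train_neuron(samples: List[Sample], rate: float = 1.0,
                 epochs: int = 5000) -> List[float]:
    """Adjust the weights by following the gradient of the performance function."""
    weights = [0.0] * (len(samples[0][0]) + 1)
    for _ in range(epochs):
        for inputs, desired in samples:
            x = list(inputs) + [-1.0]
            out = predict(weights, inputs)
            # dP/dw_i = (d - o) * o * (1 - o) * x_i for a sigmoid neuron.
            delta = (desired - out) * out * (1.0 - out)
            weights = [w + rate * delta * xi for w, xi in zip(weights, x)]
    return weights


# Made-up training data: the logical OR of two inputs.
or_samples = [((0, 0), 0.0), ((0, 1), 1.0), ((1, 0), 1.0), ((1, 1), 1.0)]
trained = train_neuron(or_samples)
for inputs, desired in or_samples:
    print(inputs, desired, round(predict(trained, inputs), 3))
```

A full neural net applies the same update to every weight, with backpropagation reusing the derivatives computed in later layers to get the derivatives for earlier ones.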


Learning: Genetic Algorithms


In computer science and operations research, a genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection that belongs to the larger class of evolutionary algorithms (EA). Genetic algorithms are commonly used to generate high-quality solutions to optimization and search problems by relying on biologically inspired operators such as mutation, crossover and selection.
Genetic Algorithm Wikipedia Article
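
A minimal sketch of the mutation, crossover, and selection loop, assuming bit-string individuals and a toy fitness function that counts ones. The population size, mutation rate, and the choice to always keep the best individual unchanged are arbitrary assumptions, not part of any particular published algorithm.

```python
import random
from typing import Callable, List, Tuple

Individual = List[int]


def evolve(fitness: Callable[[Individual], float], length: int = 12,
           pop_size: int = 20, generations: int = 30,
           mutation_rate: float = 0.05, seed: int = 0
           ) -> Tuple[Individual, List[float]]:
    """Return the best individual found and the best fitness seen per generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        next_generation = [population[0][:]]      # keep the best one unchanged
        parents = population[: pop_size // 2]     # selection: fitter half survives
        while len(next_generation) < pop_size:
            mom, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, length)        # single-point crossover
            child = mom[:cut] + dad[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_generation.append(child)
        population = next_generation
    return max(population, key=fitness), history


best, history = evolve(fitness=sum)
print(best, history[0], history[-1])
```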


Learning: Near Misses, Felicity Conditions


One-shot learning

Learning in human-like way, in one shot: learning something definite from each example.

The evolving model

Comparing an initial model example, a seed, with a near miss or another example, the evolving model learns one important characteristic from each new near miss or example.

The evolving model develops a set of heuristics to describe the seed, specializing with near misses (reducing the potential matches) or generalizing with examples (broadening the potential matches) the characteristics of the seed.

  • Require link heuristic: specialization
  • Forbid link heuristic: specialization
  • Extend set heuristic: generalization
  • Drop link heuristic: generalization
  • Climb tree heuristic: generalization

Felicity conditions

The teacher and learner must know about each other to achieve the best learning. The learner must talk to himself to understand what he is doing.

How to package ideas better

To better communicate ideas to others in order to achieve better results, the following 5 characteristics make communication more effective.

  • Symbol: makes the idea easy to remember
  • Slogan: focus the idea
  • Surprise: catch the attention
  • Salient: one thing to stand out
  • Story: helps transmission to people


Learning: Support Vector Machines


Decision boundaries

A decision boundary separates positive and negative examples with a straight line that is as far as possible from both: a median that maximizes the space between positive and negative examples.

Constraints are applied to find a vector (w), perpendicular to the median, and a constant b that together sort positive examples from negative ones for an unknown sample (u). The width of the “street” between the positive and negative examples is maximized.

Going through the algebra, the resulting equation shows that the optimization depends only on the dot products of pairs of samples.

The decision rule that defines if a sample is positive or negative only depends on the dot product of the sample vector and the unknown vector.
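
The algebra can be outlined as follows. This is a sketch in commonly used notation, which is an assumption rather than something recorded in these notes: $\vec{w}$ is the vector perpendicular to the median, $\vec{u}$ the unknown sample, $\vec{x}_i$ the training samples, $y_i = \pm 1$ their labels, and $\alpha_i$ the Lagrange multipliers.

```latex
\begin{align*}
&\text{Decision rule:} && \vec{w} \cdot \vec{u} + b \geq 0 \;\Rightarrow\; \text{positive} \\
&\text{Constraints:} && y_i (\vec{w} \cdot \vec{x}_i + b) - 1 \geq 0
   \quad (= 0 \text{ for samples on the edge of the street}) \\
&\text{Width of the street:} && \frac{2}{\lVert \vec{w} \rVert}
   \quad \text{so maximize it by minimizing } \tfrac{1}{2} \lVert \vec{w} \rVert^2 \\
&\text{Lagrangian:} && L = \tfrac{1}{2} \lVert \vec{w} \rVert^2
   - \sum_i \alpha_i \left[ y_i (\vec{w} \cdot \vec{x}_i + b) - 1 \right] \\
&\frac{\partial L}{\partial \vec{w}} = 0: && \vec{w} = \sum_i \alpha_i y_i \vec{x}_i \\
&\frac{\partial L}{\partial b} = 0: && \sum_i \alpha_i y_i = 0 \\
&\text{Substituting back:} && L = \sum_i \alpha_i
   - \tfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \, (\vec{x}_i \cdot \vec{x}_j) \\
&\text{Decision rule again:} && \sum_i \alpha_i y_i \, (\vec{x}_i \cdot \vec{u}) + b \geq 0
   \;\Rightarrow\; \text{positive}
\end{align*}
```

Both the quantity being optimized and the final decision rule touch the samples only through dot products.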

No local maximum

Such a support vector algorithm can be proven to operate in a convex space, meaning that it will never get stuck at a local maximum.

Non linearity

The algorithm cannot find a median between data that are not linearly separable. A transformation can however be applied to the space to reorganize the samples so that they become linearly separable. Certain transformations can however create an overfitted model that becomes useless because it only sorts the example data.
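
A small sketch of such a transformation, assuming 2-dimensional samples and the mapping $\phi(x_1, x_2) = (x_1^2, \sqrt{2} x_1 x_2, x_2^2)$. The sample points are made up. The sketch shows two things: samples inside and outside a circle become separable by a plane in the transformed space, and the dot product in that space can be computed from the original dot product alone (a polynomial kernel), so the transformation never has to be carried out explicitly.

```python
import math
from typing import Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def phi(v: Tuple[float, float]) -> Tuple[float, float, float]:
    """Transform a 2-D sample into a 3-D space."""
    x1, x2 = v
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly_kernel(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product in the transformed space, computed in the original space."""
    return dot(a, b) ** 2


# Made-up samples: negatives inside the unit circle, positives outside it.
inner = [(0.5, 0.0), (0.0, -0.5), (0.3, 0.3)]
outer = [(2.0, 0.0), (0.0, -2.0), (1.5, 1.5)]

# In the transformed space the plane z1 + z3 = 1 separates the two groups.
normal = (1.0, 0.0, 1.0)
inner_side = [dot(normal, phi(p)) - 1.0 for p in inner]
outer_side = [dot(normal, phi(p)) - 1.0 for p in outer]
print(inner_side, outer_side)

u, v = (1.0, 2.0), (3.0, -1.0)
print(dot(phi(u), phi(v)), poly_kernel(u, v))
```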


Learning: Boosting


Classifiers

Classifiers are tests that produce binary choices about samples. They are considered strong classifiers if their error rate is close to 0, and weak classifiers if their error rate is close to (but below) 0.5.

By using multiple classifiers with different weights, data samples can be sorted or grouped according to different characteristics.

Decision tree stumps

Aside from classifiers, a decision tree can be used to sort positive and negative samples in a 2-dimensional space. By adding weights to different tests, some samples can be emphasized over the others. The sum of the weights must always be constrained to 1 to ensure a proper distribution of samples.

Dividing the space

By minimizing the error rate of the tests from the weights, the algorithm can cut the space to sort positive and negative examples.

No over fitting

Boosting algorithms seem not to overfit, as the decision tree stumps tend to wrap very tightly around outlying samples, only excluding them from the space.
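
The following sketch shows one common form of boosting (AdaBoost) with decision stumps on 1-dimensional data. It assumes the usual formulas: each round picks the stump with the lowest weighted error $\epsilon$, gives it a vote $\alpha = \frac{1}{2} \ln \frac{1 - \epsilon}{\epsilon}$, and rescales the sample weights so that misclassified samples carry half of the total weight. The data set and function names are made up.

```python
import math
from typing import List, Tuple

Stump = Tuple[float, int]  # (threshold, sign): predict sign if x > threshold


def stump_predict(stump: Stump, x: float) -> int:
    threshold, sign = stump
    return sign if x > threshold else -sign


def adaboost(xs: List[float], ys: List[int], rounds: int) -> List[Tuple[float, Stump]]:
    """Return a list of (vote weight, stump) pairs."""
    n = len(xs)
    weights = [1.0 / n] * n                      # weights always sum to 1
    thresholds = [min(xs) - 0.5] + [x + 0.5 for x in sorted(xs)]
    stumps = [(t, s) for t in thresholds for s in (1, -1)]
    ensemble = []
    for _ in range(rounds):
        def weighted_error(stump: Stump) -> float:
            return sum(w for w, x, y in zip(weights, xs, ys)
                       if stump_predict(stump, x) != y)

        best = min(stumps, key=weighted_error)
        error = weighted_error(best)
        if error == 0:                           # a perfect stump needs no help
            ensemble.append((1.0, best))
            break
        ensemble.append((0.5 * math.log((1 - error) / error), best))
        # Emphasize the samples this stump got wrong.
        weights = [w / (2 * (1 - error)) if stump_predict(best, x) == y
                   else w / (2 * error)
                   for w, x, y in zip(weights, xs, ys)]
    return ensemble


def classify(ensemble: List[Tuple[float, Stump]], x: float) -> int:
    total = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if total > 0 else -1


# Made-up data that no single stump can separate.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]

one_round = [classify(adaboost(xs, ys, 1), x) for x in xs]
three_rounds = [classify(adaboost(xs, ys, 3), x) for x in xs]
print(one_round)
print(three_rounds)
```

No single stump classifies this data correctly, but the weighted vote of three stumps does.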


Representations: Classes, Trajectories, Transitions


Vocabulary

In a semantic net, a diagram of relations between objects, essential notions can be defined as follows:

  • Combinators: linking objects together
  • Reification: actions implying results
  • Localization: a frame where objects and actions happen
  • Story sequence: a series of actions happening linearly in time

Classification

In natural language, knowledge is generally organized from general categories to basic objects and finally specific objects.

Transition

Another element of language is recording change in the evolution of objects during the unfolding of stories.

Trajectory

Language also tracks movement in the description of actions.

An agent makes an object move from a source to a destination using an instrument. A co-agent may be involved, the action might be aimed towards a beneficiary, and the motion may be helped by a conveyance, etc. In English, prepositions tend to be used to define the role of each part in the action, which enables recording of interactions.

In language corpora, such as the Wall Street Journal Corpus, about 25% of the content generally describes a transition or a trajectory.

Story sequences

Agents’ actions determine transitions in the semantic net, which result in trajectories.

Story libraries

Each type of story implies a number of characteristics that correspond to the situation. Example: events can be disasters or parties, they have a time and place, involved people, casualties, money, places…


Architectures: GPS, SOAR, Subsumption, Society of Mind


General Problem Solver

By analyzing the difference between a current state and desired state, a set of intermediary steps can be created to solve the problem = problem solving hypothesis.

SOAR (State Operator And Result)

SOAR Components:

  • Long-term memory
  • Short-term memory
  • Vision system
  • Action system

Key parts of the SOAR architecture:

  1. Long-term memory and short-term memory
  2. Assertions and rules (production)
  3. Preferences systems between the rules
  4. Problem spaces (make a space and search through that space)
  5. Universal sub-goaling: new problems that emerge during the resolution become entirely new goals with their own assertions, rules, etc.

SOAR relies on the symbol system hypothesis. It primarily deals with deliberative thinking.

Emotion machine

Created by Marvin Minsky to tackle more complex problems, this architecture involves thinking about several layers:

  • Reflective thinking
    • Self-conscious
    • Self-reflective
  • Deliberative thinking
  • Learned reaction
  • Instinctive reaction

It is based upon the common sense hypothesis.

Subsumption

System created by Rodney Brooks. By generalizing layers of abstraction in the building of robots (such as for robot vision and movement), modifications to certain layers don’t interfere with other layers’ computation, allowing for better incremental improvement of the system as a whole.

It primarily deals with instinctive reaction and learned reaction.

This is the creature hypothesis: if a machine can act as an insect, then it will be easy to develop further later. This architecture relies upon the following principles:

  1. No representation
  2. Use the world instead of model: reacting to the world constantly
  3. Finite state machines

Genesis

Based upon language, this system involves perception and description of events, which then make it possible to understand stories and, further, culture at both the macro (country, religion…) and micro (family…) levels. This system relies upon the strong story hypothesis.


Probabilistic Inference I


Probabilities in Artificial Intelligence

With a joint probability table, recording the tally of co-occurring events allows us to measure the probability of each event happening, conditional or unconditional probabilities, independence of events, etc.

The problem with such a table is that as the number of variables increases, the number of rows in the table grows exponentially.
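
A small sketch of reading probabilities off a table of tallies, assuming two made-up boolean variables (`rain`, `wet`) and made-up counts. Each row of the table is a combination of variable values, and the function name `prob` is hypothetical.

```python
from typing import Callable, Dict, Tuple

Row = Tuple[bool, bool]  # (rain, wet)


def prob(tallies: Dict[Row, int], event: Callable[[Row], bool],
         given: Callable[[Row], bool] = lambda row: True) -> float:
    """P(event | given), estimated from the tallies in the joint table."""
    total = sum(n for row, n in tallies.items() if given(row))
    hits = sum(n for row, n in tallies.items() if given(row) and event(row))
    return hits / total


# Made-up tallies for every combination of (rain, wet).
tallies = {
    (True, True): 8,
    (True, False): 2,
    (False, True): 10,
    (False, False): 80,
}

p_rain = prob(tallies, lambda r: r[0])
p_wet = prob(tallies, lambda r: r[1])
p_wet_given_rain = prob(tallies, lambda r: r[1], given=lambda r: r[0])
print(p_rain, p_wet, p_wet_given_rain)

# Number of rows needed for n boolean variables.
print([2 ** n for n in (2, 5, 10, 20)])
```

Since P(wet | rain) differs from P(wet) in these tallies, the two events are not independent.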

Reminders of probabilities formulas

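Basic axioms of probability

  • $0 \leq P(a) \leq 1$
  • $P(\text{true}) = 1$, $P(\text{false}) = 0$
  • $P(a) + P(b) - P(a, b) = P(a \lor b)$

Basic definitions of probability

$P(a \mid b) = \frac{P(a, b)}{P(b)}$, so $P(a, b) = P(a \mid b) P(b)$

Chain rule of probability

$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid x_{i-1}, \ldots, x_1)$

Independence

Independent events

$P(a \mid b) = P(a)$ if $a$ and $b$ are independent

Conditional independence

If $a$ and $b$ are independent given $z$: $P(a \mid b, z) = P(a \mid z)$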

Belief nets

Causal relations between events can be represented in nets. These models highlight that, given its parents, any event is independent of its non-descendants. Recording the probabilities at each node, the number of tables and rows is significantly smaller than in a general table of all event tallies.
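
To illustrate the saving, the sketch below counts how many probabilities have to be recorded for a small net of boolean events, compared with the rows of a full joint table. The net structure and the event names are assumptions chosen for illustration: each node needs one probability per combination of its parents' values.

```python
from typing import Dict, List

# Assumed example net: each event is listed with its parents.
parents: Dict[str, List[str]] = {
    "burglar": [],
    "raccoon": [],
    "dog_barks": ["burglar", "raccoon"],
    "police_called": ["dog_barks"],
    "trash_can_overturned": ["raccoon"],
}


def net_table_rows(net: Dict[str, List[str]]) -> int:
    """Rows across all per-node tables: 2^(number of parents) for each node."""
    return sum(2 ** len(node_parents) for node_parents in net.values())


def joint_table_rows(net: Dict[str, List[str]]) -> int:
    """Rows in one joint table over all boolean events."""
    return 2 ** len(net)


print(net_table_rows(parents), joint_table_rows(parents))
```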


Probabilistic Inference II


Belief nets

Event diagrams must always be arranged so that there are final nodes and no loops. Probabilities are recorded in a table for each event; the tables are filled by repeating experiments to tally how often each event occurs.


Bayesian inference

Several models can be drawn for a given set of events. To know which model is right, Bayes' formulas can be used to confirm whether events are independent, make the probabilities easier to compute, and choose the more appropriate model.

Defining $a$ as a class and $b$ as the evidence, the probability of the class given the evidence can be obtained from the probability of the evidence given the class:

$P(a \mid b) = \frac{P(b \mid a) P(a)}{P(b)}$

Using the evidence from experience, classes can be inferred by analyzing the results and corresponding probabilities.

Structure discovery

Given the data from experience / simulation, the right model can be identified as the one that better corresponds to the observed probabilities. This makes it possible to select between 2 existing models.

However if multiple models can be created, volumes of data make it impossible to compare them all. The solution is to use two models and compare them recursively. At each trial, the losing model is modified for improvements until a model fits certain criteria for success.

A trick is to use the sum of the logarithms rather than the probabilities, as large numbers of trials will make numbers too small to compute properly.
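
A short sketch of the logarithm trick, assuming two made-up models of a coin (`fair` and `biased`) and made-up flip data. Multiplying thousands of probabilities underflows to zero in floating point, while summing their logarithms keeps the two models comparable.

```python
import math
from typing import List


def log_likelihood(flips: List[str], p_heads: float) -> float:
    """Sum of log probabilities of the observed flips under a coin model."""
    return sum(math.log(p_heads if flip == "H" else 1.0 - p_heads)
               for flip in flips)


# Made-up data: 2000 flips, 75% heads.
flips = ["H", "H", "T", "H"] * 500
models = {"fair": 0.5, "biased": 0.75}

scores = {name: log_likelihood(flips, p) for name, p in models.items()}
best_model = max(scores, key=scores.get)
print(scores, best_model)

# The plain product of probabilities is too small to represent.
naive_product = math.prod(0.5 for _ in flips)
print(naive_product)
```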

To avoid local maxima, a radical rearrangement of structure is launched after a certain number of trials.

Applications

This Bayesian structure discovery works quite well in situations when a diagnosis must be completed: medical diagnosis, lie-detector, symptoms of aircraft or program not working…


Model Merging, Cross Modal Coupling, Course Summary


Bayesian Story Merging

By using the probability model discovery previously studied, certain concepts and ideas can be analyzed and merged if similar.

Cross-Modal Coupling

By analyzing the correspondences between clusters of two sets of data, regular correspondences between certain data subsets can be picked out. According to Prof. Patrick Winston, this system of correspondences is very likely to be present in human intelligence.

Applications of AI

  • Scientific approach: understanding how AI works
  • Engineering approach: implementing AI applications

The most interesting applications are not to replace people with artificial intelligence but to work in tandem with people.

Using a lot of computing power and data becomes more common, but an interesting question is how little information is needed to work a certain problem.

Genesis system

The system translates stories into an internal language to understand stories and display them in diagrams. It makes it possible to read stories on different levels, and to use different “personas” to understand stories differently.

Humans may not be intelligent enough to build a machine that is as intelligent as them.

