The Lack of A Priori Distinctions Between Learning Algorithms
Looking for more research papers to read, I scanned my Hands-On Machine Learning notes for the many papers referenced there. This is one of those papers; they are mainly on machine learning and deep learning topics.
Reference: The Lack of A Priori Distinctions Between Learning Algorithms paper
This is the first of two papers that use off-training-set (OTS) error to investigate the assumption-free relationship between learning algorithms. It discusses the senses in which there are no a priori distinctions between learning algorithms, showing, loosely speaking, that for any two algorithms A and B there are "as many" targets (or priors over targets) for which A has lower expected OTS error than B as vice versa, for loss functions like zero-one loss. In particular, this is true if A is cross-validation and B is "anti-cross-validation" (choosing the learning algorithm with the largest cross-validation error). The paper ends with a discussion of the implications of these results for computational learning theory.
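To make the cross-validation versus anti-cross-validation contrast concrete, here is a minimal sketch in Python. The toy learners, data, and function names are hypothetical illustrations, not from the paper; the only point is that the two selection rules differ by a single argmin/argmax.

```python
import numpy as np

def k_fold_error(learner, X, y, k=5):
    """Average zero-one error of `learner` over k folds."""
    idx = np.arange(len(X))
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        predict = learner(X[train], y[train])           # fit; returns a predict function
        errs.append(np.mean(predict(X[fold]) != y[fold]))  # zero-one loss on held-out fold
    return float(np.mean(errs))

def select(learners, X, y, anti=False):
    """Cross-validation picks the LOWEST CV error; anti-cross-validation the HIGHEST."""
    scores = [k_fold_error(l, X, y) for l in learners]
    return learners[np.argmax(scores) if anti else np.argmin(scores)]

# Two toy learners: 1-nearest-neighbour and a constant majority-class predictor.
def one_nn(Xtr, ytr):
    return lambda Xte: ytr[np.argmin(np.abs(Xte[:, None] - Xtr[None, :]), axis=1)]

def majority(Xtr, ytr):
    label = np.bincount(ytr).argmax()
    return lambda Xte: np.full(len(Xte), label)

rng = np.random.default_rng(0)
X = rng.uniform(size=40)
y = (X > 0.5).astype(int)
best = select([one_nn, majority], X, y)              # cross-validation's choice
worst = select([one_nn, majority], X, y, anti=True)  # anti-cross-validation's choice
```

On data with structure like this, cross-validation will usually pick the better learner, but the paper's point is that, averaged over all possible targets, neither selection rule has an edge in expected OTS error.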
Much of modern supervised learning theory gives the impression that one can deduce something about the efficacy of a particular learning algorithm (generalizer) without any assumptions about the target input-output relationship one is trying to learn. Can one get useful, caveat-free theoretical results that link the training set and the learning algorithm to generalization error without making assumptions concerning the target? The primary mathematical tool used in these papers is off-training-set (OTS) generalization error, i.e., generalization error for test sets that contain no overlap with the training set. The sole concern of this paper is what can (and cannot) be formally inferred about the utility of various learning algorithms if one makes no assumptions concerning targets.
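As a sketch of what OTS error measures, the hypothetical helper below scores a predictor only on test inputs that never appeared in the training set; the name `ots_error` and the use of zero-one loss are illustrative assumptions, not definitions from the paper.

```python
import numpy as np

def ots_error(predict, X_train, X_test, y_test):
    """Zero-one error of `predict`, restricted to test inputs outside the training set."""
    mask = ~np.isin(X_test, X_train)   # keep only genuinely unseen inputs
    if not mask.any():
        raise ValueError("test set fully overlaps the training set")
    return float(np.mean(predict(X_test[mask]) != y_test[mask]))
```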
In computational complexity and optimization, the no-free-lunch theorem states that for certain classes of mathematical problems, the computational cost of finding a solution, averaged over all problems in the class, is the same for any solution method. The name alludes to the saying "there is no such thing as a free lunch": no method offers a shortcut. This holds when every problem in the class is weighted equally in the average (equivalently, when problems are drawn from a uniform distribution). It does not apply to cases where the search space has underlying structure that can be exploited more efficiently than random search, or even has closed-form solutions that can be determined without search at all.
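The averaging claim can be checked by brute force on a tiny discrete domain. In this hedged sketch (the domain size, learners, and names are invented for illustration), we enumerate all 2^6 binary targets and find that any fixed algorithm attains the same average off-training-set zero-one error of 1/2.

```python
import itertools
import numpy as np

domain = np.arange(6)     # six possible inputs
train_x = domain[:3]      # training inputs; the OTS inputs are domain[3:]
ots_x = domain[3:]

def avg_ots_error(predictor):
    """Average OTS error of a fixed predictor over every possible binary target."""
    errs = []
    for target in itertools.product([0, 1], repeat=len(domain)):
        target = np.array(target)
        preds = predictor(train_x, target[train_x], ots_x)
        errs.append(np.mean(preds != target[ots_x]))
    return float(np.mean(errs))

always_zero = lambda xtr, ytr, xte: np.zeros(len(xte), dtype=int)
copy_majority = lambda xtr, ytr, xte: np.full(len(xte), np.bincount(ytr).argmax())

print(avg_ots_error(always_zero))    # 0.5
print(avg_ots_error(copy_majority))  # 0.5
```

Both predictors print 0.5: for any fixed behaviour on the unseen inputs, the targets that agree with it are exactly balanced by those that disagree.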
In the "no free lunch" metaphor, each "restaurant" (problem-solving procedure) has a "menu" associating each "lunch plate" (problem) with a "price" (the performance of the procedure in solving the problem). The menus of the restaurants are identical except in one regard: the prices are shuffled from one restaurant to the next. For an omnivore who is as likely to order each plate as any other, the average cost of lunch does not depend on the choice of restaurant. But a vegan who goes to lunch regularly with a carnivore seeking economy might pay a high average cost for lunch. To methodically reduce the average cost, one must use advance knowledge of (a) what one will order and (b) what the order will cost at various restaurants. That is, improving performance in problem-solving hinges on using prior information to match procedures to problems.