The Unreasonable Effectiveness of Data

Looking for more research papers to read, I scanned my Hands-On Machine Learning notes for the many papers that were referenced there. This is one of those papers. These papers are mainly on machine learning and deep learning topics.

Reference: The Unreasonable Effectiveness of Data Paper

Sciences that involve human beings rather than elementary particles have proven more resistant to elegant mathematics. Rather than trying to model human behavior with extremely elegant theories, we should make use of the best ally we have: the unreasonable effectiveness of data. Statistical speech recognition and machine translation succeeded earlier than tasks like document classification or sentiment analysis because of the amount of data available for them. The first lesson of Web-scale learning is to use the large-scale data that is available rather than hoping for annotated data that isn’t.

Another important lesson from statistical methods in speech recognition and machine translation is that memorization is a good policy if you have a lot of training data. The statistical language models used in both tasks consist primarily of a huge database of probabilities of short sequences of consecutive words (n-grams). These models are built by counting the number of occurrences of each n-gram in a corpus of billions or trillions of words. Simple models based on a lot of data invariably trump more elaborate models based on less data.
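To make the counting idea concrete, here is a minimal sketch of a bigram (2-gram) model built purely from counts. The toy corpus and the function names `train_bigram_model` and `bigram_probability` are my own illustrations, not anything from the paper, and a real system would work on billions of words and add smoothing for unseen sequences.

```python
from collections import Counter


def train_bigram_model(tokens):
    """Count how often each word appears as a context and how often each bigram occurs."""
    # Every token except the last one acts as the context (first word) of a bigram.
    context_counts = Counter(tokens[:-1])
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    return context_counts, bigram_counts


def bigram_probability(prev_word, word, context_counts, bigram_counts):
    """Estimate P(word | prev_word) as count(prev_word, word) / count(prev_word)."""
    context_count = context_counts[prev_word]
    if context_count == 0:
        return 0.0  # unseen context; no smoothing in this sketch
    return bigram_counts[(prev_word, word)] / context_count


# Hypothetical toy corpus, standing in for a Web-scale one.
corpus = "the cat sat on the mat the cat ate".split()
context_counts, bigram_counts = train_bigram_model(corpus)
print(bigram_probability("the", "cat", context_counts, bigram_counts))
```

The model is nothing more than a lookup table of counts, which is the "memorization" the paper describes: the more text it sees, the more sequences it has simply remembered.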

Simple n-gram models or linear classifiers based on millions of specific features perform better than elaborate models that try to discover general rules. For many tasks, words and word combinations provide all the representational machinery we need from text. The experimental evidence of the last decade (2000s) suggests that throwing away rare events (in an attempt to address the curse of dimensionality) is almost always a bad idea, because much Web data consists of individually rare but collectively frequent events.

The Semantic Web is a convention for formal representation languages that lets software services interact with each other “without needing artificial intelligence.” “The Semantic Web will enable machines to comprehend semantic documents and data, not human speech and writings.” The semantic interpretation problem, that is, the problem of interpreting human speech and writing, is quite different from the problem of software service interoperability.
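Returning to the point about rare events: a small sketch can show how much of a corpus sits in n-grams that a frequency cutoff would discard. The toy corpus, the function name `rare_event_mass`, and the `min_count` threshold are all assumptions of mine for illustration; the numbers say nothing about real Web data.

```python
from collections import Counter


def rare_event_mass(tokens, min_count):
    """Return the fraction of bigram occurrences whose bigram is seen fewer than min_count times."""
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    total = sum(bigram_counts.values())
    if total == 0:
        return 0.0
    # Sum the occurrences that a "drop anything below min_count" rule would throw away.
    rare = sum(count for count in bigram_counts.values() if count < min_count)
    return rare / total


# Hypothetical toy corpus: most bigrams occur only once.
toy_corpus = "the cat sat on the mat the cat ate the fish".split()
print(rare_event_mass(toy_corpus, min_count=2))
```

Each discarded bigram is rare on its own, but together they can account for most of the observations, which is why pruning them costs more than it seems to.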

Because of a huge cognitive and cultural context, linguistic expression can be highly ambiguous and still often be understood correctly. The same meaning can be expressed in many different ways, and the same expression can express many different meanings. Choose a representation that can use unsupervised learning on unlabeled data, which is so much more plentiful than labeled data.
