MTEB: Massive Text Embedding Benchmark

I need to do a lot with text embeddings, so I am going to read this paper. This paper establishes a benchmark to test text embeddings.

Reference: arXiv paper


Introduction

Text embeddings are commonly evaluated on a small set of datasets from a single task, not covering their possible applications to other tasks. It is unclear whether state-of-the-art embeddings for semantic textual similarity (STS) can be applied equally well to other tasks like clustering or re-ranking. This makes progress in the field difficult to track, as various models are constantly being proposed without proper evaluation. To solve this problem, the authors introduce the Massive Text Embedding Benchmark (MTEB). MTEB spans 8 embedding tasks covering a total of 58 datasets and 112 languages, making it the most comprehensive benchmark of text embeddings to date. The paper finds that no particular text embedding method dominates across all tasks, which suggests that the field has yet to converge on a universal text embedding method and scale it up sufficiently to provide state-of-the-art results on all embedding tasks.

Natural language embeddings power a variety of use cases: clustering and topic representation, search systems and text mining, and feature representations for downstream models. The Massive Text Embedding Benchmark (MTEB) aims to provide clarity on how models perform on a variety of embedding tasks, and thus serves as a gateway to finding universal text embeddings applicable to a variety of tasks. The paper finds that there is no single solution for embeddings: different embeddings dominate different tasks.

Notes

MTEB is built on a set of desiderata:

  • Diversity: MTEB aims to provide an understanding of the usability of embedding models in various use cases. The benchmark comprises 8 different tasks, with up to 15 datasets each.
  • Simplicity: MTEB provides a simple API for plugging in any model that, given a list of texts, can produce a vector for each list item with a consistent shape.
  • Extensibility: New datasets for existing tasks can be benchmarked in MTEB via a single file that specifies the task and a Hugging Face dataset name where the data has been uploaded.
  • Reproducibility: MTEB makes it easy to reproduce results.
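To make the "simple API" point concrete, here is a minimal sketch of the kind of interface described: any object whose `encode` method maps a list of texts to one fixed-size vector per text. `HashEmbedder` is a toy model invented for illustration, not part of MTEB; a real embedding model would plug in the same way.

```python
import numpy as np


class HashEmbedder:
    """Toy stand-in for an embedding model: hashes character
    trigrams into a fixed-size vector per input text."""

    def __init__(self, dim: int = 64):
        self.dim = dim

    def encode(self, sentences: list[str]) -> np.ndarray:
        # Returns one row per sentence, with a consistent shape
        # (len(sentences), dim) -- the contract MTEB relies on.
        out = np.zeros((len(sentences), self.dim))
        for i, text in enumerate(sentences):
            for j in range(len(text) - 2):
                out[i, hash(text[j:j + 3]) % self.dim] += 1.0
        return out


model = HashEmbedder()
vecs = model.encode(["hello world", "text embeddings"])
print(vecs.shape)  # (2, 64)
```

Because the benchmark only depends on this one method, swapping models in and out is trivial.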

Tasks and Evaluation

  • Bitext mining: Inputs are two sets of sentences in two different languages. For each sentence in the first set, the best match in the second set needs to be found.
  • Classification: A train and test set are embedded with the provided model. The train set embeddings are used to train a logistic regression classifier with 100 maximum iterations, which is scored on the test set.
  • Clustering: Given a set of sentences or paragraphs, the goal is to group them into meaningful clusters.
  • Pair classification: A pair of text inputs is provided and a label needs to be assigned.
  • Reranking: Inputs are a query and a list of relevant and irrelevant reference texts. The aim is to rank the results according to their relevance to the query.
  • Retrieval: Each dataset consists of a corpus, queries, and a mapping for each query to relevant documents from the corpus. The aim is to find these relevant documents.
  • Semantic Textual Similarity (STS): Given a sentence pair, the aim is to determine their similarity.
  • Summarization: Given reference human-written summaries, the aim is to score machine-generated summaries.
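The classification protocol above can be sketched end to end. The random embeddings below are a stand-in for a real model's `encode` output (the two classes are deliberately well separated so the example converges); the logistic-regression setup with 100 maximum iterations follows the protocol described in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in for model.encode(train_texts) / model.encode(test_texts):
# synthetic embeddings with two well-separated classes.
n_train, n_test, dim = 200, 50, 32
y_train = rng.integers(0, 2, n_train)
y_test = rng.integers(0, 2, n_test)
X_train = rng.normal(0.0, 1.0, (n_train, dim)) + y_train[:, None] * 3.0
X_test = rng.normal(0.0, 1.0, (n_test, dim)) + y_test[:, None] * 3.0

# Train a logistic regression classifier on the train-set embeddings
# (100 maximum iterations, per the MTEB classification protocol),
# then score it on the test-set embeddings.
clf = LogisticRegression(max_iter=100)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(round(accuracy, 2))
```

The key point is that the embedding model is frozen: only the cheap linear classifier is trained, so the score reflects how much task-relevant structure the embeddings already carry.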

