Recommendation Engines

I wanted to read about recommendation engines for recommending content - the task of selecting which content to show to the user when there is a lot of content to choose from.


References

A recommender system (RecSys), or a recommendation system (sometimes replacing system with terms such as platform, engine, or algorithm), is a subclass of information filtering system that provides suggestions for items that are most pertinent to a particular user. Recommender systems are particularly useful when an individual needs to choose an item from a potentially overwhelming number of items that a service may offer.
— Wikipedia, Recommender system

Recommendation Systems

Background

Recommendations: What and Why?

ML-based recommendation models determine how similar videos and apps are to other things you like and then serve up recommendations on YouTube and the Google Play Store. The two kinds of recommendations that are commonly used are:

  1. Home page recommendations: personalized to a user based on their known interests. Every user sees different recommendations.
  2. Related item recommendations: recommendations similar to a particular item.

A recommendation system helps users find compelling content in a large corpus. A recommendation engine can display items that users might not have thought to search for on their own.

Terminology

  • Items (also known as documents)
    • The entities a system recommends. For YouTube, the items are videos
  • Query (also known as context)
    • The information a system uses to make recommendations. Queries can be a combination of the following:
      • user information
        • the id of the user
        • items that users previously interacted with
      • additional context
        • time of day
        • the user's device
  • Embedding
    • A mapping from a discrete set (in this case, the set of queries, or the set of items to recommend) to a vector space called the embedding space. Many recommendation systems rely on learning an appropriate embedding representation of the queries and items.

Recommendation Systems Overview

One common architecture for recommendation systems consists of the following components:

  • candidate generation
  • scoring
  • re-ranking

Candidate Generation, Scoring, Re-ranking

Candidate Generation

In the first stage, the system starts from a potentially huge corpus and generates a much smaller subset of candidates. For example, the candidate generator in YouTube reduces billions of videos down to hundreds of thousands. The model needs to evaluate queries quickly given the enormous size of the corpus. A given model may provide multiple candidate generators, each nominating a different subset of candidates.

Scoring

Next, another model scores and ranks the candidates in order to select the set of items (on the order of 10) to display to the user. Since the model evaluates a relatively small subset of items, the system can use a more precise model relying on additional queries.

Re-Ranking

Finally, the system must take into account additional constraints for the final ranking. For example, the system removes items that the user explicitly disliked or boosts the score of fresher content. Re-ranking can also help ensure diversity, freshness, and fairness.

Candidate Generation

Candidate Generation Overview

Candidate generation is the first stage of recommendation. Given a query, the system generates a set of relevant candidates. The following table shows two common candidate generation approaches:

  • Content-based filtering
    • Definition: Uses similarity between items to recommend items similar to what the user likes.
    • Example: If user A watches two cat videos, then the system can recommend cute animal videos to that user.
  • Collaborative filtering
    • Definition: Uses similarities between queries and items simultaneously to provide recommendations.
    • Example: If user A is similar to user B, and user B likes video 1, then the system can recommend video 1 to user A (even if user A hasn't seen videos similar to video 1).

Embedding Space

Both content-based and collaborative filtering map each item and each query (or context) to an embedding vector in a common embedding space $E = \mathbb{R}^d$. Typically, the embedding space is low-dimensional (that is, $d$ is much smaller than the size of the corpus), and captures some latent structure of the item or query set.

Similarity Measures

A similarity measure is a function $s : E \times E \rightarrow \mathbb{R}$ that takes a pair of embeddings and returns a scalar measuring their similarity. The embeddings can be used for candidate generation as follows: given a query embedding $q \in E$, the system looks for item embeddings $x \in E$ that are close to $q$, that is, embeddings with high similarity $s(q, x)$. To determine the degree of similarity, most recommendation systems rely on one or more of the following:

  • Cosine
    • The cosine of the angle between the two vectors: $s(q, x) = \cos(q, x)$.
  • Dot product
    • The dot product of two vectors: $s(q, x) = \langle q, x \rangle$. If the embeddings are normalized, then dot product and cosine coincide.
  • Euclidean distance
    • The usual distance in Euclidean space, $s(q, x) = \lVert q - x \rVert = \left[ \sum_{i=1}^d (q_i - x_i)^2 \right]^{1/2}$. A smaller distance means higher similarity. When the embeddings are normalized, the squared Euclidean distance coincides with dot product (and cosine) up to a constant.
Comparing Similarity Measures

In the image above, depending on the similarity measure used, the ranking of the items can be different.
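To make this concrete, here is a minimal NumPy sketch (with made-up 2-D embeddings, not the values from the original figure) that scores the same three items against one query under all three measures. Note how the large-norm item ranks first under dot product but not under cosine or Euclidean distance.

import numpy as np

# Hypothetical item embeddings in a 2-D embedding space.
items = np.array([
    [1.0, 1.0],   # item 0: moderate norm
    [3.0, 0.0],   # item 1: large norm
    [0.8, 0.6],   # item 2: unit norm
])
query = np.array([1.0, 0.5])

def dot(q, x):
    return q @ x

def cosine(q, x):
    return q @ x / (np.linalg.norm(q) * np.linalg.norm(x))

def neg_euclidean(q, x):
    # Negated so that, like the other measures, larger means "more similar".
    return -np.linalg.norm(q - x)

for name, sim in [("dot product", dot), ("cosine", cosine), ("euclidean", neg_euclidean)]:
    scores = np.array([sim(query, x) for x in items])
    print(name, "ranking (best first):", np.argsort(-scores))
# Dot product ranks item 1 first; cosine and Euclidean distance rank item 2 first.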

Which Similarity Measure?

Compared to the cosine, the dot product similarity is sensitive to the norm of the embedding. That is, the larger the norm of an embedding, the higher the similarity (for items with an acute angle) and the more likely the item is to be recommended. This can affect the recommendations:

  • Items that appear very frequently in the training set tend to have embeddings with large norms. If capturing popularity information is desirable, then you should prefer dot product. However, if you are not careful, the popular items may end up dominating the recommendations.
  • Items that appear very rarely may not be updated frequently during training. Consequently, if they are initialized with a large norm, the system may recommend rare items over more relevant items.

Content-Based Filtering

Content-based filtering uses item features to recommend other items similar to what the user likes, based on their previous actions or feedback.

Advantages:

  • The model doesn't need any data about other users, since the recommendations are specific to this user. This makes it easier to scale to a large number of users.
  • The model can capture the specific interests of a user and can recommend niche items that very few other users are interested in.

Disadvantages:

  • Since the feature representation of the items is hand-engineered to some extent, this technique requires a lot of domain knowledge.
  • The model can only make recommendations based on the existing interests of the user. In other words, the model has limited ability to expand on the users' existing interests.
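As a rough illustration (with hypothetical, hand-engineered feature vectors), content-based filtering can be sketched as building a user profile from the features of liked items and scoring unseen items against it:

import numpy as np

# Hand-engineered item features (hypothetical): columns = [cats, dogs, gaming, music].
item_features = np.array([
    [1.0, 0.0, 0.0, 0.0],  # cat video 1
    [0.9, 0.1, 0.0, 0.0],  # cat video 2
    [0.2, 0.8, 0.0, 0.0],  # dog video
    [0.0, 0.0, 1.0, 0.0],  # gaming video
    [0.0, 0.0, 0.0, 1.0],  # music video
])

# The user liked items 0 and 1, so their profile is the mean of those feature vectors.
liked = [0, 1]
user_profile = item_features[liked].mean(axis=0)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

# Score the items the user has not seen yet by similarity to the profile.
candidates = [i for i in range(len(item_features)) if i not in liked]
scores = {i: cosine(user_profile, item_features[i]) for i in candidates}
print(sorted(scores.items(), key=lambda kv: -kv[1]))  # the dog video ranks highest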

Collaborative Filtering and Matrix Factorization

To address some of the limitations of content-based filtering, collaborative filtering uses similarities between users and items simultaneously to provide recommendations. This allows for serendipitous recommendations; that is, collaborative filtering models can recommend an item to user A based on the interests of a similar user B.

In practice, the embeddings can be learned automatically, which is the power of collaborative filtering models.

Matrix Factorization

Matrix factorization is a simple embedding model. Given a feedback matrix $A \in \mathbb{R}^{m \times n}$, where $m$ is the number of users (or queries) and $n$ is the number of items, the model learns:

  • A user embedding matrix $U \in \mathbb{R}^{m \times d}$, where row $i$ is the embedding for user $i$.
  • An item embedding matrix $V \in \mathbb{R}^{n \times d}$, where row $j$ is the embedding for item $j$.

Matrix Factorization

The embeddings are learned such that the product $UV^T$ is a good approximation of the feedback matrix $A$. Observe that the $(i, j)$ entry of $UV^T$ is simply the dot product $\langle U_i, V_j \rangle$ of the embeddings of user $i$ and item $j$, which you want to be close to $A_{i,j}$.

Choosing the Objective Function

One intuitive objective function is the squared distance. To do this, minimize the sum of squared errors over all pairs of observed entries:

$$\min_{U \in \mathbb{R}^{m \times d},\ V \in \mathbb{R}^{n \times d}} \sum_{(i,j) \in \text{obs}} \left( A_{ij} - \langle U_i, V_j \rangle \right)^2$$

In this objective function, you only sum over observed pairs $(i, j)$, that is, over non-zero values in the feedback matrix. However, only summing over observed values is not a good idea: a matrix of all ones will have minimal loss and produce a model that can't make effective recommendations and that generalizes poorly.

Choosing an Objective Function

You could treat the unobserved values as zero and sum over all entries in the matrix. This corresponds to minimizing the squared Frobenius distance between $A$ and its approximation $UV^T$:

$$\min_{U \in \mathbb{R}^{m \times d},\ V \in \mathbb{R}^{n \times d}} \lVert A - UV^T \rVert_F^2$$

You can solve this quadratic problem through Singular Value Decomposition (SVD) of the matrix. However, SVD is not a great solution either, because in real applications the matrix $A$ may be very sparse. The solution $UV^T$ will likely be close to zero, leading to poor generalization performance.

In contrast, Weighted Matrix Factorization decomposes the objective into two sums:

  • A sum over observed entries
  • A sum over unobserved entries (treated as zeros)
$$\min_{U \in \mathbb{R}^{m \times d},\ V \in \mathbb{R}^{n \times d}} \sum_{(i,j) \in \text{obs}} \left( A_{ij} - \langle U_i, V_j \rangle \right)^2 + w_0 \sum_{(i,j) \notin \text{obs}} \langle U_i, V_j \rangle^2$$

Here, $w_0$ is a hyperparameter that weights the two terms so that the objective is not dominated by one or the other. Tuning this hyperparameter is very important.

Minimizing the Objective Function

Common algorithms to minimize the objective function include:

  • Stochastic Gradient Descent (SGD), a generic method for minimizing loss functions
  • Weighted Alternating Least Squares (WALS), which is specialized to this particular objective

The objective is quadratic in each of the two matrices $U$ and $V$. WALS works by initializing the embeddings randomly, then alternating between:

  • Fixing $U$ and solving for $V$
  • Fixing $V$ and solving for $U$

Each stage can be solved exactly and can be distributed.
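The following is a minimal NumPy sketch of weighted matrix factorization trained with WALS on a tiny, made-up feedback matrix (all values are illustrative, and a tiny ridge term is added for numerical stability). Each alternation solves a small weighted least-squares problem per row, and the weighted objective above decreases as training proceeds.

import numpy as np

rng = np.random.default_rng(1)

# Toy feedback matrix: 4 users x 5 items, NaN marks unobserved entries.
A = np.array([
    [1.0, np.nan, 1.0, np.nan, np.nan],
    [np.nan, 1.0, np.nan, np.nan, 1.0],
    [1.0, 1.0, np.nan, np.nan, np.nan],
    [np.nan, np.nan, np.nan, 1.0, np.nan],
])
observed = ~np.isnan(A)
A0 = np.where(observed, A, 0.0)      # unobserved entries treated as zeros
W = np.where(observed, 1.0, 0.1)     # per-entry weights: 1 for observed, w0 = 0.1 otherwise

d = 3
U = rng.normal(scale=0.1, size=(A.shape[0], d))   # user embeddings
V = rng.normal(scale=0.1, size=(A.shape[1], d))   # item embeddings

def solve_rows(X, Wx, F):
    # For each row of X, solve the weighted least-squares problem with the other factor F fixed.
    out = np.empty((X.shape[0], F.shape[1]))
    for i in range(X.shape[0]):
        Wi = np.diag(Wx[i])
        # Normal equations, plus a tiny ridge term so the system is never singular.
        out[i] = np.linalg.solve(F.T @ Wi @ F + 1e-6 * np.eye(F.shape[1]),
                                 F.T @ Wi @ X[i])
    return out

for step in range(10):
    U = solve_rows(A0, W, V)         # fix V, solve for U
    V = solve_rows(A0.T, W.T, U)     # fix U, solve for V
    loss = np.sum(W * (A0 - U @ V.T) ** 2)   # the weighted objective
    print(step, round(float(loss), 4))       # should decrease at each step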

Pros and Cons, SGD and WALS

Collaborative Filtering Advantages and Disadvantages
Advantages
  • No Domain Knowledge Necessary: We don't need domain knowledge because the embeddings are automatically learned.
  • Serendipity: The model can help users discover new interests.
  • Great Starting Point: To some extent, the system needs only the feedback matrix to train a matrix factorization model. In particular, the system doesn't need contextual features.
Disadvantages
  • Cannot handle fresh items: If an item is not seen during training, the system can't create an embedding for it and can't query the model with this item. This issue is often called the cold-start problem. There are some techniques that can address this problem to some extent.
  • Hard to include side features for query/item: Side features are any features beyond the query or item ID.

Recommendation Using Deep Neural Networks

Some limitations of matrix factorization include:

  • The difficulty of using side features. As a result, the model can only be queried with a user or item present in the training set.
  • Relevance of recommendations. Popular items tend to be recommended for everyone, especially when using dot product as the similarity measure. It is better to capture specific user interests.

Deep Neural Network (DNN) models can address these limitations of matrix factorization. DNNs can easily incorporate query features and item features (due to the flexibility of the input layer of the network), which can help capture the specific interests of a user and improve the relevance of recommendations.

One possible DNN model is softmax, which treats the problem as a multiclass prediction problem in which:

  • The input is the user query.
  • The output is a probability vector with size equal to the number of items in the corpus, representing the probability of interacting with each item.

The input to a DNN can include:

  • dense features (e.g., watch time and time since last watch)
  • sparse features (e.g., watch history and country)

Softmax Input Layer

The model architecture determines the complexity and expressivity of the model. By adding hidden layers and non-linear activation functions, the model can capture more complex relationships in the data. The output of the last hidden layer can be denoted $\psi(x)$.

The model maps the output of the last hidden layer, $\psi(x)$, through a softmax layer to a probability distribution $\hat{p} = h(\psi(x) V^T)$, where:

  • $h : \mathbb{R}^n \rightarrow \mathbb{R}^n$ is the softmax function, given by $h(y)_i = \frac{e^{y_i}}{\sum_j e^{y_j}}$
  • $V \in \mathbb{R}^{n \times d}$ is the matrix of weights of the softmax layer

The softmax layer maps a vector of scores $y \in \mathbb{R}^n$ (sometimes called the logits) to a probability distribution.

Finally, define a loss function that compares the following:

  • $\hat{p}$, the output of the softmax layer (a probability distribution)
  • $p$, the ground truth, representing the items the user has interacted with. This can be represented as a normalized multi-hot distribution.

The probability of item $j$ is given by $\hat{p}_j = \frac{\exp(\langle \psi(x), V_j \rangle)}{Z}$, where $Z$ is a normalization constant that does not depend on $j$.

In other words, $\log(\hat{p}_j) = \langle \psi(x), V_j \rangle - \log(Z)$, so the log probability of an item $j$ is (up to an additive constant) the dot product of two $d$-dimensional vectors, which can be interpreted as query and item embeddings:

  • $\psi(x) \in \mathbb{R}^d$ is the output of the last hidden layer. We call it the embedding of the query $x$.
  • $V_j \in \mathbb{R}^d$ is the vector of weights connecting the last hidden layer to output $j$. We call it the embedding of item $j$.
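To check this identity numerically, here is a tiny NumPy sketch (with random vectors standing in for $\psi(x)$ and $V$): the log-softmax probabilities differ from the raw dot products only by a constant.

import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 5
psi_x = rng.normal(size=d)          # query embedding psi(x)
V = rng.normal(size=(n, d))         # softmax weights, i.e., the item embeddings

logits = V @ psi_x                               # <psi(x), V_j> for every item j
log_p = logits - np.log(np.sum(np.exp(logits)))  # log softmax
# log(p_j) - <psi(x), V_j> is the same constant (-log Z) for every j:
print(np.allclose(log_p - logits, log_p[0] - logits[0]))  # True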

In both the softmax model and the matrix factorization model, the system learns one embedding vector $V_j$ per item $j$. What we called the item embedding matrix $V \in \mathbb{R}^{n \times d}$ in matrix factorization is now the matrix of weights of the softmax layer. The query embeddings, however, are different. Instead of learning one embedding $U_i$ per query $i$, the system learns a mapping from the query features $x$ to an embedding $\psi(x) \in \mathbb{R}^d$. Therefore, you can think of this DNN model as a generalization of matrix factorization in which you replace the query side by a nonlinear function $\psi(\cdot)$.

The softmax training data consists of the query features $x$ and a vector of items the user interacted with (represented as a probability distribution $p$). These are marked in blue in the figure below. The variables of the model are the weights of the different layers. The model is typically trained using any variant of stochastic gradient descent.

Softmax Training
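The following is a minimal Keras sketch of such a softmax model, assuming a small corpus, two made-up query features (a last-watched item id and a watch-time value), and a single interacted item per training example. It is not the exact architecture from the course, just an illustration of the structure described above.

import numpy as np
import tensorflow as tf

n_items, d = 1000, 32   # corpus size and embedding dimension (illustrative values)

# Query features x: one sparse feature and one dense feature (both hypothetical).
history_input = tf.keras.Input(shape=(), dtype=tf.int32, name="last_watched_item")
watch_time_input = tf.keras.Input(shape=(1,), dtype=tf.float32, name="watch_time")

history_emb = tf.keras.layers.Embedding(n_items, d)(history_input)
hidden = tf.keras.layers.Concatenate()([history_emb, watch_time_input])
hidden = tf.keras.layers.Dense(64, activation="relu")(hidden)
psi = tf.keras.layers.Dense(d)(hidden)               # query embedding psi(x)

# The weights of this final layer play the role of the item embedding matrix V:
# logits_j = <psi(x), V_j>, and softmax turns the logits into a distribution over items.
logits = tf.keras.layers.Dense(n_items, use_bias=False)(psi)

model = tf.keras.Model([history_input, watch_time_input], logits)
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

# Toy training data: each example pairs query features with the id of the interacted item.
rng = np.random.default_rng(0)
features = {
    "last_watched_item": rng.integers(0, n_items, size=512).astype(np.int32),
    "watch_time": rng.random((512, 1), dtype=np.float32),
}
labels = rng.integers(0, n_items, size=512)
model.fit(features, labels, batch_size=64, epochs=1)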

Matrix Factorization versus Softmax

Retrieval, Scoring, and Re-Ranking

Retrieval

How to decide which items to recommend? At serve time, given a query, you start by doing one of the following:

  • For a matrix factorization model, the query (or user) embedding is known statically, and the system can simply look it up from the user embedding matrix.
  • For a DNN model, the system computes the query embedding $\psi(x)$ at serve time by running the network on the feature vector $x$.

Once you have the query embedding $q$, search for item embeddings $V_j$ that are close to $q$ in the embedding space. This is a nearest neighbor problem.

To compute the nearest neighbors in the embedding space, the system can exhaustively score every potential candidate. Exhaustive scoring can be expensive for very large corpora, but you can use either of the following strategies to make it more efficient:

  • If the query embedding is known statically, the system can perform exhaustive scoring offline, precomputing and storing a list of the top candidates for each query. This is a common practice for related-item recommendation.
  • Use approximate nearest neighbors. Google provides an open-source tool on GitHub called ScaNN. This tool performs efficient vector similarity search at scale.
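As a baseline, exhaustive (brute-force) retrieval is just a matrix-vector product followed by a top-k selection; the sketch below uses random embeddings for illustration. Libraries such as ScaNN replace this exhaustive scan with approximate nearest neighbor search when the corpus is too large.

import numpy as np

rng = np.random.default_rng(0)
d, n_items, k = 64, 100_000, 10

item_embeddings = rng.normal(size=(n_items, d)).astype(np.float32)  # V: one row per item
query_embedding = rng.normal(size=d).astype(np.float32)             # q = psi(x), or a row of U

# Exhaustive scoring with dot-product similarity.
scores = item_embeddings @ query_embedding
top_k = np.argpartition(-scores, k)[:k]        # indices of the k best items, unordered
top_k = top_k[np.argsort(-scores[top_k])]      # sort those k by score, best first
print(top_k, scores[top_k])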

Scoring

After candidate generation, another model scores and ranks the generated candidates to select the set of items to display. The recommendation system may have multiple candidate generators that use different sources, such as:

  • Related items from a matrix factorization model.
  • User features that account for personalization.
  • "Local" vs "distant" items; that is, taking geographic information into account
  • Popular vs trending items
  • A social graph; that is, items liked or recommended by friends.

The system combines these different sources into a common pool of candidates that are then scored by a single model and ranked according to that score. You should avoid letting the candidate generator compute a score. The choice of scoring function can dramatically affect the ranking of items and ultimately the quality of recommendations.

Re-Ranking

In the final stage of a recommendation system, the system can re-rank the candidates to consider additional criteria or constraints. One re-ranking approach is to use filters that remove some candidates. Another re-ranking approach is to manually transform the score returned by the ranker. You should take into account the freshness, diversity, and fairness of your recommendation system.
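Both approaches can be sketched in a few lines; the candidate fields and the freshness bonus below are hypothetical, purely to illustrate filtering plus score transformation.

from dataclasses import dataclass

@dataclass
class Candidate:
    item_id: str
    score: float        # score from the ranking model
    age_days: float     # days since the item was published
    flagged: bool       # e.g., flagged by a separate quality classifier (hypothetical)

def rerank(candidates, freshness_boost=0.1, max_age_days=30):
    # Filter: drop candidates flagged by the quality classifier.
    kept = [c for c in candidates if not c.flagged]
    # Transform: boost fresher items with a simple age-dependent bonus.
    def final_score(c):
        return c.score + freshness_boost * max(0.0, 1.0 - c.age_days / max_age_days)
    return sorted(kept, key=final_score, reverse=True)

candidates = [
    Candidate("a", 0.90, age_days=200, flagged=False),
    Candidate("b", 0.88, age_days=2, flagged=False),
    Candidate("c", 0.95, age_days=1, flagged=True),
]
print([c.item_id for c in rerank(candidates)])  # ['b', 'a']: 'c' is filtered, 'b' is boosted past 'a'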

TensorFlow Recommenders

TensorFlow Recommenders (TFRS) is a library for building recommender system models. It helps with the full workflow of building a recommender system: data preparation, model formulation, training, evaluation, and deployment. It's built on Keras.

TFRS makes it possible to:

  • Build and evaluate flexible recommendation retrieval models
  • Freely incorporate item, user, and context information into recommendation models
  • Train multi-task models that jointly optimize multiple recommendation objectives

TFRS is open source and available on GitHub.

TensorFlow Recommenders: QuickStart

This tutorial builds a simple matrix factorization model with TFRS using the MovieLens 100K dataset.

!pip install -q tensorflow-recommenders
!pip install -q --upgrade tensorflow-datasets


from typing import Dict, Text
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_recommenders as tfrs
# Ratings data.
ratings = tfds.load('movielens/100k-ratings', split="train")
# Features of all the available movies.
movies = tfds.load('movielens/100k-movies', split="train")
# Select the basic features.
ratings = ratings.map(lambda x: {
    "movie_title": x["movie_title"],
    "user_id": x["user_id"]
})
movies = movies.map(lambda x: x["movie_title"])
# Build vocabularies to convert user ids and movie titles into integer indices for embedding layers.
user_ids_vocabulary = tf.keras.layers.StringLookup(mask_token=None)
user_ids_vocabulary.adapt(ratings.map(lambda x: x["user_id"]))

movie_titles_vocabulary = tf.keras.layers.StringLookup(mask_token=None)
movie_titles_vocabulary.adapt(movies)

# Define a TFRS model by inheriting from tfrs.Model and implementing the compute_loss method.

class MovieLensModel(tfrs.Model):
  # We derive from a custom base class to help reduce boilerplate. Under the hood,
  # these are still plain Keras Models.

  def __init__(
      self,
      user_model: tf.keras.Model,
      movie_model: tf.keras.Model,
      task: tfrs.tasks.Retrieval):
    super().__init__()

    # Set up user and movie representations.
    self.user_model = user_model
    self.movie_model = movie_model

    # Set up a retrieval task.
    self.task = task

  def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -> tf.Tensor:
    # Define how the loss is computed.

    user_embeddings = self.user_model(features["user_id"])
    movie_embeddings = self.movie_model(features["movie_title"])

    return self.task(user_embeddings, movie_embeddings)

# Define the two models and the retrieval task.

# Define user and movie models.
user_model = tf.keras.Sequential([
    user_ids_vocabulary,
    tf.keras.layers.Embedding(user_ids_vocabulary.vocab_size(), 64)
])
movie_model = tf.keras.Sequential([
    movie_titles_vocabulary,
    tf.keras.layers.Embedding(movie_titles_vocabulary.vocab_size(), 64)
])

# Define your objectives.
task = tfrs.tasks.Retrieval(metrics=tfrs.metrics.FactorizedTopK(
    movies.batch(128).map(movie_model)
  )
)

# Fit and evaluate the model.
# Create a retrieval model.
model = MovieLensModel(user_model, movie_model, task)
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.5))

# Train for 3 epochs.
model.fit(ratings.batch(4096), epochs=3)

# Use brute-force search to set up retrieval using the trained representations.
index = tfrs.layers.factorized_top_k.BruteForce(model.user_model)
index.index_from_dataset(
    movies.batch(100).map(lambda title: (title, model.movie_model(title))))

# Get some recommendations.
_, titles = index(np.array(["42"]))
print(f"Top 3 recommendations for user 42: {titles[0, :3]}")

TensorFlow Recommendation Systems Home

Recommendation systems increase user engagement within your app and elevate user experience by providing the most desirable content. Modern recommenders are complex systems that are often broken down into multiple stages to achieve low latency in production. Through the retrieval, ranking, and potentially post-ranking stages, irrelevant items are gradually filtered out from a large pool of candidates and a list of options that users are most likely to interact with are finally presented.

When you've finished training your models, deploy them into production to serve recommendations to end users. TensorFlow Serving productionizes your models for high-performance inference. It aims to maximize the throughput of machine learning models and can support large recommendation models that require distributed serving.

ScaNN is a library for vector similarity search at scale. It leverages state-of-the-art approximate nearest neighbor techniques, such as asymmetric hashing and anisotropic quantization, to accelerate retrieval of top candidates.

TensorFlow Ranking is a library for developing scalable, neural learning-to-rank (LTR) models. It provides additional functionality for ranking candidate items to maximize ranking utilities.

The TPUEmbedding layer API facilitates training and serving large embedding tables on Tensor Processing Units (TPUs).

TensorFlow Recommenders Addons is a community-contributed project that leverages dynamic embedding technology, which is particularly useful for online learning.

TensorFlow Lite provides an on-device recommendation solution that achieves low-latency and high-quality recommendations, while keeping all user data on mobile devices.

TensorFlow Federated is a framework for federated learning and other computations on decentralized data. Federated Reconstruction brings matrix factorization to the federated learning setting and better protects user privacy for recommendations.

TensorFlow Agents Bandits is a comprehensive library of bandit algorithms that can explore and exploit effectively in the recommendation engine setting.

TensorFlow GNN is a library that can efficiently facilitate item recommendations based on network structures and can be used in conjunction with retrieval and ranking models.

Other Resources
