What Are Sentence Embeddings and Why Are They Useful?

Diogo Ferreira
Talkdesk Engineering
Mar 30, 2020


Image by Gerd Altmann/Pixabay

The Natural Language Processing (NLP) team at Talkdesk is in charge of improving the agent and customer experience in contact centers by understanding their conversations.

In a previous post, we discussed how Word Embeddings represent the meaning of the words in a conversation.

Sometimes we need to go a step further and encode the meaning of the whole sentence to be able to understand the context in which the words are said.

The representation of the meaning of a sentence is important for many tasks.

It allows us to understand the intention of a sentence without calculating the embedding of each word individually. It also lets us compare sentences to cluster them by similarity, or predict values for them, such as sentiment.

Agent Assist, which provides suggestions for the agent to help the customer, detects the customer's intent in order to recommend articles and actions.

In Figure 1 we can see Agent Assist suggesting that the agent schedule an appointment, based on the customer's intention.

To detect the customer's intention, we need to represent sentences as numerical vectors so that we can send them to a Machine Learning component.

Figure 1 — Agent Assist intent detection.
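
To make that pipeline concrete, below is a toy sketch of intent detection on top of sentence embeddings. The intents and training sentences are made up, and the encoder comes from the sentence-transformers library discussed later in this post; this is an illustration, not the actual Agent Assist implementation.

```python
# Toy intent detection: sentence embeddings feed a standard classifier.
# Hypothetical intents and sentences; not the actual Agent Assist pipeline.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

encoder = SentenceTransformer("bert-base-nli-mean-tokens")  # pretrained Sentence-BERT model

sentences = [
    "Can I schedule an appointment for tomorrow?",
    "I'd like to book a time with a technician.",
    "What is my current account balance?",
    "How much money do I have left?",
]
intents = ["schedule_appointment", "schedule_appointment",
           "check_balance", "check_balance"]

# Each sentence becomes a fixed-size numerical vector...
features = encoder.encode(sentences)

# ...which any standard Machine Learning component can consume.
classifier = LogisticRegression().fit(features, intents)
print(classifier.predict(encoder.encode(["Can we set up a meeting next week?"])))
# expected to print ['schedule_appointment']
```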

From Word Embeddings to Sentence Embeddings

To represent sentences, we can't one-hot encode them as we did with words, because there are so many possible sentences that it would be impractical.

A straightforward way to create sentence representations is to take advantage of the embedding of each word and calculate the embedding of the whole sentence from those.

An example of that is SIF (Smooth Inverse Frequency), which takes a weighted average of the embeddings of the words and then applies a dimensionality reduction step to obtain the sentence embedding (Figure 2).

Figure 2 — SIF embeddings calculation.

This method is simple and fast, and it works surprisingly well given its strong word-independence assumptions.

However, it does not take into account the interactions between the words in a sentence, nor their order.
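
A minimal sketch of the SIF recipe follows. The function name and parameters are illustrative, and it assumes pretrained word vectors and unigram word probabilities are available as plain dictionaries:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

def sif_embeddings(sentences, word_vectors, word_prob, a=1e-3):
    # sentences: tokenized sentences (lists of words)
    # word_vectors: dict word -> np.ndarray
    # word_prob: dict word -> unigram probability p(w)
    dim = len(next(iter(word_vectors.values())))
    emb = np.zeros((len(sentences), dim))
    for i, sentence in enumerate(sentences):
        words = [w for w in sentence if w in word_vectors]
        if not words:
            continue
        # Weighted average: the weight a / (a + p(w)) down-weights frequent words
        weights = np.array([a / (a + word_prob.get(w, 0.0)) for w in words])
        emb[i] = weights @ np.array([word_vectors[w] for w in words]) / len(words)
    # Dimensionality reduction step: remove the projection onto the
    # first principal component, which captures a common discourse direction
    pc = TruncatedSVD(n_components=1).fit(emb).components_
    return emb - (emb @ pc.T) @ pc
```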

State-of-the-art Sentence Embeddings

Sentence-BERT is currently (March 2020) the state-of-the-art algorithm for creating sentence embeddings.

It was presented in 2019 by Nils Reimers and Iryna Gurevych, and it uses recent advances in the NLP field to generate embeddings.

There are four concepts incorporated in its architecture that make it outperform all other algorithms to date:

  • Attention — The attention mechanism allows the algorithm to create the embedding by focusing only on the most important parts of the input.
  • Transformers — The Transformer architecture, presented by Google Brain and the University of Toronto in 2017, showed how to use the attention mechanism in a neural architecture that could be parallelized, taking less time to train.
  • BERT — Presented in late 2018 by the Google AI Language team. It fulfilled the promise of using Transformers to create a general language understanding model far better than all its predecessors, taking a huge step forward in NLP development. When it was presented, it achieved state-of-the-art results in many tasks. Its larger variant, BERT-large, is composed of 24 layers of Transformer blocks.
  • Siamese Network — A siamese network is a neural network trained to compare the similarity between two inputs. In this case, Sentence-BERT was trained to calculate the similarity between two input sentences. The key is that it generates internal representations of the sentences that are well suited for similarity problems. Those representations, the Sentence Embeddings, are created using two BERT networks in a siamese arrangement (see the sketch after this list).
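
In practice, the authors of Sentence-BERT released the sentence-transformers Python package, which makes generating these embeddings a few lines of code. A minimal sketch, where the model name is one of the pretrained models published by the authors at the time of writing:

```python
from sentence_transformers import SentenceTransformer

# Pretrained Sentence-BERT model published by the paper's authors
model = SentenceTransformer("bert-base-nli-mean-tokens")

sentences = ["I'd like to schedule an appointment.",
             "Can we book a meeting for tomorrow?"]
embeddings = model.encode(sentences)  # one fixed-size vector per sentence
print(len(embeddings), len(embeddings[0]))  # 2 vectors of 768 dimensions
```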

Calculating the Most Similar Sentences

One of the applications of sentence embeddings is to calculate the similarity between sentences.

This Jupyter notebook contains a test with four different sentence representation approaches: TF-IDF, Doc2vec, InferSent, and Sentence-BERT.

A brief explanation of TF-IDF, Doc2vec, and InferSent:

  • TF-IDF — Classical information retrieval method that creates a term-document matrix. It is known for its simplicity and speed, but it falls short at capturing the semantics of a document because it does not take the similarity between words into account (see the sketch after this list).
  • Doc2vec — This algorithm (also known as ParagraphVector) was proposed in 2014 by Quoc Le and Tomas Mikolov, both research scientists at Google at the time. It is based on Word2vec and it follows the same principles of training a Machine Learning model that predicts the next word, relying on the surrounding words.
  • InferSent — Sentence Embedding method presented by Facebook AI Research in 2017. Just like Sentence-BERT, it uses a siamese network, but instead of BERT it uses a bidirectional LSTM (bi-LSTM), a recurrent neural network with memory that reads the whole sentence before encoding it.
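
To make the TF-IDF shortcoming concrete, here is a minimal sketch with toy documents and scikit-learn defaults:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["Democrats are targeting the seat.",
        "Republican Morrisey will face a conservative Democrat."]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(docs)  # term-document matrix

query = vectorizer.transform(["Democrats win Republicans in Election"])
print(cosine_similarity(query, doc_matrix))
# Only exact term overlap counts: "democrats" matches the first document,
# but "republicans" does not match "republican", so the second scores zero.
```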

Given a news dataset (1,000 news descriptions), we create representations with the four approaches.

Given a query provided by the user, we generate a representation of that query and compare it with the representations of all the news descriptions.

The top five most similar news descriptions are printed in the notebook.
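
A sketch of that retrieval loop for the Sentence-BERT case, with a handful of stand-in descriptions instead of the full dataset and cosine similarity ranking the matches:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Stand-in descriptions; the notebook uses 1,000 news descriptions.
descriptions = [
    "Democrat Richard Cordray will face Republican Mike DeWine in November.",
    "Paulette Jordan won the Democratic primary in the Idaho governor's race.",
    "Vote counting will begin Saturday.",
]

model = SentenceTransformer("bert-base-nli-mean-tokens")
doc_vecs = np.array(model.encode(descriptions))

query_vec = np.array(model.encode(["Democrats win Republicans in Election"]))[0]

# Cosine similarity between the query and every description
scores = doc_vecs @ query_vec / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec))

# Print the top five most similar descriptions
for i in np.argsort(-scores)[:5]:
    print(f"{scores[i]:.3f}  {descriptions[i]}")
```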

Search Query: Democrats win Republicans in Election

Five most similar news using TF-IDF

  • “Democrat Richard Cordray will face Republican Mike DeWine in November.”
  • “Ohio state Sen. Troy Balderson now will face a Democrat in an Aug. 7 special election.”
  • “Republican Morrisey will face Sen. Joe Manchin, a conservative Democrat who has voted for the president’s agenda 61 percent of the time.”
  • “Haspel looks all but assured to win confirmation in a vote before the full Senate.”
  • “I win either way, second-place finisher Caleb Lee Hutchinson said.”

Five most similar news using Doc2Vec

  • “*Sends gift basket to Marvel*”
  • “Democrats are targeting the seat, and a former Marine is their candidate.”
  • “He defeated two congressmen and will challenge Democratic Sen. Joe Donnelly in November.”
  • “Vote counting will begin Saturday.”
  • “Paulette Jordan won the Democratic primary in the Idaho governor’s race.”

Five most similar news using InferSent

  • “Democrat Richard Cordray will face Republican Mike DeWine in November.”
  • “McConnell’s official campaign account trolls West Virginia GOP primary loser.”
  • “Paulette Jordan won the Democratic primary in the Idaho governor’s race.”
  • “McGrath’s victory continues the success women and political newcomers have found in Democratic primaries.”
  • “Unions denounced the president’s actions an assault on democracy.”

Five most similar news using Sentence-BERT

  • “He defeated two congressmen and will challenge Democratic Sen. Joe Donnelly in November.”
  • “Democrats are targeting the seat, and a former Marine is their candidate.”
  • “McGrath’s victory continues the success women and political newcomers have found in Democratic primaries.”
  • “Paulette Jordan won the Democratic primary in the Idaho governor’s race.”
  • “The measure’s passage is a significant victory for voting rights advocates.”

Recall that the search query was “Democrats win Republicans in Election”.

All approaches produce good results, but it seems that InferSent and Sentence-BERT have better matches.

The fifth most similar description using TF-IDF does not refer to any political election, and the most similar description using Doc2vec is not related to the query.

On the other hand, all descriptions similar to the query selected with InferSent and Sentence-BERT are related to political elections.

You can see more interesting results in the notebook, and you can even run your own queries to find the most similar news descriptions.

Conclusion

Sentence Embeddings are very useful for many tasks in the language understanding domain, such as similarity or sentiment analysis.

In Agent Assist, Sentence Embeddings are crucial for detecting the intent of the sentences said by the customer.

There have been many advances in Sentence Embedding approaches over the last few years, and there are other algorithms for producing document representations that we have tested but did not explore in this post.

If you want to know more, we suggest you take a look at Universal Sentence Encoder, Skip-Thought, or FastSent.
