Natural language processing, or NLP, is a huge part of search.
Google has been one of the field's leading innovators for at least five years, making significant contributions to NLP research as it develops technologies for use in its products.
Pioneering NLP techniques is one of the many activities that keeps Google ahead of its search competitors. For SEOs and content creators working to improve different sites’ organic visibility, understanding more of what goes on beneath the surface of the world’s largest search engine is one of the best ways to gain an edge over our competitors.
In this post
- Meet BERT, the latest addition to Google Search
- How did we get here?
- What changed with BERT?
- Current NLP implications for organic search
- Can you optimise for BERT?
In this blog post, I’m going to take a dive into the current state of NLP in organic search. To understand where we are now, at the end of 2019, it’s important to get a sense of the steps taken to arrive at this point. I’ll also take a look at some of the biggest SEO implications of NLP’s current capabilities.
But first, we need to take a quick look at the recent announcement that has triggered the industry’s renewed interest in NLP.
Meet BERT, the latest addition to Google Search
In October 2019, news of further innovations in search broke once again, with Google announcing the integration of BERT with their search algorithms.
BERT – which stands for Bidirectional Encoder Representations from Transformers – has actually been around in some form since 2018. However, it has taken a little while for Google to integrate the technology with their organic search algorithms. We covered BERT’s announcement at the end of October, if you want to find out more.
Writing in a much-discussed post on BERT’s release, Google’s Pandu Nayak estimated that BERT would improve Google’s understanding of around 10% of English searches in the US. By taking a brief look at some touchpoints from the last five years of NLP development, we can start to get a sense of how Google has arrived at this point and what BERT is actually doing differently.
How did we get here?
2014 – Entity work by Dunietz & Gillick
In 2014, Jesse Dunietz and Dan Gillick – both employed by Google at the time – released a paper about using AI to predict the most important entities in news articles. Their entity salience research demonstrated a powerful application of natural language processors, using them to automate the process of understanding which named things in a document are more important than others.
If nothing else, Dunietz and Gillick’s work demonstrates the scope of Google’s ambition. They pioneered an application of NLP with clear usefulness for search results that has been built upon since by later innovations in natural language technology. Entity salience predictions now take into account all entities, not just named people, and make use of both better entity databases and improved text comprehension.
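To make entity salience concrete, here is a toy scorer combining two of the feature types Dunietz and Gillick's paper drew on: how often an entity is mentioned, and how early its first mention appears. This is purely my own illustration of the idea, not their actual model.

```python
def toy_salience(doc_tokens, entity_mentions):
    """Toy salience scorer: mention frequency plus an early-mention bonus.

    `entity_mentions` maps an entity name to the token positions where
    it is mentioned. Illustrative only, not Dunietz & Gillick's model.
    """
    n = len(doc_tokens)
    scores = {}
    for entity, positions in entity_mentions.items():
        frequency = len(positions) / n        # more mentions = more salient
        earliness = 1.0 - min(positions) / n  # earlier first mention = more salient
        scores[entity] = frequency + earliness
    return scores

doc = ("google announced bert today . bert improves search . "
       "analysts praised google").split()
mentions = {"google": [0, 11], "bert": [2, 5], "analysts": [9]}
ranking = sorted(toy_salience(doc, mentions).items(), key=lambda kv: -kv[1])
# 'google' ranks highest: mentioned twice, starting at the very first word.
```

Even this crude version captures the intuition that a document's subject tends to be mentioned often and early; the real research replaced hand-tuned scores with learned models.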
2016 – RankBrain and word vectors
Moz’s Dr Pete Meyers covered RankBrain and word vectors in a 2016 article that single-handedly inspired my love of content in SEO. The article is a fantastic read if you want to understand the last big iteration of Google’s NLP capabilities in search.
Pre-BERT, Google relied on word vectors to understand the meaning of a word. Word vectors encode mathematical relationships between words: they help machines to understand that ‘run’ and ‘ran’ have the same relationship as ‘turn’ and ‘turned,’ for example.
Word vectors enabled RankBrain – a significant AI component in Google’s algorithms – to understand words with a similar meaning, giving the search engine a better understanding of whole topic areas, as opposed to single keywords.
A limitation of word vectors is that they are context-free, as acknowledged by Google in their recent writing on NLP. For example, it is difficult for a word vector model to understand that ‘bank’ could mean both a financial institution and the edge of a river. Later innovations sought to fix this shortcoming.
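Both the power and the limitation can be shown with a toy example. The vectors below are invented two-dimensional stand-ins (real models like word2vec learn hundreds of dimensions from huge corpora), but they show how a grammatical relationship becomes a consistent direction in vector space, and why a single vector per word cannot separate the two senses of ‘bank’:

```python
# Toy two-dimensional "word vectors", invented for illustration only.
vec = {
    "run":    (0.9, 0.1),
    "ran":    (0.9, 0.8),   # second dimension acts as a 'past tense' axis
    "turn":   (0.2, 0.1),
    "turned": (0.2, 0.8),
    "bank":   (0.5, 0.5),   # ONE vector per word, whatever the context
}

def offset(a, b):
    """Vector difference between two words."""
    return tuple(round(x - y, 2) for x, y in zip(vec[a], vec[b]))

# The run->ran offset equals the turn->turned offset: 'past tense' is
# encoded as a direction in the space.
past_tense_match = offset("ran", "run") == offset("turned", "turn")

# But the model returns the same 'bank' vector for "river bank" and
# "bank account" alike: context-free vectors cannot tell senses apart.
```

The fix for that second problem is exactly what contextual models like BERT provide: a representation of ‘bank’ that changes depending on the sentence around it.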
2017 – Transformers and self-attention
Google’s development of the Transformer in 2017 remains one of the biggest leaps forward in NLP technology in recent years. Transformers are built on self-attention, a mechanism that lets every word in a sentence be related directly to every other word.
Transformers gave natural language processors the ability to take whole sentences into account when attempting to understand single words.
To return to entity salience as an example, when determining which entity is the most important in a sentence, a Transformer is able to look at its relationships to every other word in the sentence, as opposed to simply looking at the words immediately before or after. Transformers also revolutionised other difficult NLP tasks, such as translation.
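The core of self-attention can be sketched in a few lines. This minimal version skips the learned query, key and value projections a real Transformer would apply to the input, but it shows the key property: each word's new representation is a weighted mix of all words in the sentence, computed in a single step rather than word by word.

```python
import numpy as np

def self_attention(X):
    """Minimal single-head self-attention with no learned weights.

    X has shape (sequence_length, model_dim). In a real Transformer,
    X would first be projected into query, key and value matrices;
    here we use X directly to keep the sketch short.
    """
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                   # word-to-word similarity
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ X, weights                     # context-mixed representations

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))   # five "words", eight dimensions each
out, weights = self_attention(X)
```

Note that `weights` is a full 5×5 matrix: the first and last words are connected as directly as adjacent ones, which is exactly the advantage over step-by-step sequential models.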
2017-18 – Unidirectional Transformers & shallow bidirectionality
In their 2018 research paper discussing BERT, Google’s AI researchers compare their model to two recent contextual models, OpenAI GPT (Transformer-based) and ELMo (built on recurrent networks). These two models made use of context in different ways.
OpenAI GPT is an example of a unidirectional Transformer, which maps the relationships between words from left to right. This model lacked efficiency in that it took more steps to relate a word much later in a sentence to one much earlier. The more steps involved, the harder it is for a model to make an accurate prediction.
ELMo went one step further, combining separate unidirectional learning models, one of which is trained from left to right, and the other from right to left. In this way, it was able to make better use of a word’s context than OpenAI GPT. However, Google still saw room for improvement.
What changed with BERT?
BERT was released in 2018, but has only just been integrated into Google search. The model has applications in all sorts of areas and is expected to make a significant improvement to Google Search’s text comprehension. What sets it apart?
The B in BERT stands for bidirectionality. Simply put, BERT takes into account the words both to the left and to the right of its target. It can do this because Google’s researchers trained it using a new method: masked language modelling (MLM).
The MLM works by training the natural language processor to identify ‘masked’ words in training sentences taken from corpora of books and Wikipedia articles. In each training sequence, 15% of the words were selected as prediction targets. Of those targets, 80% were replaced with a [MASK] token, 10% were replaced with a random word, and the final 10% were left unchanged. Leaving some targets as real words reduces the mismatch between training (where [MASK] tokens appear) and real text (where they never do), helping the model cope with genuine language.
Unidirectional models are normally trained to predict the next word in a sequence, which works because they can’t ‘see’ what comes next. That method doesn’t work for a truly bidirectional model, which would indirectly be able to ‘see’ the word that it was guessing. The MLM method allows for the processor to be fully trained on the context of its input words.
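The corruption procedure from the BERT paper can be sketched as follows. This toy version works on whole words for readability; real BERT operates on WordPiece sub-word tokens.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, seed=0):
    """Sketch of BERT's masked-LM corruption (per the 2018 paper):
    select 15% of tokens as prediction targets; of those, replace 80%
    with [MASK], 10% with a random vocabulary word, and leave 10%
    unchanged. Returns the corrupted sequence and the target positions."""
    rng = random.Random(seed)
    corrupted = list(tokens)
    targets = []
    for i, tok in enumerate(tokens):
        if rng.random() < 0.15:                    # chosen as a prediction target
            targets.append(i)
            roll = rng.random()
            if roll < 0.8:
                corrupted[i] = MASK                # 80%: mask it
            elif roll < 0.9:
                corrupted[i] = rng.choice(vocab)   # 10%: random replacement
            # else: 10% left unchanged
    return corrupted, targets

tokens = "the quick brown fox jumps over the lazy dog".split()
corrupted, targets = mask_tokens(tokens, vocab=["cat", "runs", "blue"], seed=1)
```

During training, the model is scored only on how well it recovers the original words at the target positions, using the full context on both sides.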
The bottom line of a deeply bidirectional model is that it is better at working out the meanings of ambiguous words than any of its predecessors. This is why Google is able to say that queries containing small but important prepositions (words like ‘to’ and ‘for’) will be easier for its search engine to understand.
Sentence-pair training task
The MLM was not the only training task to help BERT build on its predecessors. The researchers also trained the model to identify sentence pairs, so that it could predict whether or not a pair of unseen sentences follow on from one another.
Though the sentence pairing isn’t mentioned in Pandu Nayak’s main announcement blog post, it seems to me to be just as important a feature of BERT as the bidirectional training. It means that BERT should be very good at common search tasks, such as recognising logical answers to questions posed by users.
Summary of BERT’s improvements
BERT should do at least two things more effectively than any natural language processor before it, improving Google’s capabilities in the process:
- Identify how small, potentially ambiguous words can alter the meaning of a sentence.
- Identify whether two different sentences are related to one another.
Current NLP implications for organic search
The previous section has outlined where BERT has improved on its predecessors, but I also want to make sure that this post has some more practical insight for SEOs. What will BERT actually change in the organic search results?
Improved search results
Google expects BERT to improve about 10% of US English searches. Some people thought that was a low figure, but 10% of all English-language searches in the US is huge. Google wasn’t bad at NLP before BERT’s integration. BERT shouldn’t change anything for the majority of simpler search terms, which will include most of the key traffic-drivers for commercial sites.
The biggest impact will be seen in those more complex searches that hinge on small but important prepositions and modifiers.
I also expect Google’s question-answering capabilities to improve thanks to BERT’s sentence pair training, and Google has already alluded to this with their suggestion that featured snippets will change. If Q&A style content is a key part of your SEO strategy, you may see fluctuations in the coming weeks.
It’s not a stretch to think that BERT’s integration will result in a higher proportion of no-click searches, in which Google is able to satisfy the user’s search intent within the search results themselves.
It is natural for SEOs to react against such a trend, but I don’t think that scaling back your informational content is the answer. Yes, SERP features may lower the clicks for certain searches, but if your website holds those features you will still be more visible than any competitor in the same SERP. Your website will also be referenced in voice results, which is, realistically, about as much exposure as you can hope to get in that channel.
I covered entity salience in-depth on Impression’s blog before BERT’s search integration was announced, but BERT doesn’t change anything significant in that article.
If anything, BERT’s deep bidirectionality will improve the accuracy of the search engine’s entity scores. It outperformed other natural language processors in an entity recognition task carried out by Google’s researchers, as detailed in their paper. However, there is no need for the factors contributing to an entity’s salience to change with the new technology’s arrival. As far as I can see, Dunietz and Gillick’s 2014 paper is still a good starting point for salience measurement.
Entity salience is a helpful topic area to be aware of if you’re creating or revamping content. BERT’s introduction does nothing to change that, so take a look at my earlier article if you want more information.
I expect BERT to improve Google’s sentiment analysis capabilities in the same way that it could improve the search engine’s entity salience predictions. BERT outperformed its closest competitors in the exhaustive General Language Understanding Evaluation (GLUE) benchmark, which includes a sentiment-based dataset of sentences from film reviews, indicating that it has potential in this area.
Accurate sentiment analysis is currently one of the most difficult tasks Google has set out to achieve with its NLP technology, and the results found in the Cloud API demo were hit and miss at best. Google’s NLP technology is certainly not a perfect replacement for human analysis at this point in time.
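To see why sentiment is so hard, consider how easily a naive approach fails. This toy lexicon-based scorer (entirely my own illustration, nothing to do with Google's actual method) counts positive and negative words, and is defeated by simple negation:

```python
# Toy lexicon-based sentiment scorer: +1 per positive word, -1 per
# negative word. It ignores word order and context entirely.
POSITIVE = {"great", "good", "love"}
NEGATIVE = {"bad", "terrible", "hate"}

def lexicon_sentiment(text):
    words = text.lower().split()
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)

# Wrong answer: "not great" scores as positive, because the scorer
# counts 'great' but has no idea that 'not' flips the meaning.
score = lexicon_sentiment("the film was not great")
```

Handling negation, sarcasm and mixed opinions requires exactly the kind of whole-sentence context that deeply bidirectional models are built to capture, which is why BERT’s GLUE results suggest progress here.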
Will BERT change anything significantly for SEOs in this area? Not immediately, is my guess. It’s another step towards Google being able to make use of sentiment in their search results, but I would be very surprised if they are already doing so to any extent. This is another area that I touched on in my previous blog post, and one that we have also covered in a test of the technology’s current capabilities.
Can you optimise for BERT?
Google’s official line is that there is no need to optimise for BERT. While SEOs are right to treat such pronouncements with scepticism, I tend to agree with it in this case. However, I would suggest two strategic considerations off the back of BERT’s introduction:
1. Monitor your performance in featured snippets
I expect to see fluctuation in featured snippets over the next few weeks. If you know they’re important to your search visibility, I would monitor them and see if you can improve the quality or relevance of your content for any that you lose. We know that BERT is very good at finding links between sentences, so make the links between your content and target informational keywords as clear as possible.
2. Consider an informational content strategy if you’re not already doing so
I’m not the first to advocate a strategy driven by informational content. There are many advantages to doing so, but BERT’s integration is yet another clear sign of Google’s quest to improve the quality of informational content accessible via search. I’ve seen several examples of informational content creeping into otherwise commercial search results already, and wouldn’t be surprised if this trend continues. As a starting point, create a knowledge hub or FAQ page answering questions about your products and services. If possible, go further and keep a blog or content hub updated with the latest industry news, advice and answers to engage your audience and demonstrate a deep coverage of your target industry.