Entity salience has been launched into prominence by Google’s Cloud Natural Language demo. This little sales feature, sandwiched between blocks of sales copy, offers SEOs a rare glimpse into the inner workings of Google’s AI.
- Understanding entity salience
- What is natural language processing (NLP)?
- How does NLP work?
- What is an entity?
- What is entity salience?
- The entity graph
- Writing for entity salience
- Entity salience factors
- Text position
- Subject/object relationship
- Further linguistic relationships
- Entity clarity
- Mention counts
- How to use entity salience in SEO
If you want a snapshot of how Google ‘reads’ your content, all you have to do is copy it into the text box and hit Analyze. In this blog post, I’m going to unpick what Google is doing in this process and how SEOs can use the demo’s entity salience insights to write better content.
Understanding entity salience
We need to break the concept of entity salience down into a handful of component parts to understand why it could be such a big deal.
Firstly, it’s important to have a baseline understanding of what Google’s natural language processing (NLP) demo is and isn’t doing.
Next, we need to look at what constitutes an entity and how entities relate to keywords.
Finally, I’ll run over the concept of salience, which is where the linguistics modules I took at university suddenly seem a lot more useful to my job than I thought they’d be.
After I’ve laid out the core concepts, I’ll use the second half of the article to offer some practical tips for using these ideas in your content optimisation.
What is natural language processing (NLP)?
Natural language processing is an activity carried out by artificial intelligence in order to understand texts. It has a huge variety of potential applications. But, for our purposes, its most interesting function is simulating what a human would see as important in a block of text.
How does NLP work?
The development of attention mechanisms was the catalyst for a huge leap forward in the accuracy of natural language processing. Attention mechanisms allow AIs to understand a sentence in its entirety, making it possible to understand the relationship between the first and last words and everything in between.
Prior to this innovation, AIs could only ‘read’ a text in a linear fashion, which meant they had no knowledge of the words further along in the text and only a limited recollection of past words.
In his excellent Beginner’s Guide to Attention, AI expert Chris Nicholson explains how attention works by turning two sentences into the dimensions of a matrix and mapping relationships between words. This form of attention is particularly useful for automated translation.
Self-attention is a type of attention mechanism in which one sentence is used for each dimension of the matrix, as opposed to two different sentences. Self-attention allows natural language processors to map the relationships between words in one sentence.
The advantage of attention mechanisms over their predecessors is that they are not bound by word order. Instead, they are built to comprehend the grammar and syntax of a whole sentence, which gives them a much more human-like level of comprehension than previous iterations.
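To make the matrix idea concrete, here is a minimal sketch of self-attention in plain Python. The word vectors are made up, and a real model would first pass them through learned query, key and value projections, which I have left out. Treat it as an illustration of the mechanism, not a description of how Google’s models are built.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_weights(vectors):
    """Score every word against every word in the same sentence.

    Row i of the result says how much word i attends to each word,
    no matter how far apart the two words sit in the sentence.
    """
    dim = len(vectors[0])
    matrix = []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        matrix.append(softmax(scores))
    return matrix


# Made-up 2-dimensional word vectors, for illustration only.
words = ["Bilbo", "stole", "the", "ring"]
vectors = [[1.0, 0.2], [0.4, 0.9], [0.1, 0.1], [0.9, 0.3]]

weights = self_attention_weights(vectors)
for word, row in zip(words, weights):
    print(word, [round(w, 2) for w in row])
```

The same sentence supplies both the rows and the columns of the matrix, which is what makes this self-attention and not attention between two different sentences.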
What is an entity?
An entity is a named thing in a text. Entities are nouns and noun phrases that the AI can identify as distinct objects. Google’s entity categories include people, locations, organisations, numbers, consumer goods and more.
It’s impossible to avoid grammar when discussing natural language processing and nouns are our first stop in our wild ride through that linguistic wonderland. Nouns are ‘naming words’ that you might remember learning about in school. They usually come as part of a noun phrase (e.g. “the tall tree over there”). The broad noun category also includes names (proper nouns). In fact, a 2014 research paper from Google on entity salience focused exclusively on people’s names.
The entities that you’ll see identified in Google’s NLP demo are much more varied than just names. Google’s AI can also understand when nominal and pronominal words in the text refer to a named entity. For example, I fed an article on Liverpool FC’s midfield powerhouse Alex Oxlade-Chamberlain into the NLP demo. The demo recognised that the following words all referred to the same person:
- Alex Oxlade-Chamberlain (named)
- Oxlade-Chamberlain (named)
- midfielder (nominal)
It also recognised that Gareth Southgate (named) and manager (nominal) were the same entity. While it didn’t come up in this article, a pronominal reference, such as “He is the second-best English midfielder, behind only Jordan Henderson,” would also be recognised.
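One way to picture this grouping is as a list of mentions, each tagged with the entity it points to and its reference type. The sketch below uses hand-written data that mirrors the examples above (it is not real output from the demo) and tallies the mentions per entity.

```python
from collections import Counter

# Hand-written data mirroring the examples in this article; not real demo output.
# Each tuple is (entity, mention text, reference type).
mentions = [
    ("Alex Oxlade-Chamberlain", "Alex Oxlade-Chamberlain", "named"),
    ("Alex Oxlade-Chamberlain", "Oxlade-Chamberlain", "named"),
    ("Alex Oxlade-Chamberlain", "midfielder", "nominal"),
    ("Gareth Southgate", "Gareth Southgate", "named"),
    ("Gareth Southgate", "manager", "nominal"),
]


def mention_counts(mentions):
    """Count mentions per entity, split by reference type."""
    counts = {}
    for entity, _text, kind in mentions:
        counts.setdefault(entity, Counter())[kind] += 1
    return counts


counts = mention_counts(mentions)
for entity, by_type in counts.items():
    print(entity, dict(by_type), "total:", sum(by_type.values()))
```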
What is entity salience?
Salience is a linguistic term referring to the prominence that a word or phrase has within a particular text. Entity salience scores are always relative to the analysed text. In natural language processing, a salience score is always a prediction of what a human would consider to be the most important entities in the same text. A number of textual features contribute to the salience score. The following list is drawn from Google’s research papers and my own experimenting with the demo:
- The entity’s position in the text
- The entity’s grammatical role
- The entity’s linguistic links to other parts of the sentence
- The clarity of the entity
- Named, nominal and pronominal reference counts
I unpack each of these factors further below, when looking at how to turn entity salience to our advantage as content writers.
The entity graph
There is another factor whose impact on salience scores is hard to quantify and practically impossible to manipulate (sorry, SEOs): Google’s entity knowledge graph. Dunietz and Gillick wrote the following in their 2014 paper to explain what this graph is trying to achieve:
“All the features described above use only information available within the document. But articles are written with the assumption that the reader knows something about at least some of the entities involved.”
The authors go on to introduce the entity graph, a computation based on the better-known PageRank calculation that Google uses to determine the authority of a page based on its incoming links. Their initial computation drew on Freebase, a now-retired knowledge base of connected entities. With the vast database of information that Google now has in its own Knowledge Graph, we can safely assume that their source data has improved since 2014.
The purpose of the entity graph is to simulate the wider contexts for different entities that human readers subconsciously draw upon all the time. This allows the AI to modify its salience scores for any given entity based on its connections to other entities in the text. A possible example in the article above is the salience of England (4th) compared to Bulgaria (12th) and Kosovo (16th). Each country is only mentioned once, yet England is directly linked to the footballers and manager mentioned in the article, whereas Bulgaria and Kosovo are only incidentally related.
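For a rough feel of how a PageRank-style calculation rewards well-connected entities, here is a toy version in plain Python. The graph is hypothetical: I have simply drawn links that seem plausible for the football article, and the damping value is the textbook default. Google’s real entity graph and weighting are not public, so this only illustrates the principle.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Basic PageRank by repeated averaging over an undirected graph.

    `graph` maps each node to the set of nodes it is linked to.
    Every node must have at least one link.
    """
    nodes = list(graph)
    rank = {node: 1 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new_rank = {}
        for node in nodes:
            # Each neighbour shares its rank equally among its own links.
            incoming = sum(
                rank[other] / len(graph[other])
                for other in nodes
                if node in graph[other]
            )
            new_rank[node] = (1 - damping) / len(nodes) + damping * incoming
        rank = new_rank
    return rank


# Hypothetical links between entities, for illustration only.
entity_graph = {
    "England": {"Oxlade-Chamberlain", "Southgate", "Bulgaria", "Kosovo"},
    "Oxlade-Chamberlain": {"England", "Southgate"},
    "Southgate": {"England", "Oxlade-Chamberlain"},
    "Bulgaria": {"England"},
    "Kosovo": {"England"},
}

ranks = pagerank(entity_graph)
for entity, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{entity}: {score:.3f}")
```

In this toy graph England comes out on top because it is linked to every other entity, while Bulgaria and Kosovo each hang off a single link.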
The entity graph does not directly translate into writing practices that will help Google’s AI to process our text. However, it is a reminder (if we needed another one) that covering a single topic in-depth is usually going to be better than a shallow article or an article that tries to cover too many topics.
Writing for entity salience
SEOs and content writers can work with all of the textual salience factors to create copy with a meaning that is clear to search engines.
But before I get into that, I need to make a distinction. Entities are not keywords and entity salience is not keyword targeting. Your target keywords might well be distinct entities (e.g. “Nike trainers”), but longer-tail keywords (e.g. “best Nike trainers for running”) can never be entities.
The goal of writing for entity salience is to ensure that the entities most closely related to your target keywords are the most prominent entities in your text. This means that Google will understand your focus topic and know what kind of keywords to show your page for. So, for our “best Nike trainers for running” example, we might want to ensure that “Nike trainers” and “runners” are two of the most salient entities.
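If you keep a note of the scores the demo gives you, a small helper can flag when a target entity has slipped out of the top few. The entity names and scores below are hypothetical, and the cut-off of three is an arbitrary choice of mine, not a threshold Google publishes.

```python
def missing_from_top(ranked_entities, targets, top_n=3):
    """Return the target entities that are not among the top_n most salient."""
    top = {name.lower() for name, _score in ranked_entities[:top_n]}
    return [target for target in targets if target.lower() not in top]


# Hypothetical salience scores, already sorted from most to least salient.
ranked = [
    ("Nike trainers", 0.41),
    ("runners", 0.22),
    ("cushioning", 0.12),
    ("marathon", 0.08),
]

print(missing_from_top(ranked, ["Nike trainers", "runners"]))
print(missing_from_top(ranked, ["Nike trainers", "marathon"]))
```

An empty list means every target entity made the cut; anything returned is a candidate for the rewrites described in the next section.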
If your target keyword is an entity, then, by all means, focus on its salience. However, don’t forget wider SEO best practices. Keyword stuffing will increase the salience of your entity/keyword, but it will most likely do more harm than good to the page’s search engine performance. Instead, focus on the other aspects of salience that help Google’s AI to understand your content so that it can rank it for the most relevant keywords.
SEOs haven’t been working with entity salience long enough to produce many results. I’ve started to see positive movement for one client, but nothing conclusive. However, Google’s NLP capabilities are clear enough that writing with salience in mind can only be beneficial. I’ll return to this point in the final section.
Entity salience factors
Google’s Natural Language AI uses linguistic cues to determine which parts of a text are the most important. Some of these cues line up with what we would recognise as human readers, while others are more subtle:
- The entity’s position in the text
- The entity’s grammatical role (subject or object)
- The entity’s linguistic relationship to other parts of the sentence
- The clarity of the entity
- Named, nominal and pronominal reference counts
- Entity graph
Text position
One of the most basic elements of salience is text position. In general, beginnings are the most prominent positions in a text. Therefore, entities placed closer to the beginning of the text and, to a lesser extent, each paragraph and sentence, are seen as more salient. The end of a sentence is also slightly more prominent than the middle.
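A quick way to audit text position is to check how far into the copy each focus entity first appears. This is a minimal sketch using simple case-insensitive string matching; the sample copy is invented, and the function only measures position, not the salience score Google would assign.

```python
def first_mention_positions(text, entities):
    """Map each entity to where it first appears, as a fraction of the text.

    0.0 means the very start; None means the entity never appears.
    """
    if not text:
        return {entity: None for entity in entities}
    lowered = text.lower()
    positions = {}
    for entity in entities:
        index = lowered.find(entity.lower())
        positions[entity] = None if index == -1 else index / len(text)
    return positions


# Invented sample copy, for illustration only.
sample = (
    "Water butts collect rainwater from your roof. "
    "Fit one beneath a downpipe and your garden hose can stay in the shed."
)

positions = first_mention_positions(sample, ["water butts", "garden hose", "greenhouse"])
for entity, position in positions.items():
    print(entity, position if position is None else round(position, 2))
```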
Subject/object relationship
The subject (the entity that is doing something) of a sentence is more prominent than the object (the entity to which something is being done). Take a look at these two sentences, each of which describes the same activity:
- Bilbo stole the ring.
- The ring was stolen by Bilbo.
In the first sentence, “Bilbo” has a score of 0.7, whereas “ring” has a score of 0.3. In the second, “ring” is more salient, with 0.87, whereas “Bilbo” has a score of 0.13. “Ring,” in sentence 2, has the highest prominence score across both sentences.
Why is that the case? Without going too deep into grammar, ‘by Bilbo’ is not actually the object of sentence 2; it is a prepositional phrase (or, in other grammatical models, an adjunct). A prepositional phrase like this is less important than either a subject or an object, as it only adds extra information to the clause. Sentence 2 doesn’t actually have an object, because the verb is in the passive voice: the thing being stolen, the ring, has been promoted to subject.
When considered alongside text position, it should be fairly easy to ensure that the target keyword is the subject of the majority of its sentences.
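If you would like to run comparisons like the Bilbo one outside the demo page, the sketch below shows one way to do it. It assumes the v1 interface of the `google-cloud-language` Python client library and a Google Cloud project with credentials set up, neither of which this article covers. The `fetch_entities` function is not called here; the ranking helper is demonstrated with stand-in objects carrying the scores quoted above.

```python
from types import SimpleNamespace


def fetch_entities(text):
    """Ask the Cloud Natural Language API for the entities in `text`.

    Assumes the google-cloud-language package is installed and credentials
    are configured. The import is local so the rest of this sketch runs
    without the package.
    """
    from google.cloud import language_v1

    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=text, type_=language_v1.Document.Type.PLAIN_TEXT
    )
    response = client.analyze_entities(request={"document": document})
    return response.entities


def rank_by_salience(entities):
    """Return (name, salience) pairs, most salient first."""
    pairs = [(entity.name, round(entity.salience, 2)) for entity in entities]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Stand-in objects carrying the scores reported in this article.
active = [
    SimpleNamespace(name="Bilbo", salience=0.7),
    SimpleNamespace(name="ring", salience=0.3),
]
passive = [
    SimpleNamespace(name="Bilbo", salience=0.13),
    SimpleNamespace(name="ring", salience=0.87),
]

print("Bilbo stole the ring.", rank_by_salience(active))
print("The ring was stolen by Bilbo.", rank_by_salience(passive))
```

With credentials in place, `rank_by_salience(fetch_entities("Bilbo stole the ring."))` would replace the stand-in data, though the scores returned may differ from those I saw in the demo.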
Further linguistic relationships
The most salient entities usually link grammatically to other words in the same sentence. Remember the attention model of NLP? This is where it comes into its own. If you use the Syntax tab in Google’s API demo, you’ll actually see a sentence-by-sentence breakdown of which words link to each other, along with a grammatical label. The example below shows how ‘water butt’ achieves a salience score of 0.76 in a moderately complex sentence (note that text position and grammatical role also play a part). The yellow words are the words referring to ‘water butt,’ the blue words are closely linked and the green words are more loosely linked.
This screenshot from the demo’s syntax tab shows how the AI has processed the first part of the sentence. We can see how the opening phrase links to so many parts of the sentence through the verb ‘take’:
The sentence also demonstrates that “water butt” does not need to be repeated artificially in every clause for it to be seen as prominent. It is more important that the other clauses and entities in the sentence depend on the target keyword for their meaning.
Entity clarity
Google’s natural language technology is good at recognising entities but it’s not perfect. For example, I’ve found that it’s not great at recognising two entities as the same when their capitalisation changes, such as when they’re used at the start of a sentence and then again in the middle.
When analysing service page copy for a law firm and its competitors, I found that better salience scores were achieved when I capitalised terms like “Criminal Lawyer.” Writing it in this way meant that it was written consistently throughout the text and allowed the AI to see all the references as an entity.
Pluralisation also makes a difference, with “Criminal Lawyers” and “Criminal Lawyer” likely to be seen as distinct. We know that, in search, Google is capable of conflating plurals, non-plurals and close synonyms so it’s possible that this is a non-issue. However, the facts of the NLP demo remain and I wouldn’t want to ignore them completely.
My advice is to refer to your focus entities as consistently as possible throughout a text. If capitalisation makes sense, then it seems to be the safest strategy. The only thing you really need to avoid is inconsistent capitalisation. You should also be wary of how switching between acronyms and full phrases (NLP vs “natural language processing”) is affecting the AI’s understanding of your text.
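Before pasting copy into the demo, you can check how consistently a focus entity is written. This sketch lists every surface form of a phrase, covering capitalisation and a simple trailing “s” plural; the sample copy is invented, and the check says nothing about how Google will actually group the variants.

```python
import re
from collections import Counter


def surface_forms(text, phrase):
    """Count each distinct way `phrase` (or its simple plural) is written."""
    pattern = re.compile(r"\b" + re.escape(phrase) + r"s?\b", re.IGNORECASE)
    return Counter(match.group(0) for match in pattern.finditer(text))


# Invented sample copy, for illustration only.
copy = (
    "Our Criminal Lawyer team is here to help. "
    "Speak to a criminal lawyer today. "
    "Criminal lawyers from our firm attend court daily."
)

forms = surface_forms(copy, "criminal lawyer")
for form, count in forms.items():
    print(repr(form), count)
```

More than one form in the output is a prompt to pick a single spelling and stick to it.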
Mention counts
The number of times an entity is referenced in your text is a simple but important salience factor. But, despite its helpfulness in natural language comprehension, do not be tempted to stray into out-of-date, spammy writing practices. Upping your focus entities’ mention counts should never be a thin disguise for keyword stuffing.
It’s important to remember my earlier point on Google’s ability to recognise different references to the same thing:
- Lucy Bronze – named
- defender – nominal
- she – pronominal
Don’t be afraid to substitute your focus entity for nominal and pronominal mentions, as long as it is clear that these all point to the same thing. If in doubt, write clearly. As long as it’s clear to a human reader what each nominal and pronominal phrase is relating to, it should be clear to Google’s AI.
How to use entity salience in SEO
It is still too early to tell how important entity salience optimisation is for SEO. I have used the recommendations above to make tweaks to small sets of landing pages on a couple of different client sites and have started to see a small improvement for one. On a handful of pages where entity salience tweaks were the only changes I made, six of 19 focus keywords have improved and none have worsened, with one page moving onto page 1 for the first time. Interestingly, all of the significant improvements have come in and around top 10 rankings, with little change in the lower-ranked pages.
My hunch is that entity salience is not going to be a silver bullet. You’re not going to be able to rinse the NLP demo with all your content and fly up the rankings, though I would expect the impact to be more noticeable if you’re starting with low-quality content.
I see entity salience as a helpful tool to add to the on-page optimisation armoury. It is most likely to be useful as a touchpoint in the content creation process. A client of mine is planning on uploading several dozen new product pages and a handful of new categories to their relatively small site, so I provided them with a stripped back version of these guidelines to help them ensure that the new content is focused and written to be search-friendly from the start.
I have also provided guidelines to another client who likes to draft their own content, but has a habit of copying chunks from other sites. For that client, my entity salience tips were part of a larger EAT (expertise, authority, trust) cheat sheet, helping them to understand the core concepts needed for high-quality content.
Finally, Google’s natural language API demo gives content writers a tool to help them craft their writing in a more structured way. The mechanisms used by natural language processors rely heavily on traditional grammar, which gives us the opportunity to dust off our GCSE English (or equivalent) skills and think more critically about the way we write.
Ultimately, I’m all for anything that pushes marketers to create better content. Writing with the mechanics of language in mind will make us look better and our clients look more professional, both of which have benefits beyond organic rankings.
It’s not every day that SEOs get the opportunity to glimpse Google’s inner workings and we don’t know how long this API demo will exist. We need to learn what we can from these insights into entity salience, sentiment analysis and syntax while we have the chance.