13.11.2019

Named Entity Determination and Its Implications for the Ways We Build Links

This article was updated on: 07.02.2022

BERT has been acclaimed in the natural language processing world for some time now as an innovative solution to the problem of how machine learning and AI systems understand language.

More recently, BERT’s integration into Google’s search capabilities has enabled the search giant to better understand human language – and therefore, to get much closer to understanding the web as a human being does.

There are some fantastic resources out there to read up on the broad SEO implications of BERT, including articles from Dawn Anderson and Ben Garry.

And while this latest update is the biggest change to search for five years (since RankBrain’s introduction), it does, I believe, have potential implications for a much older patent – namely, US9002832B1 (or, as I’ll henceforth refer to it, the “implied links patent”).

Long story short, BERT is Google’s latest innovation in its journey to understanding the web like a human being does. By improving on previous methods of natural language processing (NLP), BERT has made Google much better equipped to understand the natural constructs of language and to reduce ambiguity that exists, naturally, in linguistics.

Think of it like Hummingbird’s much (much) more advanced younger brother. Where the Hummingbird update allowed Google to understand context and semantics, BERT is “a fundamental layer which seeks to help with understanding and disambiguating the linguistic nuances in sentences and phrases, continually fine-tuning itself and adjusting to improve” (source).

It means that Google is now better able to understand how words relate to one another and to make sense of the difference between ‘run’, ‘run’ and ‘run’ by context (‘run a tap’, ‘run a race’, ‘run a company’).

The key takeaway for the SEO industry and anyone producing content they intend to be found by an online audience is that nothing major need change – providing you’re already creating content for people rather than search engines. Put simply, we should be marketing to humans, not to algorithms, and that’s not something new. It’s just that Google is now more capable of understanding what that actually means, and attributing more value to those resources that provide the best experience for human beings.

While Google has been investing over the course of the last decade or more in its ability to digest content like a person would, that’s not to say that the search giant hasn’t always held this aspiration. Indeed, look back to some of its earliest patents and it’s clear that, in spite of the technical challenges in its path, Google has been aiming for a humanistic understanding of the web in every area.

The ‘implied links’ patent is one example of this. When Google’s PageRank was first created as an academic project by its two founders, Larry Page and Sergey Brin, one USP of the way Google would rank content was that it called on a common technique used in the world of academia: citations.

A citation, in an academic sense, is a reference to a source. In writing any academic paper and calling on previous knowledge or data, it is essential to reference that source with a citation, often included as a footnote in the document.

As such, every time a website was to cite a source, it was assumed that that source would be referenced in a similar way but, rather than simply being a note of the author and publication, the internet enabled sources to be referenced using hyperlinks, thus allowing the reader to click through to read the referenced text.

As the web has evolved, it could be argued that the quality of content in terms of how its sources are referenced has reduced – or at least that the thinking behind these references has changed. Rather than considering each reference as an academic citation, links have almost become an online currency, recognising that each one passes value to its recipient and perhaps causing a reduction in the willingness of authors to include explicit hyperlinks in their text where they reference sources outside of their own organisation.

Fewer people, it seems, want to pass link value outside of their site, even if that site is a valid resource.

This is most notable in the press landscape. Journalism has always relied on sources of information, often PRs, to provide content and stories. But while you’d perhaps like to think that quality journalism means referencing your source, the reality is that, with blanket ‘no external links’ rules like those imposed famously by Forbes, we can’t expect that journalists will cite their sources with a link.

That’s where the implied links patent comes in. Published in 2012, the patent is relatively old, but its application is, I argue, growing ever more pertinent.

What the patent states is that links can serve as indicators of quality, but that those links can be either explicit (i.e. a hyperlink) or implied by ‘another relationship between the source and the site’. Here’s the actual paragraph from the patent itself:

The link quality engine 140 uses linking data 150 that specifies links from resources to sites. The linking data can be organized in a data structure having resources, sites, and links between resources and sites. The search system can generate the linking data, for example, while populating the index by parsing indexed resources for links. A link to a site is a reference from a resource to the site, e.g., a hyperlink in an outside resource to one of the resources of the site. A link can be an express link or an implied link. An express link exists where a resource explicitly refers to the site. An implied link exists where there is some other relationship between a resource and the site.

Google Patent: Classifying sites as low quality sites

Of course, like many things, this is open to interpretation. I would argue that this is the most important element of that paragraph:

1) “A link can be an express link or an implied link. An express link exists where a resource explicitly refers to the site. An implied link exists where there is some other relationship between a resource and the site.”

That said, I know there are others in the search industry who would focus more on this:

2) “A link to a site is a reference from a resource to the site, e.g., a hyperlink in an outside resource to one of the resources of the site.”

Taken in isolation, sentence 1 above suggests that an implied link could be something other than a hyperlink. It could be argued that sentence 2 dictates that a link is a hyperlink. However, I’d argue that the “e.g.” in this sentence suggests that a hyperlink is just one example of what constitutes a link – hence my purpose in writing this article.

From this point onwards, I’m going to hypothesise based on my belief that the above patent content does indeed suggest that a ‘link’ – meaning a ‘reference’ and akin to an academic ‘citation’ – can be an explicit hyperlink or can be something else entirely, such as an unlinked mention (where a brand is written about but a hyperlink is not included).
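To make that reading concrete, here is a minimal sketch of how linking data with both express and implied links could be modelled. The patent only says the data has "resources, sites, and links between resources and sites"; the class names, field names and example URLs below are my own assumptions for illustration, not anything the patent or Google specifies.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class LinkType(Enum):
    EXPRESS = "express"  # the resource explicitly refers to the site, e.g. a hyperlink
    IMPLIED = "implied"  # some other relationship, e.g. an unlinked brand mention


@dataclass(frozen=True)
class Link:
    resource: str        # URL of the page doing the referencing (hypothetical)
    site: str            # the site being referenced (hypothetical)
    link_type: LinkType


def links_to(site: str, linking_data: List[Link]) -> Dict[LinkType, int]:
    """Count the express and implied links that point at a given site."""
    counts = {LinkType.EXPRESS: 0, LinkType.IMPLIED: 0}
    for link in linking_data:
        if link.site == site:
            counts[link.link_type] += 1
    return counts


# Illustrative data only: one hyperlink and two unlinked mentions of the same brand.
linking_data = [
    Link("https://news.example/article-1", "brand.example", LinkType.EXPRESS),
    Link("https://news.example/article-2", "brand.example", LinkType.IMPLIED),
    Link("https://blog.example/post", "brand.example", LinkType.IMPLIED),
    Link("https://blog.example/post", "other.example", LinkType.EXPRESS),
]

counts = links_to("brand.example", linking_data)
print(counts)
```

Under this interpretation, an unlinked mention is simply another row in the same structure, distinguished only by its type.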

As we know from the patent detail, “[a]n implied link exists where there is some other relationship between a resource and the site”. This, to me, means that as long as Google can understand the relationship between the resource and the site, that relationship can ‘count’ toward its understanding of site quality and the passing of ‘link equity’.

It’s therefore not too great a leap to see how BERT’s application as a framework to better understand language and entity salience will benefit those brands that have a distinctive name where their brand is mentioned but not linked. For example, give your business a name like Tesco or Monzo and you can probably expect Google to be able to recognise that it’s your brand to which an article is referring. The extent to which this impacts such a brand’s ranking position is likely still minimal, but it’s logical to think how it might work.

The complexity comes in where a brand uses a word which is polysemous or homonymous – meaning it has multiple related meanings or multiple unrelated meanings.

For example, a brand with a name like “Impression” may face challenges in that the word itself is a common term. It doesn’t necessarily refer to a brand.

Throw in the fact that that particular brand is a digital marketing agency and that an ‘impression’ is a common term in digital marketing, and it becomes even more challenging to see how Google will, even with contextual indicators, recognise a mention of “Impression” as a reference to the brand.

That’s not to say it’s impossible, and certainly things like capitalisation and deeper contextual relationships such as names of members of staff or office location might be signifiers, but suffice to say it’s likely to be more difficult for Google to attribute value to such unlinked mentions (hence why many in the SEO industry still believe the idea of implied links to be some way off actual application).
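As a rough illustration of how those signifiers might combine, here is a toy heuristic that scores a piece of text on two signals: whether the brand name appears capitalised mid-sentence, and how many known contextual terms (staff names, office location) appear alongside it. This is purely a sketch of the idea; it is not how BERT or Google works, and the function name, the scoring and the example terms ("Jane Doe", "Exampleton") are all hypothetical.

```python
import re
from typing import List


def brand_mention_score(text: str, brand: str, context_terms: List[str]) -> int:
    """Toy score for how likely `text` refers to the brand rather than the common word."""
    score = 0

    # Signal 1: the name appears capitalised mid-sentence, i.e. directly after
    # a lowercase letter or a comma/semicolon/colon plus a space.
    mid_sentence = re.search(r"(?<=[a-z,;:] )" + re.escape(brand) + r"\b", text)
    if mid_sentence:
        score += 1

    # Signal 2: one point per known contextual term found in the text
    # (case-insensitive substring match).
    lowered = text.lower()
    score += sum(1 for term in context_terms if term.lower() in lowered)

    return score


# Hypothetical contextual terms for the brand: a staff member and an office location.
context_terms = ["Jane Doe", "Exampleton"]

generic = "Each ad impression is counted once per page load."
branded = (
    "The campaign was run by Impression, whose director Jane Doe "
    "works from its Exampleton office."
)

print(brand_mention_score(generic, "Impression", context_terms))
print(brand_mention_score(branded, "Impression", context_terms))
```

The generic sentence triggers neither signal, while the branded one triggers both, which is the kind of separation a far more sophisticated language model would need to achieve at scale.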

Building a brand, developing an entity

What is logical, and in fact documented, is the idea of Google’s entity graph and the Knowledge Graph too. This is where Google has been building, for many years, its understanding of entities, their context and their relationships to one another.

It’s why Google is able to show, with pretty high confidence, a result showing information about the musician when a searcher searches for ‘Mozart’, rather than Mozart Street in London or the town of Mozart in Idaho.

It’s also why Google is able to produce Mozart in a list of composers as a result for the search query ‘composers’.

When considering the best way to prepare your own brand for future innovation driven by BERT – and the potential widespread application of implied links – one logical approach would be to invest in population of your own entity graph.

Some ways to do this would include:

  • Populating your Google My Business profile with appropriate categorisation and complete information
  • Updating your profile on relevant quality directories with NAP (name, address, phone number) consistency
  • Creating complete brand profiles across social media platforms
  • Using structured data (specifically, organisation and author markup)
  • Creating author pages / team pages to describe individuals in your business
  • Submitting your business to Wikipedia if possible

There are many more, the essence of all of them being that you need to make clear to Google what your business is and who is involved.
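For the structured data point in particular, a minimal sketch of organisation markup might look like the following, which builds a schema.org `Organization` object in Python and wraps it in a JSON-LD script tag. The business name, URLs and team member are placeholders I have invented for illustration; the property names (`name`, `url`, `sameAs`, `employee`, `jobTitle`) are standard schema.org vocabulary.

```python
import json

# Placeholder values throughout: swap in your own business details.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Agency",
    "url": "https://www.example.com",
    # Profiles that describe the same entity elsewhere on the web.
    "sameAs": [
        "https://www.linkedin.com/company/example-agency",
        "https://twitter.com/exampleagency",
    ],
    # People involved in the business, linked to their team pages.
    "employee": [
        {
            "@type": "Person",
            "name": "Jane Doe",
            "jobTitle": "Head of Digital PR",
            "url": "https://www.example.com/team/jane-doe",
        }
    ],
}

json_ld = json.dumps(organization, indent=2)
script_tag = f'<script type="application/ld+json">\n{json_ld}\n</script>'
print(script_tag)
```

The resulting tag would sit in the HTML of the relevant page, and the `sameAs` and `employee` entries tie the social profiles and team pages from the list above back to the one business entity.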

PR as a brand / entity building tactic

It would be remiss for any article about building a brand to neglect to mention the tactic of PR. Indeed, since its very early days, public relations as a practice has related to brand building and the development of brand awareness. Even in today’s digital world, awareness is often cited as a benefit of digital PR work.

But while ‘awareness’ is a noble aspiration, part of what’s held digital PRs back from reporting on it as a metric is that it is, by its nature, immeasurable. We attempt to apply metrics to make it measurable – brand searches, click through rates from the SERPs, overall web traffic – but it still feels, especially for bigger brands, like these benefits of PR fall short of describing its true value.

So, what if we could describe part of the value of digital PR investment in terms of entity building? Specifically, could digital PR be used to populate Google’s entity graph for a business and to therefore help that business to benefit from the value of implied links should a time come to pass where Google is able to make use of the 2012 patent?

Here’s how we might, logically, utilise PR to feed Google’s understanding of a business:

Thought leadership / guest posting

Now, this is a tactic that has had a fairly bad rep in recent years, especially since Google’s own spokesperson Matt Cutts came out and suggested that guest blogging might not be the best tactic to support search visibility growth.

With that said, the practice of identifying target publications and approaching them with unique insight or expertise to share in the form of an article has continued with fervour – primarily, I believe, because it has benefits above and beyond the link.

Let’s say Google explicitly stated that guest blogs would no longer hold link value in SEO terms; would you stop doing it? In some cases, you might but in others, it’s logical to see how the placement of expert content, written by a spokesperson of the business and featured in a publication relevant to the industry/audience would still hold value for the business in question.

With BERT feeding off knowledge it attains from across the web, it’s logical, too, to assume that such placements, especially where the author name and business name are cited with an explicit link, could inform Google’s understanding of the brand and therefore its ability to relate the author name to the brand in future, and thus to make decisions appropriately based on that context in unlinked situations.

Topical relevance as an indicator of context

While many techniques of digital PR call on businesses to explore topics beyond their immediate knowledge sphere (e.g. the things they specifically sell), there’s logic in an argument that the topical relevance of coverage will grow increasingly important in helping Google to understand the context of a brand name.

For example, gaining coverage in a national news publication has many benefits, but given the broad focus of such outlets, it’s difficult for Google to understand too much about the brands featured therein based solely on the publication. Being featured in The Guardian doesn’t tell Google anything about what your business does.

That said, the topic of the article could do – giving further credence to the idea that placing an article about contact lenses in The Mirror on behalf of a business that sells contact lenses will have more of a context-building benefit than an article about the most Instagrammable holiday locations from the same business.

It also gives further weight to the notion that a placement in a topically relevant publication overall holds more value in building a brand than one which appears in a general or non-related publication. For example, the domain rating or authority score of a website relating to home heating solutions might be less than that of a national newspaper, but inclusion in the former for a brand that sells boilers arguably holds more value than inclusion in the latter.

Aside from anything else, the topic of the publication gives valuable information to Google about the topic to which the featured business is relevant – meaning, logically, that even if the article itself is unrelated to the topic of boilers, the placement on the home heating publication is still hugely valuable.

Link duplication as a declining consideration

In slightly older-school thinking, a second or third link from one website to another has been considered to be of less value than the first, original link. The idea here is that, in a world where links are votes, a website can only vote for another website once – with each subsequent ‘vote’ being of diminishing value for that reason.

But, much like retention is celebrated in business as a signifier of quality, the continued targeting of links from one website to another has got, logically, to hold value in terms of entity understanding (obviously in cases where the recurring links come from trusted sites rather than ‘link farms’).

For example, if your business operates in the finance industry and is cited once in a finance publication, that’s great. But if the same publication references your business time and time again, and is itself a well respected publication, it makes logical sense that Google would recognise this as an indicator of your business’ persistent quality and relevance to the topic of finance.

In this way, we can be less averse to the idea of recurring links from a publication and, in fact, utilise PR techniques to build relationships that lead to recurring columns or to a contributing author appearing again and again.

Since its announcement earlier this year, BERT has continued to develop thanks, in part, to its open source nature and to the digestion of more and more content every day. So we can expect its application to evolve too.

When it comes to link building, as with all areas of SEO in a post-BERT world, the advice is pretty consistent – don’t try to ‘optimise’ for BERT. What that means is that there’s no need to suddenly pivot your entire link building strategy; as long as you’re building links of which you can be proud and that you’d be happy to share with your audience, it’s all good.

What is logical, however, is that BERT may open the doors to a greater understanding of the previously difficult-to-measure benefits of PR.

I expect digital PR and traditional PR to merge into one in the coming years, with neither discipline averse to utilising techniques more typically associated with the other in order to achieve its goals (such as digital PRs using stunts and traditional PRs investing in social media). That said, the real, only differentiating factor between traditional and digital techniques lies, for me, in the ability of digital to be measurable. If BERT can help unlock new measurable benefits of PR, we as an industry should be exploring them.