My SEO Kung Fu Is More Powerful Than Your SEO Kung Fu
by Mike Grehan
So, the headline is my analogy for a conversation I seem to have had so many times: an SEO does some competitive analysis and sees that their efforts seem (to them) to be better than the competitor’s. And yet, the competitor seems to have more visibility at search engines. The conclusion they so often arrive at is that there must be something missing in their SEO tactics, or that the competitor must be doing something sneaky.
However, the answer has, more often than not, to do with the way that search engines analyze end user behavior and fold that into the mix. There’s a whole lot more going on under the hood at search engines that can affect what ranks and what doesn’t. And more to the point, what frequently gets re-ranked as a result of end user intelligence.
Without getting too deep in the weeds, I want to take a little look under the hood to highlight some of the techniques that are pretty much standard in information retrieval terms, but rarely get a mention in SEO circles.
Did Google Just Mess Around With That Query?
Let’s start with the query itself. We imagine that the end user inputs a certain number of keywords and that a search engine then looks for documents that contain those keywords and ranks them accordingly. However, that’s not always the case. Frequently, documents in the corpus are more relevant to a query, even when they don’t contain the specific keywords submitted by the user.
That being the case, by understanding the “intent” behind a keyword or phrase, a search engine can actually expand the initial query. Query expansion techniques are usually based on an analysis of word or term co-occurrence, in either the entire document collection, a large collection of queries, or the top-ranked documents in a result list. This is not at all the same as simply running a general thesaurus check, which has proven to be quite ineffective at search engines.
The key to effective expansion is to choose words that are appropriate for the “context” or topic of the query. A good example of this would be where “aquarium” would be a good expansion for “tank” in the query “tropical fish tanks.” That would mean if you’re specifically targeting the term “fish tanks” but a page (resource) talking about “aquariums” proves to be more popular to the end user, then that’s the one most likely to be served. And subjective as it is, it’s the quality of the content end users are happy with, regardless of whether the actual words they typed appear in the content.
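To make the co-occurrence idea a little more concrete, here is a minimal sketch in Python. It is purely illustrative and not how any particular search engine does it: the function name, the tiny document list, and the scoring rule (count how many query terms each document shares, and credit that to its other words) are all my own assumptions. The query is also written as “tropical fish tank” (singular) because the sketch does no stemming.

```python
from collections import Counter


def expansion_terms(query, documents, k=1, stopwords=frozenset()):
    """Return up to k terms that co-occur most with the query terms.

    Hypothetical scoring: every document that shares words with the
    query gives each of its other words a credit equal to the number
    of query terms it shares.
    """
    query_terms = set(query.lower().split())
    counts = Counter()
    for doc in documents:
        tokens = set(doc.lower().split())
        overlap = len(tokens & query_terms)
        if overlap == 0:
            continue  # the document says nothing about this query
        for term in tokens - query_terms - stopwords:
            counts[term] += overlap
    return [term for term, _ in counts.most_common(k)]


# Toy corpus, invented for illustration only.
docs = [
    "tropical fish aquarium setup guide",
    "aquarium filters for tropical fish",
    "army tank history museum",
    "fish tank aquarium heaters",
]

print(expansion_terms("tropical fish tank", docs, k=1))  # ['aquarium']
```

Notice that “army” also co-occurs with “tank,” but it scores low because it never appears alongside the rest of the query. That is the “context” doing its job, and it is why a general thesaurus lookup falls short.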
There are a number of different techniques for query expansion. But how does a search engine know that the expanded query provides more relevant results? The answer is “relevance feedback.”
Implicit data provided by the end user gives huge clues as to what are the most closely associated query terms. Early expansion techniques were focused on expansion of single words, but modern systems use full query terms. What this means is that semantically similar queries can be found by grouping them based on relevant documents that have a common theme, rather than (as already mentioned) the words used in the query.
This rich source of relevance data is then bolstered with click-through data. This means that every query is represented by using the set of pages that are actually clicked on by end users for that query, and then the similarity between that cluster of pages is further calculated.
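A rough sketch of that idea in Python: each query is represented by the set of pages users clicked for it, and two queries are compared by how much those sets overlap. The Jaccard measure used here is just one common choice that I am assuming for illustration, and the queries and URLs are made up.

```python
def jaccard(a, b):
    """Overlap between two sets of clicked pages (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical click data: query -> pages clicked for that query.
clicks = {
    "fish tanks": {"example.com/aquariums", "example.com/fish-care"},
    "aquariums": {
        "example.com/aquariums",
        "example.com/fish-care",
        "example.com/plants",
    },
    "army tanks": {"example.com/military-history"},
}

related = jaccard(clicks["fish tanks"], clicks["aquariums"])
unrelated = jaccard(clicks["fish tanks"], clicks["army tanks"])
print(round(related, 2), unrelated)  # 0.67 0.0
```

The two aquarium queries share no keywords at all, yet they come out as closely related because users click the same pages for both.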
Techniques for relevance feedback are not new; you can trace them back to the early Sixties. However, in this new realm of “big data,” what I have described above (in the most basic way possible to keep it simple) actually provides the “training data” (the identified relevant and non-relevant documents) for “machine learning” at search engines.
What The Heck Is “Machine Learning?”
It’s a subfield of artificial intelligence (AI) concerned with algorithms that allow computers to learn stuff. Keeping it simple, an algorithm is given a set of data and infers information about the properties of the data – and that information allows it to make predictions about other data that it might see in the future.
So, having mentioned click-through data above, let’s dig just a tiny bit deeper into the importance of “implicit end user feedback.”
Whenever an end user enters a query at a search engine and clicks on the link to a result, the search engine takes a record of that click. For a given query, the click-throughs from many users can be combined into a “click-through curve” showing the pattern of clicks for that query. Stereotypical click-through curves show that the number of clicks decreases with rank. Naturally, if you interpret a click-through as a positive preference on behalf of the end user, the shapes of those curves would be as you might expect: higher-ranked results are more likely to be relevant and receive more clicks. Of course, with search engines having access to “user trails” via toolbar and browser data (literally being able to follow you around the web) they now have an even richer seam of data to analyze and match for ranking.
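As a simple illustration, a click-through curve can be built by counting clicks per ranking position for one query and turning the counts into shares. The log format (pairs of query and clicked rank) and the numbers below are invented for the example; a real log would carry far more detail.

```python
from collections import Counter


def click_through_curve(click_log, query):
    """Share of clicks received by each rank position for one query."""
    ranks = Counter(rank for q, rank in click_log if q == query)
    total = sum(ranks.values())
    if total == 0:
        return {}
    return {rank: ranks[rank] / total for rank in sorted(ranks)}


# Hypothetical click log: (query, rank of the clicked result).
log = (
    [("convert currency", 1)] * 6
    + [("convert currency", 2)] * 3
    + [("convert currency", 3)] * 1
    + [("fish tanks", 1)]
)

curve = click_through_curve(log, "convert currency")
print(curve)  # {1: 0.6, 2: 0.3, 3: 0.1}
```

A result that draws noticeably more clicks than the typical curve predicts for its position is the kind of pattern that hints it deserves to rank higher.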
Learning From Clicks
Google and other search engines receive a constant stream of data around end user behavior. And this can immediately provide information about how much certain results in the SERPs are preferred over others (users choosing to click on a certain link or choosing not to click on a certain link). It’s no hard task for a search engine such as Google to design a click-tracking network by building an artificial neural network (more specifically, a multilayer perceptron (MLP) network). And this is a prime example of “machine learning” in action. No, I’m not going to explain the inner workings of an artificial neural network. There’s tons of data online if you’re so inclined to go find it.
But I do want to, in a simple fashion, explain how it can be used to better rank (and often re-rank) results to provide the end user with the most relevant results and best experience.
“Take no thought of who is right or wrong or who is better than. Be not for or against.”
First the search engine builds a network of results around a given query (remember the query expansion process explained earlier) by analyzing end user behavior each time someone enters a query and then chooses a link to click on. For instance, each time someone searches for “foreign currency converter” and clicks on a specific link, and then someone else searches for “convert currency” and clicks on exactly the same link and so on. This strengthens the associations of specific words and phrases to a specific URL. Basically, it means that a specific URL is a good resource for multiple queries on a given topic.
The beauty of this for a search engine is that, after a while the neural network can begin to make reasonable guesses about results for queries it has never seen before, based on the similarity to other queries. I’m going to leave it there as it goes well beyond the scope (or should I say purpose) of this column to continue describing deeper levels of the process.
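For the curious, here is a deliberately simplified sketch of the association-strengthening idea. To be clear, this is not a multilayer perceptron: it is a plain table of word-to-URL weights that I am substituting to keep things short, and the class name, methods, and URLs are all hypothetical. It still shows the two behaviors described above: clicks strengthen the link between query words and a URL, and a query never seen before can be ranked from the words it shares with earlier ones.

```python
from collections import defaultdict


class ClickAssociations:
    """Toy word-to-URL weight table (a stand-in, not a neural network)."""

    def __init__(self):
        # weights[word][url] grows each time that word leads to a click.
        self.weights = defaultdict(lambda: defaultdict(float))

    def record_click(self, query, url, step=1.0):
        """Strengthen the link between every query word and the URL."""
        for word in query.lower().split():
            self.weights[word][url] += step

    def rank(self, query):
        """Order known URLs by their total weight for the query's words."""
        scores = defaultdict(float)
        for word in query.lower().split():
            for url, weight in self.weights.get(word, {}).items():
                scores[url] += weight
        return sorted(scores, key=lambda u: (-scores[u], u))


model = ClickAssociations()
model.record_click("foreign currency converter", "example.com/fx")
model.record_click("convert currency", "example.com/fx")
model.record_click("currency history", "example.com/history")

# "currency exchange" was never recorded, but it shares a word with
# earlier queries, so the model can still make a reasonable guess.
print(model.rank("currency exchange"))
# ['example.com/fx', 'example.com/history']
```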
There are many more ways that a search engine can determine the most relevant results of a specific query. In fact, they learn a huge amount from what are known as “query chains,” which is the process of an end user starting with one query, then reformulating it (taking out some words or adding some words). By monitoring this cognitive process, a search engine can preempt the end user. So the user types in one thing at the beginning of the chain and the search engine delivers the most relevant document that usually comes at the end of the chain.
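Here is a bare-bones sketch of the query chain idea, assuming each search session can be reduced to the list of queries tried and the page finally clicked. The session data, the function name, and the “most common ending wins” rule are assumptions of mine for illustration.

```python
from collections import Counter, defaultdict


def chain_shortcuts(sessions):
    """Map each opening query to the page its chains most often end on."""
    endings = defaultdict(Counter)
    for queries, final_click in sessions:
        if queries:
            endings[queries[0]][final_click] += 1
    return {
        first: counts.most_common(1)[0][0]
        for first, counts in endings.items()
    }


# Hypothetical sessions: (queries in order, page finally clicked).
sessions = [
    (["currency", "currency converter", "convert usd to eur"], "example.com/fx"),
    (["currency", "convert currency"], "example.com/fx"),
    (["currency", "currency history"], "example.com/history"),
]

shortcuts = chain_shortcuts(sessions)
print(shortcuts["currency"])  # example.com/fx
```

Two of the three chains that open with “currency” end on the converter page, so that is the page worth surfacing straight away for the opening query.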
In short, search engines know a lot more about which media are consumed by end users and how, and which is deemed the most relevant (often most popular) result to serve given end user preferences. And it has nothing to do with which result had whatever amount of SEO work on it.
I’ve written a lot over the years about “signals” to search engines, in particular the importance of end user data in ranking mechanisms. In fact, it’s coming up to ten years now (yes, ten years!) since I first wrote about this at ClickZ.
And, on a regular basis, I still see vendor and agency infographics suggesting what the strongest signals are to Google. Yet rarely do you see end user data highlighted as prominently as it should be. Sure, text and links send signals to Google. But if end users don’t click on those links or stay on a page long enough to suggest the content is interesting, what sort of signal does that send? A very strong (and negative) one, I’d say.
So, going back to the headline of this column: next time you scratch your head, comparing your “SEO Kung Fu” to the other guy’s, give some extra thought to what search engines know… and, unfortunately, you don’t.