SpamBrain - How Google Detects Spam Links, or: From YOU to a Whale to a LINK


Disclaimer

We only show parts of the link evaluation and spam detection. Most of it is current technology. If we at @FoxxUP can realize this technologically, then Google certainly can.

All Equal or Each Individually

In Google's early days as a search engine, in the late 1990s, its success was based on PageRank. "A link is a recommendation!" was the starting point: a rating system that delivered better results than other search engines. PageRank assigns a rating to each page, so every link on that page receives the same rating. For example, if PR 5 has been calculated for a page (as it was displayed in the toolbar), then PR 5 applies to each of its links. Google still uses this rating system today, but it has lost importance because other methods are available.


New score systems

We are fairly sure that there is at least a second scoring system, possibly more. One of them appears to be SpamBrain, and not much is publicly known about how it works. What is known is that it uses big data and machine learning. Some Google patents show possible components or ways of working, but a filed patent application does not mean that Google uses the technique in production.


Let's make a few changes of perspective.




Another change of perspective.

Let's pretend we are a link on a page. In "The Hitchhiker's Guide to the Galaxy" by Douglas Adams, there is a famous scene in which a whale suddenly comes into existence in free fall and becomes aware of itself.

Within seconds, the whale develops a primitive but remarkable self-awareness and begins to explore the world around it until it finally hits the ground.

Change in perspective - You are a LINK


Link: Where am I? It looks like I'm standing in the middle of a crowd of something.

What is around me? Around me are lots of letters, no, words.

Who am I?
There's a sign on me that says "Buy me now!" Maybe that's my name.

What do I look like? Oh, there's something round on me, it could be a button, and there's also something like an arrow pointing somewhere.

What can I do? I can't walk, but maybe I can press this button.

Is there someone else there? A voice sounds from the button:
"Yes."

"Hello, I'm 'Buy me now'. And who are you?"


Link2: Uh, oh, there's a voice. "Yeah, uh, here I am."

(Man, how embarrassing.) "My name is 'Click here'."

Buy me now: "Great, now I know someone. Uh, tell me, what's your situation?"

Click here: There are many letters around me and many others like me, with different names of course.

Buy me now: I also have words and I think it's about ....

Click here: "Mine too. ...."

And of course it could go on ...

This much of the dialog should be enough to show, through the change of perspective, what Google's technology looks at when it comes to links.

All the questions above can be answered with data, so here's a summary:


Who am I?

The name (the anchor text) is in the HTML <a> tag, but target, size, etc. can also be determined programmatically.
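
The "Who am I?" question can be sketched with Python's standard-library HTML parser. This is only a minimal illustration, not how Google or we actually do it; the class name `LinkCollector`, the helper `collect_links` and the sample URL are made up for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href, target, rel and visible text for every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []       # finished links
        self._current = None  # link that is currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attr = dict(attrs)
            self._current = {
                "href": attr.get("href"),
                "target": attr.get("target"),
                "rel": attr.get("rel"),
                "text": "",
            }

    def handle_data(self, data):
        # Only text inside an open <a> tag belongs to the link name.
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.links.append(self._current)
            self._current = None


def collect_links(html):
    """Hypothetical helper: return all links found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


sample = '<p>Great offer: <a href="https://example.com/shop" target="_blank">Buy me now!</a></p>'
links = collect_links(sample)
print(links)
```

Size, color and other visual properties are not available this way; they would need the CSS or a rendered page.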

[Image: layout recognition]

Where am I?

First comes layout recognition (e.g. with machine learning), i.e. determining header, footer, content, sidebar, etc. Once the text has been extracted, the link's distance within the text can be measured, e.g. from the beginning of the text, the paragraph or the sentence.

Distance can be imagined here as in an Excel table: cell D9 is located relative to the starting cell A1, and the diagonal between them defines its exact position (see picture).

What is directly around me? This can be single words or the whole sentence (a window).
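
Once the text is extracted, distance and surrounding window are simple to compute. The following is a rough sketch under the assumption that the link name appears literally in the extracted text; the function `locate_anchor` and its field names are hypothetical.

```python
def locate_anchor(text, anchor, window=3):
    """Return position data for the first occurrence of `anchor` in `text`."""
    start = text.find(anchor)
    if start == -1:
        return None
    words_before = text[:start].split()
    words_after = text[start + len(anchor):].split()
    # Index right after the last sentence-ending mark before the link (0 if none).
    sentence_start = max(text.rfind(mark, 0, start) for mark in ".!?") + 1
    return {
        "word_offset": len(words_before),  # distance from the start of the text
        "word_offset_in_sentence": len(text[sentence_start:start].split()),
        "relative_position": start / len(text),  # 0.0 = top, 1.0 = bottom
        "window_before": words_before[-window:],
        "window_after": words_after[:window],
    }


text = "We tested many shoes. For running we recommend Buy me now from our partner shop."
info = locate_anchor(text, "Buy me now")
print(info)
```

Splitting on whitespace and on ".!?" is a simplification; real text needs proper tokenization and sentence detection.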


What can I do? - I can change my appearance, the text or the color, and I can open a new window or a pop-up, etc.

What do I look like? Am I just a text link, a button, an image, or something else? Am I underlined, or do I have the same color as the background?

Am I covered by something?
Technically detectable by rendering the entire page, but very cost-intensive and time-consuming.

Where do I link to or what do I recommend?
- The linked page should match in terms of content, so the other page is examined in the same detail: text, topic, layout, etc.


Which topic?

Categorization, i.e. sorting into a category, is rarely considered but is important, because spam links, for example, often do not fit thematically. Natural language processing can be used to create a summary, or algorithms can be used to find the possible ranking keywords. The keywords can then be checked for similarity to each other.
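
As a very small illustration of such a similarity check, the keyword sets of the linking page and the linked page can be compared with the Jaccard index. This measure is our choice for the example only; the keyword lists are invented, and a production system would more likely compare embeddings or categories.

```python
def jaccard(keywords_a, keywords_b):
    """Share of keywords that both pages have in common (0.0 to 1.0)."""
    a, b = set(keywords_a), set(keywords_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


source_keywords = ["running", "shoes", "marathon", "training"]
on_topic_target = ["running", "shoes", "trail", "training"]
off_topic_target = ["casino", "bonus", "poker", "slots"]

print(jaccard(source_keywords, on_topic_target))   # high overlap
print(jaccard(source_keywords, off_topic_target))  # no overlap
```

A link whose target shares no keywords with the linking page is a candidate for closer inspection, nothing more.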


Additional questions that can be answered but are not mentioned in the dialog above:

How many of my kind are there? - Counting the links sounds quite simple, but it is a basis. The fewer links there are, the more each one counts; the more links, the less value each has.

Where are the others? - Find out in which part of the page layout the link is located and pay attention to the concentration. A link in a big dropdown menu with hundreds of other links is unlikely to receive the same attention or scoring as a link in the content.

Technically a combination of layout recognition and link recognition in the HTML code.
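
A simplified sketch of this combination follows. It assumes that the layout regions can be read directly from HTML5 tags such as nav, main and footer, which is a strong simplification compared to ML-based layout recognition; `RegionLinkCounter` and `link_concentration` are hypothetical names.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumption: these tags stand in for real layout recognition.
REGION_TAGS = {"header", "nav", "main", "aside", "footer"}


class RegionLinkCounter(HTMLParser):
    """Counts <a> tags per enclosing layout region."""

    def __init__(self):
        super().__init__()
        self._stack = []  # open region tags, innermost last
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag in REGION_TAGS:
            self._stack.append(tag)
        elif tag == "a":
            region = self._stack[-1] if self._stack else "unknown"
            self.counts[region] += 1

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()


def link_concentration(html):
    """Return the share of all links that sits in each region."""
    counter = RegionLinkCounter()
    counter.feed(html)
    counter.close()
    total = sum(counter.counts.values())
    if total == 0:
        return {}
    return {region: n / total for region, n in counter.counts.items()}


page = (
    "<nav>" + '<a href="/c">x</a>' * 8 + "</nav>"
    "<main><p>Text with <a href='/a'>one link</a>.</p></main>"
    "<footer><a href='/imprint'>Imprint</a></footer>"
)
shares = link_concentration(page)
print(shares)
```

In this toy page the navigation holds most of the links, so each of them would share the attention with many neighbors.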

How do I feel or should I feel? 

What references and familiar terms can be found?

These are just two more questions, and answering them in detail would fill several articles on its own.

All these questions can be answered with data.

Let me repeat that.

If we can do it, then Google probably can too!

Scoring for each link individually?

The traditional, widely known PageRank scoring, as well as all its derivatives such as Page Authority (see DR and PR, https://moz.com/learn/seo/domain-authority), is based on the idea of one value for all links on a page.

The above change of perspective has made it clear that with current technology, it is possible to calculate a score per link much more accurately.



This is how it could work:

Each link on a page receives an individual score, which is made up of position, number of links, link density, link name, etc. (see questions above)

Simplified example with a score from 0 to 100:

  • Website with 500 links.
  • Huge drop-down menu because it's so cool for the user.
  • Footer with the important links, disclaimer and co.
  • Sidebar with advertising for a friendly partner company.
  • Link from the content area: Score = 15
  • Link from the dropdown: Score = 0.2
  • Link from the footer: Score = 0.1
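
How such a per-link score could be combined is sketched below. The formula, the region weights and the function name `link_score` are purely hypothetical and are not meant to reproduce the numbers in the example above; the sketch only shows that position, link count and topical fit can be folded into one value between 0 and 100.

```python
import math

# Hypothetical weights per layout region, chosen for illustration only.
REGION_WEIGHT = {"content": 1.0, "sidebar": 0.2, "dropdown": 0.02, "footer": 0.01}


def link_score(region, links_on_page, topic_fit):
    """Combine position, link count and topical fit into a 0-100 score."""
    weight = REGION_WEIGHT.get(region, 0.05)  # unknown regions get a low weight
    # More links on the page means less value per link.
    dilution = 1 / (1 + math.log10(max(links_on_page, 1)))
    fit = max(0.0, min(1.0, topic_fit))  # clamp to 0..1
    return round(100 * weight * dilution * fit, 2)


for region in ("content", "dropdown", "footer"):
    print(region, link_score(region, links_on_page=500, topic_fit=0.6))
```

On the page with 500 links, the content link still ends up far ahead of the dropdown and footer links, which is the effect the example describes.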

A machine learning (ML) model is trained with enough positive examples of pages that have been approved by the quality reviewers.

The model then predicts how likely it is that a link is a spam link.

As already described in other articles, a programmatic decision is made.

A separate model can be trained for each topic and language, for example.
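
To make the idea of a probability concrete, here is a tiny logistic regression written in plain Python. It is a toy under clear assumptions: the two features (topical fit and "link is in the content area"), the six training rows and their labels are invented, and unlike the description above the sketch also uses labeled spam examples, because this kind of model needs both classes.

```python
import math


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def train(samples, labels, epochs=2000, lr=0.5):
    """Fit a logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = p - y  # gradient of the log loss with respect to z
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def spam_probability(model, x):
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Invented features per link: [topic_fit, in_content]; label 1 = spam.
samples = [[0.9, 1.0], [0.8, 1.0], [0.7, 0.0], [0.1, 0.0], [0.2, 1.0], [0.0, 0.0]]
labels = [0, 0, 0, 1, 1, 1]

model = train(samples, labels)
print(spam_probability(model, [0.05, 0.0]))  # off-topic footer link
print(spam_probability(model, [0.85, 1.0]))  # on-topic content link
```

A real system would use far more features and data, and the final decision (for example a threshold on the probability) is a separate, programmatic step.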

Scoring Systems - both or several in parallel?

The traditional PageRank system is still used by Google, but with less importance in the overall ranking system.

If you take a look at the PageRank algorithm, you will quickly realize that it answers a few questions reliably.

For example, it can determine the internal page structure of a website, identify the top page, and show where an individual page sits within that structure or its categories. Understanding how PageRank works should be basic SEO knowledge. Unfortunately, the reality is different. Perhaps this is due to the modest math lessons in schools.

Why does Google still use PageRank?

"Never touch a running system": the PageRank calculation had already been working for over 10 years before the first ML models were even planned. A detailed evaluation of individual links is carried out in parallel to PageRank.

Different goals suggest that there are several evaluation systems, including SpamBrain for detecting unwanted links.

Another detailed evaluation could have been introduced with the "Helpful Content Update". Various movements in collected data (visibility index or traffic estimates), similar to a ripple effect, suggest a change in link evaluation.

Technology - Can I do the same?

Everything we have described above is current technology, at least for us.

PageRank:

Anyone with some programming experience can test a PageRank calculation on a small scale using PHP and JavaScript.
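
For a first experiment, a small-scale calculation could look like the following sketch, written here in Python rather than PHP or JavaScript. It is the textbook power iteration with a damping factor of 0.85, not Google's production system; the three-page site is invented, and every link target is assumed to be a page in the graph.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration on a dict {page: [linked pages]}."""
    pages = list(graph)
    n = len(pages)
    rank = {page: 1 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1 - damping) / n for page in pages}
        for page, targets in graph.items():
            if targets:
                # A page passes its rank on in equal shares to its link targets.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without links spreads its rank over all pages.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


site = {
    "home": ["category", "product"],
    "category": ["home", "product"],
    "product": ["home"],
}
ranks = pagerank(site)
for page, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(value, 3))
```

Even in this toy graph the result shows the internal structure: the page that receives the most links from the others comes out on top.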

Finding links, counting etc. works in almost any programming language.

Layout detection and text extraction are our own ML models, which are regularly adapted, retrained and improved.

For text analysis with Natural Language Processing (NLP) there are various options; we prefer Python and one model per language. Text statistics (number of words, letters, sentences, etc.) are of course generated using algorithms (see existing libraries, e.g. in PHP, JS, TypeScript, etc.).
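
Basic text statistics need no model at all. A minimal sketch with the standard library, assuming that words are runs of word characters and that sentences end with ".", "!" or "?" (the function name `text_stats` is made up):

```python
import re


def text_stats(text):
    """Count words, sentences and letters with simple rules."""
    words = re.findall(r"\w+", text)
    sentences = [part for part in re.split(r"[.!?]+", text) if part.strip()]
    letters = sum(ch.isalpha() for ch in text)
    return {
        "words": len(words),
        "sentences": len(sentences),
        "letters": letters,
        "avg_word_length": letters / len(words) if words else 0.0,
    }


stats = text_stats("A link is a recommendation! Is it still true today?")
print(stats)
```

Abbreviations, numbers and other edge cases are ignored here; existing libraries handle them better.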

Potential keywords are extracted from the text with algorithms. It should be mentioned that we do not use TF-IDF: the algorithm is often cited, but its results are far too poor in practice. The same applies to the TextRank algorithm. We use a different approach, Transformer and n-grams, which delivers better-quality keywords and combinations but is unfortunately not as fast. You can train an ML model to categorize a text, but our results were disappointing, which is why we developed our own solution that also works with multiple languages.
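
Only the n-gram half of such an approach is easy to show in a few lines. The sketch below counts word pairs and skips those that start or end with a stop word; it is not our production method, there is no Transformer involved, and the stop-word list and sample text are tiny and invented.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real one is much longer and per language.
STOPWORDS = {"a", "the", "is", "are", "for", "and", "of", "to", "in", "with"}


def ngram_candidates(text, n=2, top=3):
    """Return the most frequent n-grams that do not start or end with a stop word."""
    tokens = re.findall(r"[a-z]+", text.lower())
    grams = Counter()
    for i in range(len(tokens) - n + 1):
        gram = tokens[i:i + n]
        if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
            continue
        grams[" ".join(gram)] += 1
    return grams.most_common(top)


text = (
    "Running shoes for the marathon. The best running shoes are light, "
    "and light running shoes help in a marathon."
)
candidates = ngram_candidates(text)
print(candidates)
```

Note that this simple version lets n-grams cross sentence boundaries and only handles unaccented lowercase letters.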

Downloading a page is web scraping, and there are plenty of tutorials on the net. We operate a crawling infrastructure that processes a stream in real time, with the data running directly through the entire analysis process. This is not as fast as a crawler that only downloads and saves, but that is not our focus as a data-driven SEO solution.

Any enthusiastic programmer can try out the individual parts. If you have any questions, feel free to connect via LinkedIn and ask your question via chat. I'll be happy to help. However, it may take a few days for me to reply. Sorry in advance.


Questions & Answers

What happens if I have an unwanted link? In principle, nothing. No negative consequences.

In the past, one drastic measure was the so-called manual penalty, e.g. if someone advertised the sale of links on their own site. These high-profile measures have not occurred for some years now. Spamming a competitor's project with garbage links (yes, that used to work) no longer works either. The scoring of links evidently works.

Should I disavow backlinks? This is a very personal question, and a whole group of well-known SEOs will disagree with me here. My opinion is that individual link scoring works for BIG G, so no, I think you can save yourself the work.

3 tips for links

Good Score LINK

Primarily, your link should be in the content area and fit thematically. This is nothing new for many, but take a much closer look: it is not enough that the text fits thematically. It is better if the entire linking domain, or at least a suitable category on the linking website, matches your topic.

LINK BUY

Buying a link or guest posting is bad and against Google's guidelines, yet we do it every day; we just shouldn't get caught. Keep buying, but get a lot better at it, otherwise you're just wasting money and not making an impact.

Better Update

Instead of a link in a new article, use existing articles with a known DR and PR or a similar derivative of PageRank. If the article is updated and a new link is set, it looks better for you and the operator!

Summary of this article

A cool change of perspective to the LINK "Buy me now" has shown you what you can and should know about a link. In addition to PageRank, there are other link evaluations that are more accurate.