I thought it would be interesting to analyze 32 sites that were penalized by the Penguin algorithm and to look for the reasons this new algorithm penalized them. Thanks to colleagues from SEOpedia, I managed to collect 32 sites. Before starting the analysis, I expected to discover something different from what I already knew.
I'll be honest: I was convinced I could draw some clear conclusions from these websites and somehow infer Penguin's formula. I explain what I found below. For this case study I used MajesticSEO, which I think is the best tool for checking and discovering backlinks.
The penalties fell into three categories: Penguin, manual penalties due to links, and unknown reasons. Of the 32 sites, 25 were penalized by Penguin, 3 received manual penalties for unnatural links and 3 were penalized for unknown reasons. In one way or another, all the penalties were directly related to links. The chart below shows our findings more clearly:
TLDs: many domains had the .biz extension, but I don't think this could have been a reason for a penalty. The sites didn't have good links, and two of them had content in English.
What were the reasons for the penalties?
The reasons behind the penalties are multiple, but they come down mainly to low-quality links: link types, anchor text, links to internal pages, link schemes and sitewide/footer links.
- There are sites that were penalized even though they had a great anchor text profile, but the majority of their links were irrelevant, such as blog comments and web directories.
- There are sites penalized for a single anchor text (money keyword) because of the types of links, even though they did not push that particular anchor text very hard. Their remaining pages still rank very well.
- There are sites penalized because they bought links or have links that Google considers manipulative (links used just to pass PageRank).
- There are sites penalized for obvious reasons: a high percentage of exact match anchors and over 90% spam links.
- There are sites penalized for many sitewide links and exact match anchors.
- There are sites penalized for a keyword or two, where the webmasters pushed those anchors too hard. You could also think of over-optimization, but I'm not so sure.
- There are penalized sites that had very good links (diversified, relevant, placed in the content and acquired slowly over time, i.e. good link velocity) but used exact match anchor text excessively.
- There are penalized sites with a very good percentage of brand anchors, a small percentage of exact match anchors and many partial anchors, but with irrelevant link types. At first glance, these links appear manipulative.
- There are sites penalized for link schemes and link types. Two websites had 202 root domains on only 36 Class C IPs and 195 root domains on 27 Class C IPs, respectively. This is clearly suspicious, and I think they used a network of web directories hosted in the same place.
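Counting Class C IPs can be sketched in a few lines of Python. This is a minimal sketch, not how MajesticSEO computes it: it assumes you already have a mapping from each referring root domain to the one IPv4 address it resolves to, it treats the first three octets of an address (its /24 network) as the "Class C" block, and the function names and `.example` domains are hypothetical.

```python
from ipaddress import ip_address


def class_c_block(ip: str) -> str:
    """Return the first three octets (the /24 or "Class C" block) of an IPv4 address."""
    addr = ip_address(ip)  # raises ValueError for malformed addresses
    if addr.version != 4:
        raise ValueError(f"not an IPv4 address: {ip}")
    return ".".join(str(addr).split(".")[:3])


def class_c_ratio(domain_ips: dict[str, str]) -> float:
    """Distinct Class C blocks divided by the number of referring root domains.

    domain_ips is an assumed input: {root_domain: ipv4_address}, one IP per domain.
    A value far below 1 means many linking domains share the same few hosts.
    """
    if not domain_ips:
        return 0.0
    blocks = {class_c_block(ip) for ip in domain_ips.values()}
    return len(blocks) / len(domain_ips)


# Hypothetical directory network: 202 domains hosted in 36 Class C blocks.
network = {f"dir{i}.example": f"10.0.{i % 36}.{i % 250 + 1}" for i in range(202)}
print(round(class_c_ratio(network), 4))  # 0.1782
```

With 202 root domains spread over 36 blocks, the ratio is 36 / 202 ≈ 0.1782, far below the ideal of 1.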
What do the analyzed websites have in common?
- Over 90% of the links point directly to the homepage. A few questions arise here: Is there only one important page on the website? Does the homepage deserve all the links? Naturally, it is the content that receives links, whatever its kind (video, text, images, charts), and this content is usually published in blog posts or article pages. PageRank still flows through the site even when external links point to internal pages. Links of this type are called “deep links”.
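Measuring the share of deep links is straightforward once you have the target URL of each backlink. The sketch below is an illustration under stated assumptions: it treats an empty path, `/`, `/index.html` and `/index.php` as the homepage and counts every other target as a deep link; the function name and URLs are hypothetical.

```python
from urllib.parse import urlparse

# Paths treated as "the homepage"; this set is an assumption, adjust it per site.
HOMEPAGE_PATHS = {"", "/index.html", "/index.php"}


def deep_link_share(target_urls: list[str]) -> float:
    """Fraction of backlinks that point to internal pages rather than the homepage."""
    if not target_urls:
        return 0.0
    deep = sum(
        1
        for url in target_urls
        # rstrip("/") maps both "" and "/" to "", so both count as the homepage.
        if urlparse(url).path.rstrip("/") not in HOMEPAGE_PATHS
    )
    return deep / len(target_urls)


backlink_targets = [
    "https://shoes.example/",
    "https://shoes.example",
    "https://shoes.example/index.html",
    "https://shoes.example/?ref=footer",
    "https://shoes.example/blog/how-to-choose-running-shoes",
]
print(deep_link_share(backlink_targets))  # 0.2
```

A profile with over 90% of links on the homepage would score below 0.1 here.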
Below, I will discuss two terms, Link Ratio and IP Ratio, which are mentioned and taken into account far too little in link building campaigns.
- Link ratio, or link diversity, is the total number of linking domains divided by the total number of links. Ideally this ratio would be 1, meaning one link per domain. That won't happen, but we should do what it takes to push this ratio toward 1. For the homepage, the link ratio is very low: the average homepage link diversity is 0.2044, and 0.2013 for the entire domain. This is very low and indicates many sitewide links.
- IP ratio is the total number of IPs divided by the number of root domains. Ideally it would, of course, be 1. The principle is the same as for link ratio: a low ratio quickly reveals manipulation through link schemes or link farms. The IP ratio is 0.6793, which is pretty good, except for the 2 sites at 0.1782 and 0.1391.
- Another common trait is that the homepage's Trust Flow and Citation Flow are much higher than the whole domain's. Keep in mind that domain authority matters more than homepage authority. Although the percentages look close, the numbers say otherwise. Homepage averages: Trust Flow 28.5, Citation Flow 36.5. Domain averages: Trust Flow 22, Citation Flow 27.
- I compared these variables with those of some very strong domains: seomoz.org, searchengineland.com, searchenginejournal.com, bbc.co.uk and emag.ro. All these websites have a natural link profile and are authorities in their fields. Do you notice how the values are lower for the homepage than for the whole domain? This means these websites have many links pointing to their internal pages (deep links).
All the penalized websites had very few links to internal pages; their percentage of deep links is very small.
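Both ratios can be computed from a flat list of backlinks. This is a minimal sketch under stated assumptions: each record already carries the referring root domain and the single IP it resolves to; the `Backlink` type and sample domains are hypothetical and do not reflect a MajesticSEO export format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backlink:
    source_url: str     # page that contains the link
    source_domain: str  # referring root domain (assumed already extracted)
    source_ip: str      # IP the referring domain resolves to (one per domain)


def link_ratio(links: list[Backlink]) -> float:
    """Link diversity: unique referring domains / total links (1.0 = one link per domain)."""
    if not links:
        return 0.0
    return len({link.source_domain for link in links}) / len(links)


def ip_ratio(links: list[Backlink]) -> float:
    """Unique IPs / unique referring root domains (1.0 = every domain on its own IP)."""
    domains = {link.source_domain for link in links}
    if not domains:
        return 0.0
    return len({link.source_ip for link in links}) / len(domains)


# A sitewide footer link on 8 pages of one directory, plus 2 editorial links.
links = [Backlink(f"https://dir.example/page{i}", "dir.example", "10.0.0.1") for i in range(8)]
links += [
    Backlink("https://blog-a.example/post", "blog-a.example", "10.0.0.1"),  # same host as dir.example
    Backlink("https://news-b.example/story", "news-b.example", "192.0.2.5"),
]
print(link_ratio(links))          # 0.3
print(round(ip_ratio(links), 4))  # 0.6667
```

A single sitewide link is enough to pull the link ratio down to 0.3 in this toy example.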
- Regarding anchor text, there was nothing in common, absolutely nothing. As I mentioned above, I saw many penalized websites with a good anchor mix: brand anchors, brand+keyword and partial anchors. The only thing these sites have in common is that there are no anchors on generic terms such as here, click, site, there, article and others. Think about how people normally link to websites. Think about how you normally link. Do you use only keywords as anchors? Or only brand names? Or would you rather use short, generic words such as click here, more information, etc.?
- However, the percentage of exact match anchors (money keyword) and of linking domains using them is quite high. I counted both the singular and the plural of the targeted keyword as exact match anchors. There are many possible cases here. For example, for the domain Shoes.com with the title “Original Shoes Online”, I counted as exact match all anchors containing shoes, online shoes, original shoes and original shoes online. Keep in mind that all this data represents an average across the 32 analyzed domains.
- The partial anchor percentages are not very accurate, since there are many negative values (you'll see them in the Excel file linked at the end of the article). To calculate the number of partial and generic anchors, I subtracted the number of brand anchors and exact match anchors from the total number of anchors. The negative values appear because links appear and disappear, and MajesticSEO's crawlers cannot always be up to date. That is why I can't put this in a graph, but I can compare the graphs above. The remaining percentage up to 100% is represented by the partial match anchors.
- Another common trait is that the penalized keyword appears in almost all anchors. It's pretty obvious that all the links included targeted keywords in their anchors.
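One way to avoid negative partial-anchor values is to put every anchor into exactly one bucket instead of subtracting counts, so the percentages always add up to 100%. The Python sketch below is an alternative to the subtraction method, not the one used in the spreadsheet. The term lists reuse the Shoes.com example and are hypothetical, and the matching rules (exact = equals a keyword variant, brand = contains a brand term, generic = equals a generic term, everything else = partial) are simplifying assumptions.

```python
from collections import Counter

CATEGORIES = ("brand", "exact", "partial", "generic")


def classify_anchor(anchor: str, brand: set[str], exact: set[str], generic: set[str]) -> str:
    """Assign an anchor to exactly one category (simplified matching rules)."""
    text = " ".join(anchor.lower().split())  # lowercase, collapse whitespace
    if text in exact:
        return "exact"
    if any(term in text for term in brand):
        return "brand"
    if text in generic:
        return "generic"
    return "partial"


def anchor_profile(anchors: list[str], brand: set[str], exact: set[str],
                   generic: set[str]) -> dict[str, float]:
    """Percentage of anchors per category; sums to 100 whenever anchors is non-empty."""
    if not anchors:
        return {}
    counts = Counter(classify_anchor(a, brand, exact, generic) for a in anchors)
    return {cat: 100 * counts[cat] / len(anchors) for cat in CATEGORIES}


brand_terms = {"shoes.com"}
exact_terms = {"shoes", "shoe", "online shoes", "original shoes", "original shoes online"}
generic_terms = {"here", "click here", "site", "more information", "article"}
anchors = ["Shoes.com", "original shoes", "Shoes", "click here",
           "buy comfortable shoes", "www.shoes.com", "Original Shoes Online", "shoes"]
print(anchor_profile(anchors, brand_terms, exact_terms, generic_terms))
# {'brand': 25.0, 'exact': 50.0, 'partial': 12.5, 'generic': 12.5}
```

Because each anchor lands in one category only, no category can go negative, although stale crawl data can still skew the totals.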
The anchor profile below is from a website I built links for over 6-7 months in a very competitive niche. Do you notice the large percentage of “other anchor text”? If the targeted keyword is in the title, H1-H2 and meta description, I think it's very obvious to both search engines and users which keyword the page is relevant for. For this reason, I don't think it matters anymore what kind of anchor I use for that page.
- Do you think I discovered the Penguin formula? Well, here's the secret: there isn't one, although at first I believed I could find something. Penguin certainly acts in a very interesting way. It usually hits the competitive keywords in every niche. In one case, a website dropped from position 5 to position 15 for its targeted keyword, so not all Penguin penalties are severe. Everybody wants to rank number one for the most searched keyword and tries to get many backlinks with exact match anchor text for just a keyword or two. I covered the reasons for the penalties above.
- There are no ideal anchor text percentages, such as 70% brand anchors, 10% exact match, 10% partial match and 10% generic anchors. Depending on the niche, these percentages can be higher or lower, because each niche and keyword is treated differently. You need to analyze the competition before starting the link building process.
- There is no general, guaranteed-success formula for the link mix, and there is no point in insisting on a certain type of link. Backlinks must come from different sources, and the best way to obtain them depends on the site you're optimizing.
- Every niche or site needs a different link building strategy and a different approach. Analyze the competition in detail, average the data and aim for 10-15 percentage points below the average. For an average of 40% exact match anchors, aim for 25-30% exact match anchor links or even less.
- Definitely, you must not insist on a single keyword or a single type of links.
- Penguin is subjective; keep that in mind.
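The rule of thumb about staying 10-15 points below the competitors' average can be written as a small helper. This sketch reads the margin as percentage points, which is how the 40% to 25-30% example works out. The function name, default margins and competitor figures are illustrative assumptions, and the result is a starting point for a given niche rather than a safe threshold.

```python
from statistics import mean


def target_exact_match_range(competitor_pcts: list[float],
                             low_margin: float = 10.0,
                             high_margin: float = 15.0) -> tuple[float, float]:
    """Return (min, max) exact-match anchor percentages to aim for.

    competitor_pcts: exact-match anchor percentage for each analyzed competitor.
    The target is the competitors' average minus 10-15 percentage points,
    clamped at 0 (going lower than the range is always acceptable).
    """
    if not competitor_pcts:
        raise ValueError("analyze at least one competitor first")
    avg = mean(competitor_pcts)
    return max(avg - high_margin, 0.0), max(avg - low_margin, 0.0)


# Hypothetical competitors averaging 40% exact-match anchors.
print(target_exact_match_range([35.0, 40.0, 45.0]))  # (25.0, 30.0)
```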
Although this is not the best analysis you can do, nor a very exact one (I have found that no analysis is 100% accurate), I did get some ideas about future link building campaigns, Penguin updates and links in general. Links are penalized every day, and to be certain we won't be penalized, we must pursue links that stand the test of algorithms and the test of time. Be as selective as you can when it comes to link building. Not all links are good; some can hurt sites too.
Here's where you can download a file containing my entire analysis (note that some of it is in Romanian).