Posted by rjonesx.
It's all wrong
It always was. Most of us knew it. But with limited resources, we just couldn't really compare the quality, size, and speed of link indexes very well. Frankly, most backlink index comparisons would barely pass for a high school science fair project, much less a rigorous peer review.
My most earnest attempt at determining the quality of a link index was back in 2015, before I joined Moz as Principal Search Scientist. But I knew at the time that I was missing a huge key to any study of this sort that hopes to call itself scientific, authoritative or, frankly, true: a random, uniform sample of the web.
But let me start with a quick request. Please take the time to read this through. If you can't today, schedule some time later. Your businesses depend on the data you bring in, and this article will allow you to stop taking data quality on faith alone. If you have questions about some of the technical aspects, I will respond in the comments, or you can reach me on Twitter at @rjonesx. I desperately want our industry to finally get this right and to hold ourselves as data providers to rigorous quality standards.