Friday, January 25, 2013

Link Equity Salvage: 7 Steps for Finding Your Long-Lost Links


by Garrett French
Link equity salvage is the process of finding and redirecting your site's dead pages, folders, and subdomains that still have links. These are the old, mis-redirected, unredirected, or simply deleted sections of your site that Webmaster Tools doesn't know about since the URLs got axed more than 35 days ago.
We're talking about pages that even site crawlers aren't finding, presumably because they don't have any links from the visible pages of your site. And remember, link salvagers, you're not only recovering lost link equity here, but also blocking competitive offsite link salvage experts from capitalizing on your squandered links. For more on this, see Great Methods for Reclaiming Lost Backlinks.
You don't necessarily need to rush off hunting for onsite link salvage opportunities though – especially if your site's only a couple of years old and has never had a redesign. If you can say yes to at least one or two of these criteria, then definitely keep reading:
  • Your site is 5+ years old
  • Your content naturally earns editorial links
  • You've had 1+ CMS Migrations
  • You've had several major site redesigns over the years
  • You know of at least 1 mismanaged site redesign
  • You have a 10,000+ page site
  • You aren't seeking targeted keyword ranking increases
So I've broken the link equity salvage process into four parts: compiling as comprehensive a list of your site's URLs as possible, status checking that list, link checking the dead URLs, and then redirecting the ones that still have links. The majority of the tools here are for compiling that critical master list of URLs.

1. Majestic SEO's Historic Index

Some folks complain of Majestic SEO's large quantity of dead links and pages. Not me.
Look for quick wins by placing your root domain (no www) into the Majestic Site Explorer. Select the Historic Index radio button, then click Explore. Download your Top Links CSV (it shows your highest-value links and the pages on your site they point to) and your Top Pages CSV.
From both of these reports extract your site's URLs. Dedupe. Boom. Presto. Pow. You now have a big list of the most important pages on your site according to Majestic.
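The extract-and-dedupe step can be scripted. Here is a minimal sketch, assuming each export is available as CSV text and that your URLs sit in columns named `TargetURL` or `URL`. Those column names are placeholders I made up for illustration, so check the header row of your own export and adjust.

```python
import csv
import io
from urllib.parse import urlsplit


def extract_site_urls(csv_text, domain, columns=("TargetURL", "URL")):
    """Collect unique URLs on `domain` from the named columns of one CSV export.

    The default column names are hypothetical; pass the ones your export uses.
    URLs without a scheme (no "http://") have no parseable host and are skipped.
    """
    seen = set()
    urls = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for col in columns:
            url = (row.get(col) or "").strip()
            if not url:
                continue
            host = (urlsplit(url).hostname or "").lower()
            # Keep the root domain and any of its subdomains.
            if host == domain or host.endswith("." + domain):
                if url not in seen:
                    seen.add(url)
                    urls.append(url)
    return urls
```

Run it once per report and merge the results through a set to dedupe across files. Note that it treats `http://example.com/page` and `http://example.com/page/` as different URLs, which is usually what you want before status checking.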
You could also run a full site report with Majestic and get all of your site's URLs with an AC rank of 1 or higher. This costs more resources but provides a more thorough list.

2. Xenu's Link Sleuth

Xenu is a relentless beast of URL discovery, and it even status checks the URLs for you. It won't find every last URL, at least it hasn't in my tests, and unlike Majestic it obviously can't find legacy pages whose only remaining links come from offsite. It finds only what's linked to onsite (as far as I understand how it works).

3. Xenu's Orphan Checker

I haven't used this on a client's site yet, only salivated at the opportunity to try it out and run a comparison to what can be found via Majestic. Give Xenu's Orphan Checker FTP access and it looks for orphan pages with no links from anywhere on your site.
My guess is that the Orphan Checker isn't going to show you anything that's been flat out deleted from your server, as can sometimes happen, so it's not a replacement for Majestic. If you're on an obsessive hunt for link equity it's worth a check though.

4. Check Links Pages, Old Directories and Press Releases

If your site has been getting editorial links and publishing press releases for years you could have links to now-dead pages from pages that Majestic may not have discovered. First you need to prospect for links pages and old press releases and then check those pages for your domain with a bulk link checker.
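If you don't have a bulk link checker handy, the check itself is simple enough to sketch. This assumes you've already downloaded the HTML of each prospect page; the function name and the example URLs are mine, not from any particular tool.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class _HrefCollector(HTMLParser):
    """Gathers the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_to_domain(html, page_url, domain):
    """Return the absolute URLs on `page_url` that point at `domain` or its subdomains."""
    parser = _HrefCollector()
    parser.feed(html)
    found = []
    for href in parser.hrefs:
        # Resolve relative links against the page they were found on.
        absolute = urljoin(page_url, href)
        host = (urlsplit(absolute).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            found.append(absolute)
    return found
```

Every URL this returns is a candidate for your master list, whether or not Majestic has seen the linking page.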

5. Use Google Queries to Find Legacy Subdomains

Not every salvage requires a dead page – it could be a long forgotten initiative prompted by an executive long gone from your organization. These subdomain-discovery queries, taught to me by Entrepreneur.com's SEO Jack Ngyuen, can help you find possibly-abandoned subdomains from your organization's subdomain-happy heyday of 2005.
  • *.domain.com
  • *.domain.com -inurl:www.
  • site:*.domain.com
  • site:*.domain.com -inurl:www.
Some of these queries work for some sites but not others. I assume it depends on the size and/or configuration of the site.

6. URL Status Checkers

Once you've compiled your insanely large list of URLs it's time to check, recheck, and re-recheck their status codes. Yup, I'd advise at least three checks of a URL list no matter what tool you're using.
One-off URL status checkers abound. You're going to need something with a bit more capacity. I know and love the bulk URL status checker built into my scraper suite. It looks like Dixon Jones has an HTTP Status Checker too.
Whatever tool you choose, it needs to work in bulk – large bulk. If it's on your desktop it could be tying up a machine for a few days. And remember – check your lists of failed URLs a couple more times – you'll always shake out more false-positives.
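To show what the check-then-recheck routine looks like in practice, here is a rough sketch using only Python's standard library. It is not how any of the tools above work internally. It treats a missing response or any 4xx/5xx code as a failure and re-checks only the failures on later passes. Note the assumption baked into `fetch_status`: `urlopen` follows redirects, so a URL that 301s to a live page is reported as 200.

```python
import http.client
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the final HTTP status code for `url`, or None if no response arrived.

    Redirects are followed, so this reports the status of the final destination.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 404, 410, 500, and so on
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return None  # DNS failure, timeout, refused connection, malformed URL


def recheck_failures(urls, fetch=fetch_status, passes=3):
    """Check every URL once, then re-check only the failures on later passes.

    Returns (results, dead): the last status seen per URL, and the URLs that
    failed on every pass.
    """
    results = {}
    pending = list(urls)
    for _ in range(passes):
        still_failing = []
        for url in pending:
            status = fetch(url)
            results[url] = status
            if status is None or status >= 400:
                still_failing.append(url)
        pending = still_failing
        if not pending:
            break
    return results, pending
```

Some servers answer HEAD requests differently from GET, so if the failures look suspicious, swap in a GET-based `fetch` before trusting the dead list. For a truly large list you'd also want to run the checks in parallel and throttle them, which this sketch leaves out.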

7. Bulk Link Count Checkers

With your dead URLs in hand it's time to separate the wheat from the chaff. This requires a bulk link checker – ideally one into which you can paste (potentially) thousands of dead URLs.
I know of two. There are probably more, but these are the only two I know of at the moment.
Majestic has a nifty "Bulk Backlink Checker" built in, though it has a limit of 300 URLs (at least at my subscription level). If you've got 6,000 dead URLs to check you could run it 20 times. Also, I've built a bulk backlink checker that accepts as many URLs as you can copy and paste in – it utilizes the Linkscape data set.
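Splitting a long list into paste-sized batches is easy to script. A small sketch, with the 300-URL limit as the default batch size:

```python
def batches(urls, size=300):
    """Split `urls` into consecutive lists of at most `size` items each."""
    urls = list(urls)
    return [urls[i:i + size] for i in range(0, len(urls), size)]
```

With 6,000 dead URLs this yields 20 batches of 300. If your subscription level allows more per run, pass a different `size`.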
Once you get your data back from either tool you can sort by number of referring domains and at last start the process of mapping and 301 redirecting your equity back where it belongs.
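As a sketch of that last step, the snippet below drops URLs with no referring domains, sorts the rest from most to fewest, and turns a hand-built mapping of dead URL to live URL into redirect rules. Two assumptions here are mine: that you've boiled the tool output down to (URL, referring domains) pairs, and that your server is Apache with mod_alias, whose `Redirect 301` directive is the output format. Other servers need different syntax.

```python
from urllib.parse import urlsplit


def rank_by_referring_domains(rows):
    """rows: iterable of (url, referring_domains) pairs.

    Drops URLs with zero referring domains and sorts the rest high to low.
    """
    linked = [(url, count) for url, count in rows if count > 0]
    return sorted(linked, key=lambda pair: pair[1], reverse=True)


def redirect_lines(mapping):
    """mapping: dict of dead URL -> live URL, chosen by hand.

    Returns one Apache mod_alias "Redirect 301" line per entry.
    """
    lines = []
    for old_url, new_url in mapping.items():
        # Redirect matches on the URL path, so strip the scheme and host.
        old_path = urlsplit(old_url).path or "/"
        lines.append(f"Redirect 301 {old_path} {new_url}")
    return lines
```

The mapping itself is the part that can't be automated: each dead URL should point at the most relevant live page, not just the homepage. Dead URLs that differ only by query string need a different approach, since this keeps only the path.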
