All is not lost

September 25, 2010 Andy

‘What about one of your previous backups?’ I hear you ask. Great idea, I say. Trouble is, the most recent good backup I have is dated October 3rd 2006. Bugger.

OK, what about archive.org? Not great, but there are twenty versions of my site, mainly from 2007. That’s good, as that was when I was travelling the world and my site was getting the most traffic (hence archive.org deemed it important enough to take an archive). Not ideal, but it is a good start.

In steps Sash, a Generation Y work colleague and someone who has a better grasp of Google than anyone else I know. Top marks to Sash, and to Google, for what we managed to do next.

Google caches websites, so you can view the content. See that little ‘Cached’ link on every search result you get? The good news: my site is cached. Even better, it is extensively cached.

Using some of the clever search syntax Google offers, I managed to get all sorts of results for my site by putting in the domain name:
site:whereisandy.net

Next, I refined this to get all posts and pages (based on the WordPress URL structure for the site):
allinurl: p site:whereisandy.net
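If you would rather keep these queries in a script than retype them, they can be turned into search URLs. This is only a rough sketch: the function name build_search_url is made up for illustration, and it assumes the usual https://www.google.com/search?q=... address format.

```python
from urllib.parse import urlencode


def build_search_url(query):
    """Return a Google search URL for the given query string.

    build_search_url is a hypothetical helper for this post,
    not part of any library.
    """
    # urlencode escapes the colon and turns spaces into '+'
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    for query in ("site:whereisandy.net", "allinurl: p site:whereisandy.net"):
        print(build_search_url(query))
```

Pasting either printed URL into the browser should give the same results page as typing the query by hand.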

This returned all 200-odd published posts. Next job: download the HTML. Another recommendation from Sash was the Firefox plugin DownThemAll, which allows you to download multiple pages linked from one page. A bit of tweaking allowed me to get just the cached pages. I also needed to stop concurrent downloads, so that Google wouldn’t decide my computer was sending automated requests and block me.
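I did this with DownThemAll, but the same one-at-a-time idea could be scripted. What follows is a sketch, not what I actually ran: the function names, the page-000.html file naming and the ten second pause are all assumptions, and the list of URLs would have to come from the search results.

```python
import time
import urllib.request
from pathlib import Path


def fetch(url):
    """Download one page and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def download_sequentially(urls, out_dir, delay_seconds=10.0, fetcher=fetch):
    """Save each URL into out_dir, one request at a time.

    The names and the default delay are assumptions for illustration.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        if index > 0:
            # No concurrency: pause before every request after the first
            time.sleep(delay_seconds)
        target = out_dir / f"page-{index:03d}.html"
        target.write_bytes(fetcher(url))
        saved.append(target)
    return saved
```

Passing in the fetcher makes it easy to try the loop with a fake download function before pointing it at anything real.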

Now starts the slightly more tedious task of putting this all back into the database. A bit of scripting might be in order here…
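As a first stab at that scripting, here is a small sketch that pulls the page title out of a saved HTML file using only the Python standard library. It assumes nothing about the cached pages beyond there being a title element; getting at the post body would depend on the theme's markup, so that part is left out. TitleExtractor and extract_title are names I made up for the example.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # Collapse runs of whitespace and trim the ends
        return " ".join("".join(self._parts).split())


def extract_title(html_text):
    """Return the page title, or an empty string if there is none."""
    parser = TitleExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.title
```

Running extract_title over each downloaded file would give a list of post titles to check against before anything goes near the database.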
