Memorial Day seems like a month ago, but it’s only been a couple of weeks. Thankfully, this will be my last post on the topic, and I can get back to blogging about writing code and reading books.
Just for a refresher, in part 1 I migrated our old blog to this new site. In part 2 I used some Perl to fix some problems caused by the migration and spent a lot of time learning absolutely nothing about SNMP.
We’re now up to Sunday afternoon.
There are a few reasons I created this on my own domain hosted on my own site:
- First, I wanted access to the server logs so I could see all requests coming into the site, not just the ones captured by Google Analytics and Feedburner (or especially TypePad, whose site usage stats are a joke).
- Second, I wanted a place where I could create and host my own web mashups, and host any widgets, gadgets, and other software that I create.
- Finally, I wanted a place where I could get firsthand experience with Google (and other) Webmaster Tools, specifically Sitemaps (and maybe robots.txt at a later point).
It’s kind of funny that my job involves developing for a major retail website, but I’m not provided an opportunity to learn about web crawlers or site optimization unless I do it on my own time.
I decided to get started with Sitemaps. A Google search for [WordPress Sitemap] quickly sent me to Arne Brachhold’s WordPress plugin. I went with the latest beta. Installation was pretty easy, and it worked like a charm. (I sent Arne a small donation a few days later.)
Though the plugin supports including additional, non-WordPress generated files in the Sitemap (and even supports putting the file somewhere other than the blog directory), I wanted something a little more flexible.
Lesson of the weekend 8: The Sitemap protocol defines an index file that can point to other Sitemap files on your website.
I created an index file in the root directory, and added a pointer to the blog’s Sitemap file. I also created a small Sitemap file for the homepage and included a pointer to it as well.
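For anyone who hasn't seen one, an index file is just a short XML document with a `sitemapindex` root and one `sitemap`/`loc` entry per Sitemap file. Here is a minimal Python sketch that builds one. The function name and the example.com URLs are made up for illustration, and this is not how the index file described above was actually created.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return a Sitemap index document (as a string) with one entry per URL."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(root, encoding="unicode")


# Hypothetical URLs: one Sitemap for the homepage, one for the blog.
index_xml = build_sitemap_index([
    "https://example.com/sitemap-home.xml",
    "https://example.com/blog/sitemap.xml",
])
print(index_xml)
```

Adding another Sitemap later is then just one more URL in the list.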
This was the easy part. The last step was to create a Sitemap for the archived blog. There are numerous tools for creating Sitemaps, but I wanted to use the program developed by the mother ship. Unfortunately, this required running Python, which is something I had never done.
Lesson of the weekend 9: Thanks to the power of Unix and config files, running Python scripts, at least those created by Google, is a piece of cake.
This turned out to be no big deal. I ssh’d into my site, typed python and hit enter. A Python prompt appeared along with a message saying I was running 2.3.5. I quit out of Python (using Ctrl-D, after an amazingly useful error message told me that ‘exit’ wouldn’t work), edited my configuration file, uploaded the files, and ran the script according to Google’s instructions. It took a few tries to get the script to execute successfully. I didn’t realize that my root folder under ssh was different than my root folder under ftp. Also, using the shortcut for my home directory (~/) didn’t work as I had expected.
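One plausible reason for the ~/ surprise, and this is a guess rather than something verified here: the shell is what expands `~`, and a path read out of a config file never passes through the shell. In Python the expansion has to be requested explicitly, roughly like this (the path is hypothetical):

```python
import os.path

# Hypothetical path, as it might be written in a config file.
config_path = "~/public_html/archive"

# Without expansion, "~" is just an ordinary character in a relative path.
print(os.path.isabs(config_path))

# expanduser() swaps the leading "~" for the current user's home directory.
expanded = os.path.expanduser(config_path)
print(expanded)
```

Writing the full absolute path in the config file sidesteps the question entirely.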
With the Sitemap file created, I added another pointer in the index file and patted myself on the back. I didn’t bother creating a cron job to periodically recreate the file since I don’t expect the archived blog to ever change.
I submitted the Sitemaps to Google Webmaster Tools and Yahoo’s Site Explorer, but they take a few hours to update. I’ll save you the suspense and let you know they both validated the Sitemaps without incident.
Brief intermission: To say that I “patted myself on the back” is a bit of a misrepresentation. Here’s a better description. You know that scene in Cast Away where Tom Hanks’s character finally lights the fire and triumphantly yells, “I have made fire!” (I think he’s even thumping his chest.) It was like that, but I was yelling, “I have run Python!”
Whew! I still have a ways to go here, believe you me.
There were still a number of changes I wanted to make to the blog archive. Basically there was a bunch of functionality that would no longer work because this was now a collection of HTML pages, and not a working blog. I needed to remove the subscription links, disable comments, fix the search, and some other cleanup tasks.
Lesson of the weekend 10: If you’re going to archive your blog using Deep Vacuum, get rid of the features that won’t work first.
I could have used a Perl script similar to the one I used to update Google Analytics ids, but I wasn’t sure how to handle multi-line changes. (See part 2 for details.) Instead, I decided to just put a warning on the home page, and deal with the real fix later.
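For what it's worth, the usual trick for multi-line changes is to read the whole file as one string instead of processing it line by line, and let the pattern match across newlines. Here is a rough Python sketch of the idea. The comment-form markup and the function name are invented for illustration, and the real archived pages almost certainly look different.

```python
import re

# Hypothetical markup standing in for one archived page.
page = """<div id="content">
<p>Post body.</p>
<form id="comment-form">
  <textarea name="text"></textarea>
  <input type="submit" value="Post">
</form>
</div>
"""

# re.DOTALL lets "." match newlines, so one pattern can span several lines.
# The non-greedy ".*?" stops at the first closing </form> tag.
COMMENT_FORM = re.compile(r'<form id="comment-form">.*?</form>\n?', re.DOTALL)


def strip_comment_form(html):
    """Replace the (hypothetical) comment form with a short notice."""
    return COMMENT_FORM.sub("<p>Comments are closed.</p>\n", html)


cleaned = strip_comment_form(page)
print(cleaned)
```

Regular expressions are a blunt tool for HTML, so a sketch like this only makes sense when every page was generated from the same template.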
To edit the HTML, I decided to download Coda. I had read a lot about it on various blogs and decided to give it a try. I made my changes on my local version of the site, and then copied and pasted them into the server version. It worked like a charm, but the warning only appears on the archived blog’s home page. Like I said, I’ll fix the rest later.
Okay, I know I said I’d finish up this series in this post, but I can’t. It’s already long enough that editing is a pain even with WordPress’s edit box open really huge. I have to tell you the rest at a later time. I will say that the coolest part is yet to come, but you’ll have to wait to find out what it is.