uweschmidt.org

42 / π ≈ 13.37 

“Freezing” WordPress

While a visiting student in Canada, I blogged occasionally (in German) to let my family and friends know what I was doing abroad. I used WordPress for that purpose; a friend (who also happens to be my web host) had set it up as a farewell gift. When I launched my personal website, I decided to use Drupal but to keep the old blog up in a “read-only” state.

As time went by, the WordPress installation aged, and security concerns arose as bugs were inevitably discovered in such popular software. Spammers also occasionally succeeded in bypassing the blog’s Akismet spam protection. Turning comments off was easy, but I didn’t want to go through the pain of upgrading WordPress, especially because I wasn’t even using it anymore. So I decided to “freeze” the WordPress blog, turning it into a static collection of HTML pages.

1. I disabled searching and commenting. All “dynamic features” that require any sort of “intelligence” or a database have to be turned off.

2. I used the program wget with the options --mirror and --convert-links to create a local mirror of the blog (wget -mk blog.uweschmidt.org for short). This causes wget to recursively follow all links on the site and download all necessary files (--mirror) and to convert all links so that they work when browsing locally (--convert-links). I refer to “Create a mirror of a website with Wget” and “Website Mirroring With wget” for more explanations and other uses of wget.
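For readers who prefer the long option names, the command can be wrapped as in the following sketch. The function name mirror_site and the DRY_RUN variable are made up for illustration; only the two wget options come from the text above.

```shell
# Long-form spelling of `wget -mk <host>`.
# mirror_site and DRY_RUN are hypothetical names used only in this sketch.
# With DRY_RUN=1 the command is printed instead of executed.
mirror_site() {
    host=$1
    # --mirror:        recurse through the site and download what it links to
    # --convert-links: rewrite links so the copy can be browsed locally
    set -- wget --mirror --convert-links "$host"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

Calling mirror_site blog.uweschmidt.org with DRY_RUN unset runs the same command as the short form.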

I ended up with all the content pages for the blog, but wget missed the stylesheet (and hence the files referenced within it). Apparently, wget only deals with HTML code, so files referenced from CSS code weren’t downloaded. I downloaded the missing files manually and put them in the appropriate folders. For reference, the theme pulled in its stylesheet through a CSS @import:

<style type="text/css" media="screen">
  @import url( http://blog.uweschmidt.org/wp-content/themes/benevolence/style.css );
</style>
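Finding the files that a stylesheet references can be partly automated. The following is a minimal sketch, assuming the stylesheet has already been saved locally; the function name list_css_urls is hypothetical, and the pattern only covers plain url(...) references.

```shell
# Print every distinct target of a url(...) reference in a CSS file.
# list_css_urls is a hypothetical helper, not part of wget or WordPress.
list_css_urls() {
    grep -o 'url([^)]*)' "$1" \
        | sed -E "s/^url\([[:space:]\"']*//; s/[[:space:]\"']*\)\$//" \
        | sort -u
}
```

Each printed path is relative to the stylesheet's own location, so that is where the downloaded files need to go.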

3. Then came the somewhat tricky part, because my friend hadn’t set up the blog with clean URLs. He left the default setting, with blog posts being accessed like blog.uweschmidt.org/?p=49 instead of something like blog.uweschmidt.org/title-name. A page with the URL blog.uweschmidt.org/?p=49 had been downloaded as index.html?p=49. That is a problem because the web server reads a request for index.html?p=49 as a request for the file index.html with the query string p=49, not as a request for a file literally named index.html?p=49. Static HTML files can’t do anything with that parameter, so every post would show the same page.

The problem can be solved with a rewrite engine. I used mod_rewrite because my website runs on the Apache web server, like many other websites. To pick up the above example, I renamed index.html?p=49 to p-49.html and added the following rule to an .htaccess file:

RewriteCond %{QUERY_STRING} ^([^&]+)=([^&]+)$
RewriteRule index.html %1-%2.html [L]
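The renaming itself can be scripted. Here is a minimal sketch, assuming the mirrored files sit in one directory and follow the index.html?key=value pattern; the function name rename_query_files is made up for illustration, and files with more than one parameter are deliberately left alone.

```shell
# Rename single-parameter files such as "index.html?p=49" to "p-49.html".
# rename_query_files is a hypothetical helper for this sketch.
rename_query_files() {
    dir=$1
    for f in "$dir"/index.html\?*=*; do
        [ -e "$f" ] || continue                # the glob matched nothing
        query=${f##*\?}                        # e.g. "p=49"
        case $query in *\&*) continue ;; esac  # skip multi-parameter files
        # Split at the last "=", as the greedy first group of the rule does.
        key=${query%=*}                        # e.g. "p"
        value=${query##*=}                     # e.g. "49"
        mv -- "$f" "$dir/$key-$value.html"
    done
}
```

Run as rename_query_files . from inside the mirror directory, this produces the file names that the rule above rewrites to.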

A request to index.html?p=49 or ?p=49 on the mirrored blog is internally rewritten to p-49.html, so the visitor never sees the new file name. As a result, existing links don’t break and all pages remain valid in the index of search engines. The “frozen” blog is available at blog.uweschmidt.org, and it looks like a regular WordPress blog at first sight.

Here’s the complete .htaccess file, taking care of all the required files:

DirectoryIndex index.html
RewriteEngine on
RewriteBase /

# index.html, one parameter
RewriteCond %{QUERY_STRING} ^([^&]+)=([^&]+)$
RewriteRule index.html %1-%2.html [L]

# comment feeds for posts
RewriteCond %{QUERY_STRING} ^feed=rss2&p=([^&]+)$
RewriteRule index.html feed-rss2_p-%1.html [L]

# trackbacks
RewriteCond %{QUERY_STRING} ^p=([^&]+)$
RewriteRule wp-trackback.php tb_p-%1.html [L]

# xmlrpc
RewriteCond %{QUERY_STRING} ^rsd$
RewriteRule xmlrpc.php xmlrpc-rsd.html [L]
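To check offline which static file each old URL should end up at, the four rules can be modelled with sed. This is only a simplified sketch and not mod_rewrite itself: the function name static_target is hypothetical, the path must be given explicitly (a bare ?p=49 reaches index.html through DirectoryIndex on the real server), and the patterns are anchored more strictly than the rules above.

```shell
# Print the static file name that the rules above would map a request to.
# Usage: static_target <path> <query-string>
# static_target is a hypothetical helper; unmatched requests are echoed back.
static_target() {
    printf '%s?%s\n' "$1" "$2" | sed -E \
        -e 's/^index\.html\?([^&]+)=([^&]+)$/\1-\2.html/' \
        -e 's/^index\.html\?feed=rss2&p=([^&]+)$/feed-rss2_p-\1.html/' \
        -e 's/^wp-trackback\.php\?p=([^&]+)$/tb_p-\1.html/' \
        -e 's/^xmlrpc\.php\?rsd$/xmlrpc-rsd.html/'
}
```

For example, static_target index.html p=49 prints p-49.html, and the result can be compared against the files actually present in the mirror directory.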

3 Responses

  1. Josh says:
    More work than it’s worth.

    Thanks for the info, I was wondering how you could go about doing this. Looking at all the code though, I think it would have been a lot easier to just upgrade the site than do all that. But I guess you don’t have to worry about it in the future now.

  2. Uwe says: in reply to Josh
    Re: More work than it’s worth.

    Hi Josh,

    “Looking at all the code though, I think it would have been a lot easier to just upgrade the site than do all that. But I guess you don’t have to worry about it in the future now.”

    You’re probably right, but I welcomed the opportunity to learn more about .htaccess files. My primary objective was to have a permanent solution that, as you pointed out, won’t require any additional work in the future.

    Uwe

  3. Prentiss Riddle says:
    Not more work than upgrading

    Even if your ingenious solution were more work than upgrading WordPress once, it’s not more work than upgrading WordPress with each new release, and doing so until the end of time. This summer alone there have been four security releases of WordPress.

    A useful feature would be a command-line utility to put WordPress in read-only mode. Although it probably couldn’t protect against all cross-site scripting vulnerabilities (some of which could conceivably appear in PHP itself), there should at least be a way to protect WordPress from corrupting its own files and database.
