Migrating WordPress content to Jekyll: some notes

Written Dec 30, 2022, tagged under post, Jekyll, WordPress

I've just been migrating an old WordPress blog into this Jekyll one. Here are some notes from the process...

The initial conversion was performed by running the second process described on this page. That extracted the posts from the WordPress XML export, and downloaded the images too. A few problems then came up:

Image filenames

In WordPress, quite a few of the images had been scaled by appending a width to the image URL as a query string. So, for instance, an image might be referenced as <img src="a/b/c.jpg?w=300" />. The import had saved those images with the query string still on the end of the filename (c.jpg?w=300 rather than c.jpg). To rename them back to a sensible name, I ran:

find assets -name '*\?w=*' -exec bash -c 'mv -- "$1" "${1%\?w=*}"' _ {} \;
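
Before a bulk rename like the one above, it can be worth previewing what would change. The following is a minimal sketch rather than part of the original process: the function names strip_width_suffix and preview_renames are hypothetical, and it assumes the images live under a directory such as assets and that no path contains a newline.

```shell
# Hypothetical helper: drop a trailing "?w=<width>" from a path.
# "%" removes the shortest matching suffix; "\?" matches a literal
# question mark instead of acting as a single-character wildcard.
strip_width_suffix() {
  printf '%s\n' "${1%\?w=*}"
}

# Dry run: list each planned rename under the given directory
# (default: assets) without touching any files.
preview_renames() {
  find "${1:-assets}" -type f -name '*\?w=*' -print |
    while IFS= read -r img; do
      printf '%s -> %s\n' "$img" "$(strip_width_suffix "$img")"
    done
}
```

Running preview_renames assets prints one "old -> new" line per affected file, which makes it easy to spot two scaled copies that would collapse onto the same name.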

Image captions

In the output HTML files, captioned images were still wrapped in WordPress [caption] shortcodes, such as:

[caption id="" align="alignright" width="320"]
<a href="..."><img title="Thing" alt="" src="..." /></a>
What shall we call her?! [/caption]

As there weren't too many of them, I went with a regular expression search and replace in VSCode. The search string was:

\[caption([^\]]*)\](.*?)\s*([A-Za-z0-9\s\-'"\.,\?!]+)\[/caption\]

and the replacement:

<figure>$2<figcaption>$3</figcaption></figure>

Coping with the extra content

I've now got many more posts, and the lack of "next" and "previous" buttons on posts is getting annoying. This post describes what needs to change to add them.