Static website conversion tutorial

This chapter features the conversion of a static HTML website to a WordPress site using the @cm4all-wp-impex/generator package and the ImpEx WordPress plugin.

The sources can be found in the ImpEx WordPress plugin GitHub repository.

About

This is a full-featured example of converting a regular static website of a fictional German dentist into a WordPress site.

The website is available offline in the directory ./homepage-dr-mustermann.

You can view the website by

  • starting the PHP built-in web server: php -S localhost:8080 -t homepage-dr-mustermann/
  • and opening http://localhost:8080/ in your browser.

Watch the walk-through on YouTube


(German audio with English subtitles.)

Conversion process

The conversion process is implemented in a single file, ./index.js, in fewer than 240 lines of code thanks to the @cm4all-wp-impex/generator package.

You can run the conversion script by executing ./index.js (found in the GitHub repository at packages/@cm4all-wp-impex/generator/examples/impex-complete-static-homepage-conversion/index.js).

Before running it, make sure the right Node.js version is active (using nvm install) and that the script dependencies are installed by entering the directory cm4all-wp-impex/packages/@cm4all-wp-impex/generator and executing npm ci.

The result is a folder generated-impex-export/ with the ImpEx export folder layout, containing the ImpEx slice JSON files and media files.

This export can now be imported into WordPress using the ImpEx CLI:

impex-cli.php import -username=<adminusername> -password=<adminpassword> -rest-url=<your-wordpress-rest-api-endpoint> ./generated-impex-export/

(Replace the <placeholder> entries with your own values.)
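If you prefer to trigger the import from a Node.js script instead of a shell, the command line above can be assembled programmatically. The following is a minimal sketch: buildImpexImportArgs is a hypothetical helper (not part of ImpEx) that uses only the options shown in this chapter (-username, -password, -rest-url and the optional -profile used in the local development section).

```javascript
/**
 * Hypothetical helper (not part of ImpEx): assembles the argument list for an
 * `impex-cli.php import` call from the options used in this chapter.
 */
function buildImpexImportArgs({
  cliPath,
  username,
  password,
  restUrl,
  exportDir,
  profile,
}) {
  const args = [
    cliPath,
    "import",
    `-username=${username}`,
    `-password=${password}`,
    `-rest-url=${restUrl}`,
  ];
  // -profile is optional (the local development example passes -profile=all)
  if (profile) {
    args.push(`-profile=${profile}`);
  }
  // the export directory comes last, as in the command above
  args.push(exportDir);
  return args;
}
```

Passing the resulting array to execFileSync("php", args, { stdio: "inherit" }) from node:child_process would run the import without any shell quoting issues, assuming the php binary is on your PATH.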

Make sure your WordPress instance is empty (contains no pages, posts or media) before importing.

After executing the command, the website contents are imported into your WordPress instance.
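The requirement that the WordPress instance be empty can be checked before the import through the WordPress REST API, which reports the total number of items of a collection in the X-WP-Total response header. A minimal sketch, assuming Node.js 18 or later (global fetch) and that restUrl is the same REST root you pass as -rest-url; the function names are hypothetical. Unauthenticated requests only see publicly visible content, so drafts or private posts are not counted.

```javascript
/**
 * Minimal sketch: counts posts, pages and media items of a WordPress site via
 * the REST API. `restUrl` is the REST root, e.g. "http://localhost:8888/wp-json".
 * `fetchImpl` defaults to the global fetch of Node.js 18+ and can be replaced
 * (e.g. in tests).
 */
async function countWordPressContent(restUrl, fetchImpl = fetch) {
  const counts = {};
  for (const type of ["posts", "pages", "media"]) {
    // one item per page is enough, only the total count header is needed
    const response = await fetchImpl(`${restUrl}/wp/v2/${type}?per_page=1`);
    if (!response.ok) {
      throw new Error(`GET /wp/v2/${type} failed with HTTP ${response.status}`);
    }
    // WordPress reports the total number of items in the X-WP-Total header
    counts[type] = Number(response.headers.get("X-WP-Total") ?? 0);
  }
  return counts;
}

/** resolves to true if no (publicly visible) posts, pages or media exist */
async function isWordPressEmpty(restUrl, fetchImpl = fetch) {
  const counts = await countWordPressContent(restUrl, fetchImpl);
  return Object.values(counts).every((count) => count === 0);
}
```

For the local setup, isWordPressEmpty("http://localhost:8888/wp-json").then(console.log) prints the result. Note that a fresh WordPress installation typically ships with a sample post and a sample page, which this check reports as existing content.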


The example website and conversion script are intentionally simple.

Since every website is different, no conversion process can work universally for every website.

By implementing additional transformation rules using the hooks accepted by the ImpexTransformer.setup(...) function of @cm4all-wp-impex/generator, almost any detail of a website can be converted to a WordPress post or page.

What's missing?

The example intentionally covers only the content, not every detail of a website conversion.

Possible improvements:

  • The navigation bar could be converted to a custom WordPress nav_menu.

    Navigation is handled differently in FSE (Full Site Editing) and classic themes. With an FSE theme you would generate a Navigation block; a classic theme works differently. How to take over the navigation depends on the target WordPress environment.

  • Styles are ignored in the example.

    Whether that matters depends on the goal of the transformation. If the content is to be styled entirely by a WordPress theme, converting the styles is not needed.

    If needed, style properties like fonts and colors could be introspected and transformed into FSE theme.json settings.

  • The contact form is taken over as a core/html block. Submitting the form does not work in the example.

    WordPress/Gutenberg does not provide a generic Form block, so there is no way to convert the HTML form into a matching block using plain WordPress/Gutenberg.

    However, the form could easily be converted into a Ninja Forms form or a form of any other form builder plugin available for WordPress.

    To keep the example simple and independent of additional plugins like Ninja Forms, the form is simply converted to a core/html block.

    How the conversion is implemented therefore depends on your target WordPress environment (and available plugins).

  • The overall layout (header/footer/main section) is also ignored, but could be converted to FSE template parts.

    As you might guess, all of these improvements vary depending on the goal.
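For the navigation bar bullet, a conversion targeting an FSE theme could serialize the menu entries (for example the links of the example site's ul.pure-menu-list) as a core/navigation block with core/navigation-link inner blocks. This is a minimal sketch: the input shape { label, url } and the function names are assumptions, and the attribute escaping mirrors the one applied by the Gutenberg block serializer. Recent WordPress versions usually store menus as separate wp_navigation posts referenced by the block, so inline inner blocks are a simplification.

```javascript
/**
 * Serializes block attributes as JSON, escaping the same characters as the
 * Gutenberg block serializer so the values cannot break the block comment.
 */
function serializeAttributes(attributes) {
  return JSON.stringify(attributes)
    .replace(/--/g, "\\u002d\\u002d")
    .replace(/</g, "\\u003c")
    .replace(/>/g, "\\u003e")
    .replace(/&/g, "\\u0026")
    .replace(/\\"/g, "\\u0022");
}

/**
 * Minimal sketch: converts menu entries ({ label, url }) into a core/navigation
 * block containing one core/navigation-link block per entry.
 */
function toNavigationBlock(links) {
  const items = links.map(
    ({ label, url }) =>
      `<!-- wp:navigation-link ${serializeAttributes({ label, url })} /-->`
  );
  return ["<!-- wp:navigation -->", ...items, "<!-- /wp:navigation -->"].join(
    "\n"
  );
}
```

For example, an entry { label: "Home", url: "/" } becomes <!-- wp:navigation-link {"label":"Home","url":"/"} /-->, while an "&" in a label is written as \u0026.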
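For the styles bullet, introspected fonts and colors could be mapped to theme.json settings. A minimal sketch, assuming the colors and font-family stacks have already been extracted from the site's CSS (that step is not shown); toThemeJson is a hypothetical name and the generated slugs and names are arbitrary choices. It targets theme.json version 2, where settings.color.palette and settings.typography.fontFamilies hold these presets.

```javascript
/**
 * Minimal sketch: maps colors and font-family stacks (already extracted from
 * the static site's CSS) to FSE theme.json (version 2) preset settings.
 */
function toThemeJson({ colors = [], fontFamilies = [] }) {
  return {
    version: 2,
    settings: {
      color: {
        // one palette entry per distinct color, with generated slugs
        palette: [...new Set(colors)].map((color, index) => ({
          slug: `color-${index + 1}`,
          name: `Color ${index + 1}`,
          color,
        })),
      },
      typography: {
        fontFamilies: [...new Set(fontFamilies)].map((fontFamily, index) => ({
          slug: `font-${index + 1}`,
          // use the first family of the stack, without quotes, as display name
          name: fontFamily.split(",")[0].trim().replace(/^["']|["']$/g, ""),
          fontFamily,
        })),
      },
    },
  };
}
```

JSON.stringify(toThemeJson(...), null, 2) yields content that could be written to the theme's theme.json file.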
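For the contact form bullet: a core/html block stores its HTML verbatim between the block comment delimiters, which is why the form survives the conversion as markup but is not functional. A minimal sketch of that markup; toHtmlBlock is a hypothetical helper that only illustrates the block format (in the example, the conversion produces this block for you).

```javascript
/**
 * Minimal sketch: wraps raw HTML (e.g. the contact form markup) in the block
 * comment delimiters of a core/html block. The HTML itself is kept as-is.
 */
function toHtmlBlock(html) {
  return `<!-- wp:html -->\n${html.trim()}\n<!-- /wp:html -->`;
}
```

Replacing the form with a form builder plugin would mean emitting that plugin's block or shortcode here instead.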
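For the layout bullet: in a block (FSE) theme, header and footer live as template part files under parts/, and templates under templates/ reference them with core/template-part blocks. A minimal sketch, assuming the header and footer have already been converted to block markup; toBlockThemeFiles and the choice of a page template are assumptions. Depending on your theme you may also want to declare the parts under templateParts in theme.json.

```javascript
/**
 * Minimal sketch: returns block theme files (relative path => content) for
 * already converted header/footer block markup. templates/page.html places
 * the page content between the header and footer template parts.
 */
function toBlockThemeFiles({ headerBlocks, footerBlocks }) {
  const templatePart = (slug) =>
    `<!-- wp:template-part {"slug":"${slug}","tagName":"${slug}"} /-->`;

  return {
    "parts/header.html": headerBlocks,
    "parts/footer.html": footerBlocks,
    "templates/page.html": [
      templatePart("header"),
      "<!-- wp:post-content /-->",
      templatePart("footer"),
    ].join("\n"),
  };
}
```

Writing each entry to the corresponding path inside the theme directory yields a layout in which the imported pages are rendered between the converted header and footer.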

The important message is: everything is possible, but because every conversion is individual, it's up to you 💪

Local development using cm4all-wp-impex

  • (optional) clean up the local wp-env installation: (cd $(git rev-parse --show-toplevel) && make wp-env-clean)

  • import using the ImpEx CLI: $(git rev-parse --show-toplevel)/impex-cli/impex-cli.php import -username=admin -password=password -rest-url=http://localhost:8888/wp-json -profile=all ./generated-impex-export/

Full conversion script

#!/usr/bin/env node

/*
 *  @cm4all-wp-impex/generator usage example converting a whole static homepage to an impex export
 */

import { resolve, join, extname, dirname, basename } from "path";
import { readdir, readFile, mkdir, rm, writeFile, copyFile } from "fs/promises";
import { ImpexTransformer, ImpexSliceFactory } from "../../src/index.js";

/**
 * STATIC_HOMEPAGE_DIRECTORY is the directory containing the static homepage
 */
const STATIC_HOMEPAGE_DIRECTORY = new URL(
  "homepage-dr-mustermann",
  import.meta.url
).pathname;

/**
 * generator function yielding matched files recursively
 *
 * @param   {string}  dir directory to search
 * @param   {boolean} recursive  whether to search recursively
 * @param   {string|undefined}  extension file extension to match or undefined to match all files
 *
 * @yields  {string} path to file
 */
async function* getFiles(dir, recursive, extension) {
  const entries = await readdir(dir, { withFileTypes: true });
  for (const entry of entries) {
    const res = resolve(dir, entry.name);
    if (entry.isDirectory()) {
      // only descend into sub directories if requested
      if (recursive) {
        yield* getFiles(res, recursive, extension);
      }
    } else if (!extension || entry.name.endsWith(extension)) {
      yield res;
    }
  }
}

/**
 * keeps track of images and their references from html files (aka pages)
 * key is the absolute image path (resolved against STATIC_HOMEPAGE_DIRECTORY)
 * value is array of image references (the img[@src] values pointing to it)
 */
const img2imgSrc_mappings = {};

/**
 * set up the ImpexTransformer singleton
 *
 * @return  {ImpexSliceFactory}
 */
function setup() {
  ImpexTransformer.setup({
    onDomReady(document, options = { path: null }) {
      // replace <header> elements with their <ul class="pure-menu-list"> child
      for (const header of document.querySelectorAll("header")) {
        // look up the menu inside this header, not anywhere in the document
        const ul = header.querySelector("ul.pure-menu-list");
        if (ul) {
          header.replaceWith(ul.cloneNode(true));
        }
      }

      // replace <section> elements with its inner contents
      for (const section of document.querySelectorAll("section")) {
        for (const child of section.childNodes) {
          section.parentNode.insertBefore(child.cloneNode(true), section);
        }
        section.remove();
      }

      // replace <footer> elements with <p>
      for (const footer of document.querySelectorAll("footer")) {
        const paragraph = document.createElement("p");
        //paragraph.setAttribute("class", "footer");
        paragraph.innerHTML = footer.innerHTML;
        footer.replaceWith(paragraph);
      }

      if (options?.path) {
        // grab all image references and remember them for later processing
        for (const img of document.querySelectorAll("img")) {
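          const src = img.getAttribute("src");
          if (!src) continue; // skip <img> elements without src

          // compute the absolute image path (src is expected to be relative to
          // STATIC_HOMEPAGE_DIRECTORY) so it matches the absolute attachment
          // paths yielded by getFiles()
          const imgPath = resolve(join(STATIC_HOMEPAGE_DIRECTORY, src));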

          // add reference to image path
          (
            img2imgSrc_mappings[imgPath] || (img2imgSrc_mappings[imgPath] = [])
          ).push(src);
        }
      }
    },
  });
  return new ImpexSliceFactory();
}

async function main() {
  // set up the ImpexTransformer singleton and get an ImpexSliceFactory instance
  const impexSliceFactory = setup();

  // group files by type (html or attachment)
  const attachmentResources = [];
  const htmlResources = [];

  // iterate over all files recursively in STATIC_HOMEPAGE_DIRECTORY
  for await (const res of getFiles(STATIC_HOMEPAGE_DIRECTORY, true)) {
    const resource = res.toString();

    switch (extname(res)) {
      // stick HTML files into htmlResources
      case ".html":
        htmlResources.push({ resource });
        console.log("HTML %s", resource);
        break;
      // stick media files into attachmentResources
      case ".jpeg":
      case ".jpg":
      case ".gif":
      case ".png":
        attachmentResources.push({ resource });
        console.log("ATTACHMENT %s", resource);
        break;
    }
  }

  // get a generator function yielding ImpEx export format conformant paths
  const slicePathGenerator = ImpexSliceFactory.PathGenerator();

  // compute target directory
  const IMPEX_EXPORT_DIR = new URL("generated-impex-export", import.meta.url)
    .pathname;

  // delete already existing directory if it exists
  try {
    await rm(IMPEX_EXPORT_DIR, { recursive: true });
  } catch {}

  // create target directory
  await mkdir(IMPEX_EXPORT_DIR, { recursive: true });

  // convert html files to gutenberg annotated block content
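  for (const htmlResource of htmlResources) {
    const html = await readFile(htmlResource.resource, "utf8");

    // transform html body to gutenberg annotated block content
    htmlResource.content = ImpexTransformer.transform(html, {
      path: htmlResource.resource,
    });

    // Node.js has no global `document`: parse the html separately to read
    // the page metadata from <head> (assumes the jsdom package, which the
    // generator uses internally, can be resolved from this script)
    const { JSDOM } = await import("jsdom");
    const dom = new JSDOM(html);
    const { document } = dom.window;

    // remember html metadata for later processing
    htmlResource.title =
      document.querySelector("head > title")?.textContent ?? "";
    htmlResource.description =
      document
        .querySelector('head > meta[name="description"]')
        ?.getAttribute("content") ?? "";
    htmlResource.keywords = (
      document
        .querySelector('head > meta[name="keywords"]')
        ?.getAttribute("content") ?? ""
    )
      .toLowerCase()
      .split(" ")
      .filter(Boolean);

    // release the JSDOM window, it is no longer needed
    dom.window.close();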

    // create ImpEx slice json content for this html file
    const slice = impexSliceFactory.createSlice(
      "content-exporter",
      (factory, slice) => {
        slice.data.posts[0]["wp:post_type"] = "page";
        slice.data.posts[0].title = htmlResource.title;
        // use the meta description (collected above) as excerpt
        slice.data.posts[0]["wp:post_excerpt"] = htmlResource.description;
        slice.data.posts[0]["wp:post_content"] = htmlResource.content;
        // @TODO: categories (aka keywords)
        // @TODO: add navigation
        return slice;
      }
    );

    // compute ImpEx conform slice json file path
    const slicePath = join(IMPEX_EXPORT_DIR, slicePathGenerator.next().value);
    await mkdir(dirname(slicePath), {
      recursive: true,
    });

    // write json to file
    await writeFile(slicePath, JSON.stringify(slice, null, 2));
  }

  // make media files available as ImpEx slices
  for (const attachmentResource of attachmentResources) {
    // create ImpEx slice json content for this media file
    const slice = impexSliceFactory.createSlice(
      "attachment",
      (factory, slice) => {
        // apply the path relative to STATIC_HOMEPAGE_DIRECTORY as content
        slice.data = attachmentResource.resource.substring(
          STATIC_HOMEPAGE_DIRECTORY.length + 1
        );

        // compute unique image file=>[img[@src]] mapping for this attachment
        let img2imgSrc_mapping = [
          ...new Set(img2imgSrc_mappings[attachmentResource.resource] ?? []),
        ];

        // add mapping to slice metadata
        slice.meta["impex:post-references"] = img2imgSrc_mapping;

        return slice;
      }
    );

    // compute ImpEx conform slice json file path
    const slicePath = join(IMPEX_EXPORT_DIR, slicePathGenerator.next().value);
    await mkdir(dirname(slicePath), {
      recursive: true,
    });

    // write slice json to file
    await writeFile(slicePath, JSON.stringify(slice, null, 2));

    // copy attachment file to target directory with ImpEx conform file name
    await copyFile(
      attachmentResource.resource,
      slicePath.replace(".json", "-" + basename(attachmentResource.resource))
    );
  }

  // JSDOM prevents automatic process termination, so we need to force it
  process.exit(0);
}

main();

The script is also available in the ImpEx WordPress plugin GitHub repository.