
cPageBuild Manual

cPageBuild is a static site generator (SSG) that uses the C preprocessor as its back-end. This manual explains cPageBuild’s features in detail.

See the download and introduction page for a general overview.


Command line

  • -i input path
  • -o output path
  • -t temporary build path

These paths are mandatory and must be different from each other, or else cPageBuild will abort with an error message.

Paths with spaces must be quoted with double quotes. File and directory names with spaces are handled automatically.

For the output and build paths, the last directory level will be created if it does not exist. The build path must not point to a directory that already contains files.

  • -a rebuild all (optional, default: no)
  • -d database file path for incremental builds (optional, default: inputpath/cpagebuild.db)
  • -jx number of worker threads (optional, 1-16, default: 2)
  • -r recursive directories (optional, default: no)
  • -s URL with optional noext, noextslash, noindex, noerror (optional, XML sitemap generation)
  • -vx verbosity (optional, 0-3, default: 2)
  • remaining parameters: compiler and its options.

Verbosity: 0 means no output; the return code is 0 for success. 1 prints errors only. 2 prints errors and a final summary (default, best for large builds). 3 prints detailed information per file.

Threads: too many worker threads may slow down the build. If you want to use more than four threads, benchmark whether that actually speeds up the build.
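The numeric options take their value directly attached to the option letter, as in -j4 or -v2. The following C sketch shows how such an option could be parsed and checked against the documented ranges. The function name parse_numeric_option is hypothetical and not part of cPageBuild.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: parse an option such as "-j4" or "-v2", where the
 * number follows the option letter directly, and check it against the
 * documented range. Returns the value, or -1 if the option is malformed
 * or out of range. */
int parse_numeric_option(const char *arg, char letter, int min, int max)
{
    if (arg[0] != '-' || arg[1] != letter)
        return -1;
    if (!isdigit((unsigned char)arg[2]))
        return -1;                      /* no number attached, e.g. "-j" */
    char *end;
    long value = strtol(arg + 2, &end, 10);
    if (*end != '\0' || value < min || value > max)
        return -1;                      /* trailing junk or out of range */
    return (int)value;
}
```

With the ranges from the option list, -j would be checked with min 1 and max 16, and -v with min 0 and max 3, so "-j17" or "-v4" would be rejected.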

Document model

Each output article is built from:

  • one source article file with the same name and relative path
  • none, some, or all of the files in the include/ directory.

Source articles

Each source article in the input/ directory must:

  • have .htm or .html as file extension (case-insensitive)
  • correspond to one article file in the output/ directory.

This implies that source articles must not include other files from the input/ directory, only files from the include/ directory.

You could also place e.g. .inc files right within the article source directory and include them from an article, but these .inc files would not be tracked for incremental builds. Besides, if an article were so long that it would be uncomfortable to edit, it would be better to split it up properly anyway.
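The extension rule for source articles can be illustrated with a small C check. This is only a sketch: is_source_article is a hypothetical name, and how cPageBuild actually scans the input directory is not documented here.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Case-insensitive suffix comparison. */
static bool ends_with_ci(const char *name, const char *suffix)
{
    size_t n = strlen(name), s = strlen(suffix);
    if (n < s)
        return false;
    for (size_t i = 0; i < s; i++) {
        if (tolower((unsigned char)name[n - s + i]) !=
            tolower((unsigned char)suffix[i]))
            return false;
    }
    return true;
}

/* Hypothetical check: a file in input/ is a source article if its
 * extension is .htm or .html, ignoring case. */
bool is_source_article(const char *name)
{
    return ends_with_ci(name, ".htm") || ends_with_ci(name, ".html");
}
```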

Include files

Files in the include/ directory may:

  • have any file extension, e.g. for inlined CSS and JS files
  • include other files from the include/ directory.

Incremental builds

cPageBuild uses an implicit dependency model that the project must match in order to profit from incremental builds.

Each output article file is assumed to depend on:

  • one corresponding source article file
  • all files in the include/ directory
  • all defines handed to the compiler
  • the platform, because the database is not portable.

Dependency resolution

cPageBuild uses a database that stores file timestamps to detect changes. If a timestamp differs from the last run, whether it is older or newer, this triggers a rebuild.

A hash of the defines handed to the compiler is also stored in the database, and any change will trigger a full rebuild.

If an error happens when building an output file, the input file will be marked for rebuild next time.
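The rebuild decision described above can be sketched in C as follows. The record layout, the function names, and the use of 64-bit FNV-1a over the -D arguments are assumptions for illustration; this manual does not specify the database format or the hash function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-article record; the real database layout is not documented. */
struct db_entry {
    int64_t source_mtime; /* source article timestamp stored at the last run */
    bool failed;          /* the last build of this article reported an error */
};

/* Assumed hash: 64-bit FNV-1a over every "-D" compiler argument. Each
 * argument's terminating zero byte is hashed too, so argument boundaries
 * affect the result. */
uint64_t hash_defines(int argc, char **argv)
{
    uint64_t h = 14695981039346656037ULL;     /* FNV-1a offset basis */
    for (int i = 0; i < argc; i++) {
        if (strncmp(argv[i], "-D", 2) != 0)
            continue;                         /* not a define */
        const unsigned char *p = (const unsigned char *)argv[i];
        do {
            h ^= *p;
            h *= 1099511628211ULL;            /* FNV-1a 64-bit prime */
        } while (*p++ != '\0');
    }
    return h;
}

/* Decide whether one output article must be rebuilt. stored is NULL for
 * an article that is not in the database yet. */
bool needs_rebuild(const struct db_entry *stored, int64_t current_mtime,
                   bool includes_changed, bool defines_changed, bool rebuild_all)
{
    if (rebuild_all || defines_changed || includes_changed)
        return true;                 /* -a, other defines, or an include file changed */
    if (stored == NULL || stored->failed)
        return true;                 /* new article, or its last build failed */
    return stored->source_mtime != current_mtime;  /* older or newer both count */
}
```

Comparing with != rather than > is what makes an older timestamp count as a change too, e.g. after a file has been restored from a backup.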

Several targets

If you want to generate several versions of your website from one source, e.g. one for desktop and one for mobile, you can control this via ifdefs in the source articles and include files. Defines such as DESKTOP and MOBILE can be handed over in the build script. A typical setup might look like this:

menu etc.

You would use cPageBuild as follows. Note the two different database files, and that the parameters after -v2 belong to the compiler:

(desktop website)
cpagebuild -i input -o output1 -t tmp -d input/cpb1.db -j4 -r -v2 gcc -DDESKTOP -x c -E -P -C -Iinclude -fdollars-in-identifiers -fextended-identifiers

(mobile website)
cpagebuild -i input -o output2 -t tmp -d input/cpb2.db -j4 -r -v2 gcc -DMOBILE -x c -E -P -C -Iinclude -fdollars-in-identifiers -fextended-identifiers

XML sitemap

The -s option activates the generation of a simple XML sitemap which search engines can use. A file named “sitemap.xml” will be generated in the output directory. Only files with corresponding source files will appear in the sitemap.

As per the sitemap standard, the sitemap is limited to 50k URLs and 50 MB file size. Nested sitemaps are not supported.

The URL must be a full URL including the protocol:

  • https://example.com/
  • https://example.com/dir/
  • http://www.example.com

The trailing forward slash may be omitted. If the URL contains spaces, the parameter must be quoted with double quotes. The spaces will be URL escaped automatically.

Spaces in file names will be URL escaped automatically. Non-ASCII characters will be escaped correctly if the file system supports UTF-8 file names. This does not work under Windows, so stick to ASCII characters in file names.
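Percent-encoding of spaces and UTF-8 bytes could look like the following sketch. Which other characters cPageBuild escapes is not stated here, so this hypothetical url_escape only handles spaces and bytes outside the ASCII range.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: percent-encode spaces and non-ASCII bytes (such as
 * UTF-8 sequences) for a sitemap URL; all other bytes are copied as-is.
 * Returns false if out cannot hold the worst case. */
bool url_escape(const char *in, char *out, size_t out_size)
{
    static const char hex[] = "0123456789ABCDEF";
    if (out_size < 3 * strlen(in) + 1)
        return false;                   /* worst case: every byte becomes %XX */
    for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
        if (*p == ' ' || *p >= 0x80) {
            *out++ = '%';
            *out++ = hex[*p >> 4];
            *out++ = hex[*p & 0x0F];
        } else {
            *out++ = (char)*p;
        }
    }
    *out = '\0';
    return true;
}
```

For example, "my page.html" becomes "my%20page.html", and a UTF-8 encoded "café.html" becomes "caf%C3%A9.html".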

The usage is:
  • -s URL : file paths are appended to the URL with their .htm or .html extension.
  • -s URL noext : drop .htm or .html extensions in the sitemap.
  • -s URL noextslash : drop extension and append a forward slash.
  • -s URL noindex : drop index.htm or index.html and use the directory URL instead.
  • -s URL noerror : drop HTTP error pages from the sitemap.

You can use these options in any combination. Examples of how output/demo.html will appear in the sitemap file:

No option:
-s https://example.com
becomes https://example.com/demo.html

-s https://example.com noext
becomes https://example.com/demo

-s https://example.com noextslash
becomes https://example.com/demo/

Which of these variants to choose depends on the website setup in the .htaccess file.

Independently of noext and noextslash, noindex removes index.htm(l) from the end of the URL. This option has no effect on files other than index.htm(l). When noindex is combined with noext or noextslash, index.htm(l) files are treated as per noindex.

-s https://example.com noindex

This works with “DirectoryIndex index.htm(l)” in the .htaccess file. Having both an index.htm and an index.html in the same directory will result in a duplicate sitemap entry.
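Putting the options together, the mapping from an output file to its sitemap URL might be sketched like this. The flag names and sitemap_url are hypothetical; URL escaping is left out, index file names are compared case-sensitively, and letting noextslash win over noext when both are given is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical flags mirroring the -s option keywords. */
enum {
    SM_NOEXT      = 1 << 0,
    SM_NOEXTSLASH = 1 << 1,
    SM_NOINDEX    = 1 << 2,
};

/* Build the sitemap URL for a path relative to the output directory,
 * e.g. "demo.html" or "dir/index.html". Returns 0 on success, -1 if
 * out is too small. */
int sitemap_url(const char *base, const char *rel, int flags,
                char *out, size_t out_size)
{
    const char *name = strrchr(rel, '/');
    name = name ? name + 1 : rel;           /* last path component */
    size_t keep = strlen(rel);              /* characters of rel to copy */
    const char *tail = "";                  /* text appended after rel */

    if ((flags & SM_NOINDEX) &&
        (strcmp(name, "index.htm") == 0 || strcmp(name, "index.html") == 0)) {
        keep = (size_t)(name - rel);        /* keep "dir/" or nothing; noindex wins */
    } else if (flags & (SM_NOEXT | SM_NOEXTSLASH)) {
        const char *dot = strrchr(name, '.');
        if (dot)
            keep = (size_t)(dot - rel);     /* drop ".htm" or ".html" */
        if (flags & SM_NOEXTSLASH)
            tail = "/";
    }

    size_t blen = strlen(base);
    const char *sep = (blen > 0 && base[blen - 1] == '/') ? "" : "/";
    int n = snprintf(out, out_size, "%s%s%.*s%s", base, sep, (int)keep, rel, tail);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Since the trailing slash of the base URL may be omitted, the sketch inserts it only when it is missing.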

The noerror option keeps HTTP error pages out of the sitemap if they are named errorxxx.htm(l), case-insensitive. Error pages for HTTP errors are typically not supposed to appear in the sitemap.

Example of how output/error404.html is kept out of the sitemap:

-s https://example.com noerror
drops the file from the sitemap.
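The error page filter can be sketched as a file name check. Reading “xxx” as exactly three digits is an assumption of this sketch, and is_error_page is a hypothetical name.

```c
#include <ctype.h>
#include <stdbool.h>

/* Case-insensitive string equality. */
static bool equals_ci(const char *a, const char *b)
{
    for (; *a && *b; a++, b++) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return false;
    }
    return *a == *b;                    /* both strings must end together */
}

/* Hypothetical check for "errorNNN.htm" or "errorNNN.html", ignoring case.
 * Assumption: "xxx" stands for exactly three digits. */
bool is_error_page(const char *name)
{
    static const char prefix[] = "error";
    for (int i = 0; i < 5; i++) {
        if (tolower((unsigned char)name[i]) != prefix[i])
            return false;               /* also stops at a short name's '\0' */
    }
    for (int i = 5; i < 8; i++) {
        if (!isdigit((unsigned char)name[i]))
            return false;
    }
    return equals_ci(name + 8, ".htm") || equals_ci(name + 8, ".html");
}
```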

HTML processing

The C preprocessor has some limitations that you may encounter with macro definitions and preformatted text. cPageBuild solves this via specific escape sequences in the source and include files. The HTML processing will be applied to the output of the preprocessor.

The term “whitespace” means any of the characters space, tab, CR, or LF. The escape sequences include the surrounding double quotes.

Stateless sequences

  • "@dx@" delete the sequence and the x characters after it (x = 0-9)
  • "@rx@" remove the sequence and the x characters before it (x = 0-9)
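Both stateless sequences can be applied in a single left-to-right pass over the preprocessor output, as in the following C sketch. The function name apply_stateless is hypothetical, and stateful sequences are simply copied through here. Note that "@rx@" removes characters that have already been written to the output, while "@dx@" skips characters that have not been read yet.

```c
#include <string.h>

/* Hypothetical sketch: apply "@dx@" and "@rx@" (x = 0..9), including their
 * surrounding quotes, in one pass. out must hold at least strlen(in) + 1
 * bytes. Other sequences such as "@i0@" are copied unchanged. */
void apply_stateless(const char *in, char *out)
{
    size_t len = strlen(in), i = 0, o = 0;
    while (i < len) {
        if (i + 6 <= len && in[i] == '"' && in[i + 1] == '@' &&
            (in[i + 2] == 'd' || in[i + 2] == 'r') &&
            in[i + 3] >= '0' && in[i + 3] <= '9' &&
            in[i + 4] == '@' && in[i + 5] == '"') {
            size_t count = (size_t)(in[i + 3] - '0');
            i += 6;                                  /* drop the sequence itself */
            if (in[i - 4] == 'd')
                i += (count < len - i) ? count : len - i;  /* skip chars after */
            else
                o -= (count < o) ? count : o;        /* remove chars before */
        } else {
            out[o++] = in[i++];
        }
    }
    out[o] = '\0';
}
```

With this pass, "foo"@d0@"bar" becomes "foobar", "@d1@""two words""@r1@" becomes two words, and "@d"@d0@"1@" leaves the literal "@d1@" in the output because the result is not scanned again.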

Stateful sequences

  • "@ix@" push and set indentation removal 0/1
  • "@qx@" push and set whitespace quote removal 0/1
  • "@sx@" push and set space removal between tags 0/1
  • "@px@" pop the state i/q/s from its stack.

The i, q and s settings each have a stack of their own, so they may be interleaved in any way. Each stack has a depth of 4096 nesting levels.

Using a stack guarantees that you can enforce a setting within some part of a document without breaking anything before or after it. This is especially important within an include file, where you may not even know what the current state is.

The i/q/s sequences take 0 or 1 as parameter. The value always means “keep item”, and all of them default to 1. So by default, they do not modify the text. See below for examples of their usage.
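The stack behavior can be modeled with one small fixed-size stack per setting. This C sketch is hypothetical: the struct and function names are invented here, and how cPageBuild reports an overflow or a pop on an empty stack is not documented, so the functions simply return false in those cases.

```c
#include <stdbool.h>

#define STATE_STACK_DEPTH 4096   /* nesting depth stated in this manual */

/* One stack per setting (i, q, s), so the three can be interleaved freely.
 * An empty stack reads as the default 1 ("keep item"). */
struct state_stack {
    unsigned char values[STATE_STACK_DEPTH];
    int top;                     /* number of values currently pushed */
};

/* "@i0@", "@q1@", ...: push a new setting. */
bool state_push(struct state_stack *st, int value)
{
    if (st->top >= STATE_STACK_DEPTH || (value != 0 && value != 1))
        return false;            /* overflow or invalid parameter */
    st->values[st->top++] = (unsigned char)value;
    return true;
}

/* "@pi@", "@pq@", "@ps@": restore the previous setting. */
bool state_pop(struct state_stack *st)
{
    if (st->top == 0)
        return false;            /* unbalanced pop */
    st->top--;
    return true;
}

/* Setting currently in effect. */
int state_current(const struct state_stack *st)
{
    return st->top > 0 ? st->values[st->top - 1] : 1;
}
```

Because each setting has its own stack, popping q before i or i before q restores the same previous values.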

String concatenation

The C preprocessor does not concatenate adjacent strings. With C files, the compiler would do this after the preprocessor stage. However, concatenation is useful e.g. for programmatically assembling CSS class names or resource file URLs. See the demo project for an example.

The delete sequence "@d0@" helps out. "@r0@" would do the same because backward and forward character removal for zero characters will only delete the sequence itself.

Suppose you have strings like this back-to-back:
"foo""bar"
Now you insert the delete escape:
"foo"@d0@"bar"
The result will be:
"foobar"

If you have already quoted strings, e.g. as macro parameters, you can use the escape sequence as follows:
#define FOO "foo"
#define BAR "bar"
FOO@d0@BAR
The quotes from the escape sequence are not directly visible here because they are hidden in the strings themselves. The result will again be:
"foobar"

Unquoting strings

You can stringify in the C preprocessor, but not unstringify. If you want a macro that takes textual content with several words, then the macro needs to use an escape sequence. See the demo project for an example.

Removal of quotes works via a pair of "@d1@" and "@r1@". The former will delete the leading quote, the latter the trailing one.

Suppose you want to unquote this string:
"two words"
Now you add the delete escapes:
"@d1@""two words""@r1@"
The result will be:
two words

Defines and escapes

Astute readers might wonder how to keep a literal #define in the output:
"@d1@""#define""@r1@"
Or with a zero width space: #​define
Or with HTML escaping, e.g. of the ‘#’ character: &#35;define

This is how to keep escape sequences literally:
"@d"@d0@"1@"
Or with a zero width space: "@​d1@"
Or with HTML escaping, e.g. of the ‘@’ character: "&#64;d1@"

Preformatting whitespace

The C preprocessor may collapse consecutive whitespace, but the escape sequence for whitespace quote removal helps out. It removes the quotes from any string that consists purely of whitespace. See the demo project for an example.

If you want the following within preformatted text:
Words  with  two  spaces
then you can achieve it like this:
"@q0@""@i1@"Words"  "with"  "two"  "spaces"@pi@""@pq@"

The first sequence means “do not keep quotes around whitespace strings”. The second sequence means “keep indentation”, which you always need for preformatted text. The last two sequences restore whatever i/q setting had been active before.

Note that the order of the last two sequences does not have to mirror the order in which the first two were pushed. "@pq@""@pi@" would work as well because the i/q/s stacks are independent of each other.

Alternatively, you can unquote whitespace strings directly using "@d1@" and "@r1@", see the section “Unquoting strings”. In this case, the "@q0@" and "@pq@" pair is not necessary.
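Whitespace quote removal under "@q0@" can be sketched as follows. This illustration rests on assumptions: strings are paired from left to right, backslash escapes are ignored, and the empty string "" is not treated as whitespace-only. The function name is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Space, tab, CR or LF, as defined in this manual. */
static bool is_ws(char c)
{
    return c != '\0' && strchr(" \t\r\n", c) != NULL;
}

/* Hypothetical sketch of "@q0@": strip the quotes from every string that
 * consists only of whitespace. out must hold strlen(in) + 1 bytes. */
void unquote_whitespace_strings(const char *in, char *out)
{
    size_t o = 0;
    while (*in) {
        if (*in != '"') {
            out[o++] = *in++;
            continue;
        }
        const char *close = strchr(in + 1, '"');
        if (close == NULL) {                  /* unterminated: copy the rest */
            while (*in)
                out[o++] = *in++;
            break;
        }
        bool only_ws = close > in + 1;        /* assumption: "" is not whitespace */
        for (const char *p = in + 1; p < close; p++)
            if (!is_ws(*p))
                only_ws = false;
        const char *from = only_ws ? in + 1 : in;      /* body only, or whole string */
        size_t n = (size_t)((only_ws ? close : close + 1) - from);
        memcpy(out + o, from, n);
        o += n;
        in = close + 1;
    }
    out[o] = '\0';
}
```

Applied to the body of the example above, Words"  "with"  "two"  "spaces becomes Words  with  two  spaces, while a string such as "a b" keeps its quotes.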

Preformatting empty lines

The C preprocessor may remove empty lines. This is usually not an issue, except for preformatted text when you want an empty line. The delete sequence helps out: just put "@d0@" on the empty line. See the demo project for an example.

Compiler comments

The C preprocessor may emit C/C++ comments at the start of the document. cPageBuild removes these automatically, up to the first character that is neither part of a C/C++ comment nor whitespace.

If you really want a document to start with a C/C++ comment, you can use the delete sequence "@d0@" before that, like this: "@d0@"/*C comment*/

However, this is not recommended because a valid HTML document must start with a DOCTYPE declaration, preceded at most by whitespace and HTML comments.
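Stripping the leading comments might look like this C sketch. The name skip_leading_comments is hypothetical. The scan stops at the first character that is neither whitespace nor part of a comment, which is why a leading "@d0@" sequence, starting with a double quote, stops it before the comment.

```c
#include <string.h>

/* Hypothetical sketch: return a pointer past any leading whitespace and
 * C/C++ comments. An unterminated block comment swallows the rest. */
const char *skip_leading_comments(const char *s)
{
    for (;;) {
        while (*s == ' ' || *s == '\t' || *s == '\r' || *s == '\n')
            s++;
        if (s[0] == '/' && s[1] == '*') {
            const char *end = strstr(s + 2, "*/");
            if (end == NULL)
                return s + strlen(s);
            s = end + 2;                 /* continue after the block comment */
        } else if (s[0] == '/' && s[1] == '/') {
            const char *nl = strchr(s, '\n');
            if (nl == NULL)
                return s + strlen(s);
            s = nl + 1;                  /* continue after the line comment */
        } else {
            return s;                    /* neither comment nor whitespace */
        }
    }
}
```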

HTML minification

cPageBuild supports HTML minification with two tools: removal of indentation and removal of whitespace between tags.

Indentation: the escape sequence "@i0@" triggers “do not keep indentation”. It keeps a single space between words of text that wraps around at a line end.

However, it will remove all whitespace if a line ends with a tag and the next line starts with a tag. What it will not remove is an actual space without line break between two tags, e.g. a space between two spans that may be relevant for layout.

It will also condense consecutive whitespace within a line into a single space and convert a tab to a space, except within quoted strings. Note that the compiler may already do this anyway.
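The rules for "@i0@" can be approximated by looking at each run of whitespace outside quoted strings, as in this C sketch. It is a simplification under stated assumptions: quotes are tracked naively, leading and trailing whitespace of the whole text is dropped, and the name remove_indentation is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Space, tab, CR or LF, as defined in this manual. */
static bool is_ws(char c)
{
    return c != '\0' && strchr(" \t\r\n", c) != NULL;
}

/* Hypothetical approximation of "@i0@". out must hold strlen(in) + 1 bytes. */
void remove_indentation(const char *in, char *out)
{
    size_t o = 0;
    bool in_quote = false;
    while (*in) {
        if (*in == '"') {
            in_quote = !in_quote;        /* naive: no escaped quotes */
            out[o++] = *in++;
        } else if (in_quote || !is_ws(*in)) {
            out[o++] = *in++;            /* quoted text and non-whitespace kept */
        } else {
            bool line_break = false;
            while (is_ws(*in)) {         /* consume the whole whitespace run */
                if (*in == '\n' || *in == '\r')
                    line_break = true;
                in++;
            }
            if (o == 0 || *in == '\0')
                continue;                /* assumption: drop leading/trailing run */
            if (line_break && out[o - 1] == '>' && *in == '<')
                continue;                /* tag at line end, tag at next line start */
            out[o++] = ' ';              /* wrapped text or spaces within a line */
        }
    }
    out[o] = '\0';
}
```

A space between two tags on the same line survives as a single space, because the run contains no line break.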

Usually, you can just have that sequence in the include file with the document header and leave it active for the whole document. The exception is that within preformatted text, indentation must be kept.

See the example under “Preformatting whitespace” with "@i1@" and its closing "@pi@". Always include the "@i1@" and "@pi@" pair with preformatted text even if you do not use "@i0@" globally. This way, you will keep the option of removing indentation without breaking preformatted text.

Tag whitespace: the escape sequence "@s0@" triggers “do not keep whitespace between tags”, specifically between > and < characters. Unlike "@i0@", this also removes actual spaces. Applying it to the whole document is not recommended because unexpected layout changes are too likely.

Use "@s0@" within predefined widgets and especially within multi-line macros with indentation. After the end of the widget or macro, restore the previous state with "@ps@". See the EL_PICTURE macro definition in the demo project for an example.
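A sketch of "@s0@" in C drops every whitespace run that sits directly between > and <. The function name is hypothetical, and the sketch ignores quoted strings, which a real implementation might need to consider.

```c
#include <stdbool.h>
#include <string.h>

/* Space, tab, CR or LF, as defined in this manual. */
static bool is_ws(char c)
{
    return c != '\0' && strchr(" \t\r\n", c) != NULL;
}

/* Hypothetical sketch of "@s0@": remove whitespace runs between '>' and '<',
 * including plain spaces. out must hold strlen(in) + 1 bytes. */
void remove_tag_whitespace(const char *in, char *out)
{
    size_t o = 0;
    while (*in) {
        if (is_ws(*in) && o > 0 && out[o - 1] == '>') {
            const char *end = in;
            while (is_ws(*end))
                end++;                   /* find the end of the whitespace run */
            if (*end != '<') {           /* not between tags: keep the run */
                memcpy(out + o, in, (size_t)(end - in));
                o += (size_t)(end - in);
            }
            in = end;
        } else {
            out[o++] = *in++;
        }
    }
    out[o] = '\0';
}
```

The second test case below shows why this is risky for a whole document: the space between two inline spans disappears, which can change the rendered text.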