
xmlwiko

Fundamentally lazy, I always try to minimize my efforts to get work done...even when editing homepages. So when I planned to rewrite this site, I didn't want to fiddle with ApacheForrest XML all day. Instead, inspired by WiKo (the Wiki Compiler), I hacked together this little script.

It's neither pretty nor complete...but it does a good job of keeping me focused on creating simple pages for this website (or for Docbook XML).

Current version

The sources for this tool are now available at Bitbucket. Please visit https://bitbucket.org/dirkbaechle/xmlwiko to download the latest version.

Usage

Simply start the script as

python xmlwiko.py

and it will traverse the current directory. Whenever it finds a *.wiki file, it converts it into an XML file conforming to the ApacheForrest DTD.

You can also start xmlwiko with an additional argument (the exact text doesn't matter):

python xmlwiko.py db

and you get Docbook XML files instead. Easy!
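The two start-up modes can be pictured with the following minimal Python sketch. It is not taken from xmlwiko.py: the function names, the recursive walk with `os.walk` (one reading of "traverse") and the `.xml` output name are assumptions made for illustration.

```python
import os
import sys


def output_mode(argv):
    """Any extra command-line argument, whatever its text, selects Docbook."""
    return "docbook" if len(argv) > 1 else "forrest"


def xml_name(wiki_path):
    """Hypothetical output name: page.wiki -> page.xml."""
    return os.path.splitext(wiki_path)[0] + ".xml"


def find_wiki_files(top="."):
    """Yield every *.wiki file below top (assumes a recursive traversal)."""
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in sorted(filenames):
            if name.endswith(".wiki"):
                yield os.path.join(dirpath, name)


def main():
    mode = output_mode(sys.argv)
    for path in find_wiki_files():
        print(f"{path} -> {xml_name(path)} ({mode})")
```

Calling `main()` from an `if __name__ == "__main__":` guard would list each *.wiki file with its planned output and the selected mode.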

Basics

An xmlwiko file (*.wiki) is a UTF-8 encoded file that consists of text blocks. These blocks are separated by one or more blank lines (two or more consecutive newlines), so a text block itself can't contain blank lines.
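Splitting a file into its text blocks can be sketched in a few lines of Python. This is an illustration, not the script's own code; treating whitespace-only lines as blank is an assumption.

```python
import re


def split_blocks(text):
    """Split .wiki text into blocks separated by one or more blank lines.

    Assumption: lines holding only whitespace also count as blank.
    """
    blocks = re.split(r"\n\s*\n", text.strip())
    return [block for block in blocks if block.strip()]
```

The file itself would be read with `open(path, encoding="utf-8")` before being handed to `split_blocks`.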

At the start of each file you should place a header with the "title" and "author" variables as follows:

@title: Title of the document
@author: Whowrote This
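Reading such a header could look like the following Python sketch. It assumes the header forms the first text block and that every line in it has the `@name: value` form; the function name is made up for this example.

```python
import re

HEADER_RE = re.compile(r"^@(\w+):\s*(.*)$")


def parse_header(block):
    """Collect "@name: value" lines of the header block into a dict."""
    header = {}
    for line in block.splitlines():
        match = HEADER_RE.match(line.strip())
        if match:
            header[match.group(1)] = match.group(2).strip()
    return header
```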

The markup for the text falls into the following categories: sections, simple paragraphs, links, lists and environments.

Sections

Sections outline the structure of your text. You can indent or dedent sections to any level you like, so we need ways of adding a subsection or closing several open sections at once (dedent).

A simple section is started by the code:

== title == [id]

As the square brackets imply, the id is optional for you...but required by the Forrest DTD. If you leave it out, the words of the title are joined by underscores "_" and converted to lowercase, and the result becomes the id of this section.
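The default id rule fits into a one-line Python function. It's a sketch of the rule as described here; splitting the title on whitespace is an assumption, and punctuation such as commas is kept as-is, even though it isn't valid in an XML id.

```python
def default_section_id(title):
    """Join the words of the title with "_" and lowercase the result."""
    return "_".join(title.split()).lower()
```

For example, `default_section_id("Simple paragraphs")` gives `"simple_paragraphs"`.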

Starting a section like this keeps the current indentation level. So if another section has been opened before, it is closed first. If you want to open a subsection (indent), you type:

==+ title == [id]

Note the "+" that signals: "I want to increment the level of indentation".

While you can only indent in steps of one, you can dedent by any amount using:

==-- title == [id]

Here we dedent by two, which closes the last three sections (the current one and the two enclosing it)...and then opens the new one.

Larger dedents can be entered directly as a single minus, followed by an integer:

==-7 title == [id]

At the end of the text, all sections that are still open get closed automatically.

Finally, you can jump directly to a lower level of indentation by giving the target level right behind the opening "==":

==0 title == [id]

for starting a new section at the top level 0 (all open sections are closed first).
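The level arithmetic behind these section markers can be checked with the following Python sketch. It's a reconstruction from the rules above, not the code in xmlwiko.py: the regular expression, the function name and the choice to reject impossible jumps with a `ValueError` are assumptions. Here `depth` counts the open sections, so the innermost one sits at level `depth - 1`.

```python
import re

# Assumed syntax: "==<mod> title ==" optionally followed by an id, where <mod>
# is empty, "+", a run of "-", "-<n>" or an absolute level "<n>".
SECTION_RE = re.compile(r"^==(\+|-\d+|-+|\d+)?\s+(.+?)\s+==(?:\s+(\S+))?\s*$")


def section_transition(line, depth):
    """Return (sections_to_close, new_depth, title, section_id)."""
    match = SECTION_RE.match(line)
    if match is None:
        raise ValueError(f"not a section line: {line!r}")
    mod, title, section_id = match.groups()
    if not mod:                      # "==": stay on the current level
        level = max(depth - 1, 0)
    elif mod == "+":                 # "==+": open a subsection
        level = depth
    elif mod.startswith("-"):        # "==--" or "==-7": dedent
        steps = int(mod[1:]) if mod[1:].isdigit() else len(mod)
        level = depth - 1 - steps
    else:                            # "==0": jump to an absolute level
        level = int(mod)
    if not 0 <= level <= depth:
        raise ValueError(f"cannot move to level {level} from depth {depth}")
    # Every open section at or below the new level gets closed first.
    return depth - level, level + 1, title, section_id
```

With three sections open, `section_transition("==-- title == id", 3)` closes all three and leaves just the new one open, matching the "dedent by two" description.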

Simple paragraphs

The following markup is local to a single line of text. The markers have to appear in matched pairs, because they don't get closed automatically at the end of the block:

Emphasis (em)

This is an \\emphasis\\.

Bold (strong)

A !!bold!! word.

Double quotes

Enter ''quit'' to get out of here.

Code words (like variables or verbatim inline text)

This $$optionList$$ is never referenced.

Code words, enclosed in double quotes

The %%vlink%% attribute can be used for images.

Anchor (<anchor id=""/>)

@@label_id@@
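A rough Python sketch of these inline rules follows. The target tags are assumptions based on the Forrest names given above (em, strong, anchor), plus `<code>` for code words; the straight double quotes produced for '' and %% are also guesses, so the real script's output may differ.

```python
import re

# Hypothetical mapping from paired inline markers to Forrest-style XML.
INLINE_RULES = [
    (r"\\\\(.+?)\\\\", r"<em>\1</em>"),          # \\emphasis\\
    (r"!!(.+?)!!", r"<strong>\1</strong>"),      # !!bold!!
    (r"''(.+?)''", r'"\1"'),                     # ''double quotes''
    (r"\$\$(.+?)\$\$", r"<code>\1</code>"),      # $$code word$$
    (r"%%(.+?)%%", r'"<code>\1</code>"'),        # %%quoted code word%%
    (r"@@(.+?)@@", r'<anchor id="\1"/>'),        # @@label_id@@
]


def convert_inline(line):
    """Apply each paired-marker rule to a single line of text, in order."""
    for pattern, replacement in INLINE_RULES:
        line = re.sub(pattern, replacement, line)
    return line
```

The non-greedy `(.+?)` makes each marker close at its nearest partner, which is why the pairs must be matched within the line.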

Links, Images, Figures

These three elements confront us with a new problem: attributes. Sometimes it is simply not enough to say

<a href="test.html">test</a>
or
<img src="test.png"/>

because we want to add special attributes like "vlink" or "width". First, let's have a look at the basic forms:

[[href text]]
Creates a link to the external URL "href" with "text" as the link text.
((id text))
Creates a link within the document to the "id" with "text" as the link text. The "id" can point to a section or a defined anchor (see Simple paragraphs).
&&id text&&
Creates an xref entry to the "id" within the document. Although Docbook doesn't require a link "text" for this element, you have to provide a fallback for the Forrest output.
<<href>>
Places an "img" tag (or inlinemediaobject) for the image "href" at the current position.
Image: href
A graphic "href" that stands on its own line. It has no title and is specified like an environment.
Figure: href
A figure is an image "href" that stands on its own line. Unlike the Image environment above, it has a title, which goes on the following line:
Figure: href
Title/description of the figure.

If you want to give special attributes to any of these six tags, split the expression into two parts with a "||". The first token is always recognized as the link target (or image source), followed by the optional attributes. For links and xrefs, the "||" goes in front of the link text at the end of the expression; for images and figures, it goes right behind the image source.

Some examples:

[[index.html vlink="#C0C0C0"||Visit this page!]]
<<test.png||alt="test" width="80%">>
((examples xrefstyle="template: the examples in ch. %n"||the examples))
Figure: er2.png||alt="er2" width="100%"
The parallel computer ER2.
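Here is a rough Python sketch of how the "||" split could be handled for external links and inline images. The function names and the exact XML produced are assumptions for illustration; the real script also covers ((id text)), &&id text&&, the Image/Figure environments and the Docbook output.

```python
import re

LINK_RE = re.compile(r"\[\[(.+?)\]\]")
IMAGE_RE = re.compile(r"<<(.+?)>>")


def _attrs(text):
    """Turn an optional attribute string into ' a="b" ...' (or '')."""
    text = text.strip()
    return f" {text}" if text else ""


def _link(match):
    body = match.group(1)
    if "||" in body:
        # [[href attributes||link text]]
        head, text = body.split("||", 1)
        href, _, attrs = head.strip().partition(" ")
    else:
        # [[href link text]]
        href, _, text = body.partition(" ")
        attrs = ""
    return f'<a href="{href}"{_attrs(attrs)}>{text.strip()}</a>'


def _image(match):
    # <<src||attributes>>: here the "||" follows the image source directly.
    src, _, attrs = match.group(1).partition("||")
    return f'<img src="{src.strip()}"{_attrs(attrs)}/>'


def convert_links_and_images(line):
    line = LINK_RE.sub(_link, line)
    return IMAGE_RE.sub(_image, line)
```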

Filters

Sometimes you may want to pass text (or input in general) to one output type only. A typical example is a parameter like "scale" for a Docbook "imagedata", which has no counterpart in Forrest and leads to an error there.

Here, you can use a filter (enclosed in "**") as follows:

<<test.png||alt="test" **docbook scale="80"**>>

The parameter "scale" only gets included in the output if "docbook" is selected.

Currently available filters are: "forrest" and "docbook".
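A filter pass could be sketched in Python as below. The regular expression is an assumption: it only accepts the two filter names listed above, which also keeps "**" list markers from being mistaken for filters.

```python
import re

# Assumed form: **<filter> text**, with <filter> being forrest or docbook.
FILTER_RE = re.compile(r"\*\*(forrest|docbook)\s+(.*?)\*\*")


def apply_filters(text, mode):
    """Keep filtered text for the selected output mode, drop it otherwise."""
    return FILTER_RE.sub(lambda m: m.group(2) if m.group(1) == mode else "", text)
```

Run on the example above, `apply_filters(..., "docbook")` keeps `scale="80"`, while `"forrest"` removes it and leaves only a trailing space inside the image tag.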

Lists

Within a list block you can indent/dedent the item level and also switch between ordered (#), unordered (*) and description lists (~). Opening and closing the individual list environments is handled by xmlwiko.

Note
The description lists (~) can't be nested further, so the single "~" is always the rightmost element of a list specification!

For example, the list block

#first
more text for first item
#second
#parent 1
##child 1
more text for child 1 item
##*non numerated child
##*non numerated child
##child 2
###subchild 1
#child 3
##~dt||dd (description list)
#parent 2

results in:

  1. first more text for first item
  2. second
  3. parent 1
    1. child 1 more text for child 1 item
      • non numerated child
      • non numerated child
    2. child 2
      1. subchild 1
  4. child 3
    1. dt
      dd (description list)
  5. parent 2
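The bookkeeping for nested lists can be sketched by comparing the marker prefixes of two consecutive items: lists they share stay open, the rest of the old prefix is closed and the rest of the new one is opened. This common-prefix approach is a widespread technique for wiki lists and an assumption here, not necessarily what xmlwiko does internally; the tag names are the usual XHTML ones.

```python
LIST_TAGS = {"#": "ol", "*": "ul", "~": "dl"}


def split_item(line):
    """Split "##*text" into its list specification "##*" and the item text."""
    spec_len = 0
    while spec_len < len(line) and line[spec_len] in LIST_TAGS:
        spec_len += 1
    return line[:spec_len], line[spec_len:]


def list_transition(old_spec, new_spec):
    """Return (lists_to_close, lists_to_open) when moving between two items.

    The common prefix stays open; the rest of the old specification is
    closed innermost-first, the rest of the new one is opened outermost-first.
    """
    common = 0
    while (common < min(len(old_spec), len(new_spec))
           and old_spec[common] == new_spec[common]):
        common += 1
    to_close = [LIST_TAGS[c] for c in reversed(old_spec[common:])]
    to_open = [LIST_TAGS[c] for c in new_spec[common:]]
    return to_close, to_open
```

A line with an empty specification, such as "more text for first item", would then be treated as a continuation of the current item.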

Environments

You can open special text blocks as an "environment" by starting the block with a line that gives the block type, followed by a colon:

Abstract:
Here we write text for our
abstract...

Available environments are:

Code, Figure, Abstract, Remark, Note, Important, Warning, Caution, Keywords, TODO, Definition, Lemma, Proof, Theorem and Corollary.
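Recognizing an environment from the first line of a block could be sketched in Python like this. The function name and the returned triple are assumptions; the names are the ones listed above, and anything after the colon (such as the image source of a Figure) is kept as an argument.

```python
import re

ENVIRONMENTS = ("Code", "Figure", "Abstract", "Remark", "Note", "Important",
                "Warning", "Caution", "Keywords", "TODO", "Definition",
                "Lemma", "Proof", "Theorem", "Corollary")
ENV_RE = re.compile(r"^(%s):\s*(.*)$" % "|".join(ENVIRONMENTS))


def split_environment(block):
    """Return (name, argument, body) if the block opens an environment, else None."""
    first, _, body = block.partition("\n")
    match = ENV_RE.match(first.strip())
    if match is None:
        return None
    return match.group(1), match.group(2), body
```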

Note
When using the Code environment, you still have to escape the "<" as "&lt;" and the "&" as "&amp;"!
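Instead of escaping by hand, you could prepare a snippet for the Code environment with Python's standard `xml.sax.saxutils.escape`. This is only a suggestion for preparing your input; it is not something xmlwiko does for you.

```python
from xml.sax.saxutils import escape

snippet = 'if (a < b && c) { puts("ok"); }'
# escape() turns "&" into "&amp;" first, then "<" and ">" into "&lt;"/"&gt;",
# so nothing gets escaped twice; quotes are left alone by default.
print(escape(snippet))
```

It also turns ">" into "&gt;", which is harmless in XML.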

Environments extended

When defining environments or lists, you are normally restricted to a single paragraph: xmlwiko parses line by line, and a blank line usually ends an environment/list item.

If you'd rather have the script treat the empty line as "start a new paragraph", you can enclose the environment in a braces block, like this:

{{TODO:
This is the first paragraph.

And here is another...
}}

This gives:

TODO:

This is the first paragraph.

And here is another...

The same is possible for single list items, e.g.:

*First item
{{**Second indented item.

The second paragraph in this item.
}}

which yields:

  • First item
    • Second indented item.

      The second paragraph in this item.
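Block splitting that honours these braces can be sketched as a small line-based state machine in Python. It's an illustration under assumptions: "{{" is expected at the start of a line and "}}" at its end, and blank lines inside the braces are kept in the block as paragraph separators.

```python
def split_blocks_with_braces(text):
    """Split text at blank lines, except inside a {{ ... }} braces block."""
    blocks, current, depth = [], [], 0
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("{{"):
            depth += 1
        if not stripped and depth == 0:
            # A blank line outside braces ends the current block.
            if current:
                blocks.append("\n".join(current))
                current = []
            continue
        current.append(line)
        if stripped.endswith("}}"):
            depth = max(depth - 1, 0)
    if current:
        blocks.append("\n".join(current))
    return blocks
```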

Special stuff

You may wonder why all the inline markers are doubled up. The reason behind this is that we might want to print a "%%" in the middle of our text (uh, oh...there it happened already). And if there is a real "quoted" code word in the same text block, the script easily gets confused.

For this case, the "\blank" marker was introduced as a special "escape sequence". It gets replaced by an empty string in the final output, after all other processing is done.

So if you want a "%%" in your text, you can type:

''%\blank%''
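Why this works can be seen in a tiny Python sketch with just two inline rules. The rules and their output are assumptions made for illustration; the point is that "%\blank%" never contains the "%%" pair, so no rule fires on it, and the marker is only dropped at the very end.

```python
import re


def render_inline(line):
    """Two sample inline rules, followed by the final \\blank removal."""
    line = re.sub(r"%%(.+?)%%", r'"<code>\1</code>"', line)
    line = re.sub(r"''(.+?)''", r'"\1"', line)
    # Only after all other rules have run is the escape marker dropped.
    return line.replace("\\blank", "")
```

Even with a real code word in the same line, `render_inline("Type ''%\\blank%'' or %%vlink%%.")` keeps the two apart.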

Final question: ''What do you have to type in order to get "\blank" in the output?'' :)