XML Boiler program

Prerequisites
Introduction
The magic of chain
Scripts vs transformations
Command line
- Common options
- Chain
Supported namespaces and scripts
Your own transformations
Future plans
Support/donate

XML Boiler is a command-line program (in the future I also plan to add an HTTP(S) proxy interface) that automatically processes XML based on its namespaces.

This is an alpha (not thoroughly tested) release. The URLs used below will surely change in future versions (so backward compatibility will not be preserved).

This program (well, almost) conforms to the specification. See the specification for more details about what this program does and what its purpose is. One important thing that does not work yet is XML validation.

Prerequisites

You need to know what XML and XML namespaces are before reading this document. You also need to know RDF to do anything beyond simple examples like those shown in this document.

To install XML Boiler, first install Python 3.7 (or above), then run:

pip install xml-boiler

Introduction

Consider an XML file with an XInclude directive:

<y xmlns:xi="http://www.w3.org/2001/XInclude">
    <xi:include href="simple.xml"/>
</y>

To expand the XInclude directive, we can run XML Boiler as follows:

boiler -i xinclude.xml script 'http://portonvictor.org/ns/trans/XInclude#script1'
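For readers new to XInclude, here is a minimal sketch of what "expanding the directive" means, using Python's standard xml.etree.ElementInclude module. It is not how XML Boiler implements http://portonvictor.org/ns/trans/XInclude#script1; the in-memory FILES dictionary and the contents of simple.xml are assumptions made so the example runs without files on disk.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical contents of simple.xml, kept in memory so the sketch needs no files.
FILES = {"simple.xml": "<simple>Hello</simple>"}


def memory_loader(href, parse, encoding=None):
    """Resolve an xi:include href from FILES instead of the file system."""
    text = FILES[href]
    if parse == "xml":
        return ET.fromstring(text)  # parse="xml": return the included element
    return text  # parse="text": return the raw string


def expand_xinclude(source: str) -> str:
    """Replace every xi:include element in source with the content it points to."""
    root = ET.fromstring(source)
    ElementInclude.include(root, loader=memory_loader)
    return ET.tostring(root, encoding="unicode")


doc = """<y xmlns:xi="http://www.w3.org/2001/XInclude">
    <xi:include href="simple.xml"/>
</y>"""
print(expand_xinclude(doc))
```

ElementTree silently drops the now-unused xmlns:xi declaration when serializing; judging by the pipeline examples below, XML Boiler's XInclude script leaves it in place, which is what the NSClean script is for.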

But here is the magic: boiler can figure out which script to run, given an input XML file and, optionally, a precedence (such as the include precedence below).

boiler -i xinclude.xml chain -u http://portonvictor.org/ns/trans/precedence-include

We can chain several commands with the pipe command, separating them with a plus sign:

boiler --preload 'http://portonvictor.org/ns/base' -i xinclude.xml pipe \
  'script http://portonvictor.org/ns/trans/XInclude#script1 + script http://portonvictor.org/ns/base#NSClean-script'

boiler --preload 'http://portonvictor.org/ns/base' -i xinclude.xml pipe \
  'chain -u http://portonvictor.org/ns/trans/precedence-include + script http://portonvictor.org/ns/base#NSClean-script'

The latter script, http://portonvictor.org/ns/base#NSClean-script, removes unused namespaces (http://www.w3.org/2001/XInclude in our case).

The last two examples above require preloading the http://portonvictor.org/ns/base asset, because this asset defines the script http://portonvictor.org/ns/base#NSClean-script that they use. (The two earlier examples used the asset http://www.w3.org/2001/XInclude, which was loaded automatically because its URL was present in the input file as a namespace URI.)

In the current version of this software, assets are actually loaded from local files, even though their names look like URLs. In a future version we plan to make it possible to load assets from real Web URLs, allowing namespace owners to publish descriptions of their XML tags online for use by this software.

The magic of chain

The chain command is magical. It automatically finds a chain of transformations that converts the source document into the destination namespace. To do this, it consults the assets located at the namespaces used in the document, preloaded (--preload) assets, etc.

For more details on how it works see the specification.
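The specification defines the actual search. As a rough mental model only, finding a chain can be pictured as a shortest-path search over a graph whose edges are transformations between namespaces. The sketch below uses breadth-first search; the transformation names and namespace URIs in TRANSFORMATIONS are hypothetical, and XML Boiler's real algorithm (which also takes precedences and other factors from the specification into account) is not reproduced here.

```python
from collections import deque

# Hypothetical transformations: name -> (source namespace, target namespace).
TRANSFORMATIONS = {
    "emm-to-struct": ("urn:example:emm", "urn:example:struct"),
    "struct-to-xhtml": ("urn:example:struct", "http://www.w3.org/1999/xhtml"),
    "emm-to-docbook": ("urn:example:emm", "urn:example:docbook"),
}


def find_chain(source_ns, target_ns):
    """Return the shortest list of transformation names leading from source_ns
    to target_ns (breadth-first search), or None if the target is unreachable."""
    queue = deque([(source_ns, [])])
    visited = {source_ns}
    while queue:
        ns, path = queue.popleft()
        if ns == target_ns:
            return path
        for name, (src, dst) in TRANSFORMATIONS.items():
            if src == ns and dst not in visited:
                visited.add(dst)
                queue.append((dst, path + [name]))
    return None


print(find_chain("urn:example:emm", "http://www.w3.org/1999/xhtml"))
```

A real document may contain several namespaces at once, which this single-namespace sketch ignores.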

Why not just run a Unix pipeline instead of XML Boiler? The reasons are:

You need only a single command to run all transformations; there is no need to write a lengthy pipeline of commands.
XML Boiler decides which scripts to run automatically: you don’t need to find specific scripts and decide the order in which to run them; XML Boiler does this based on the namespaces in the documents.
XML Boiler can automatically extract subdocuments from the main XML document. For example (just for fun), if the input document is an RDF/XML document containing several XHTML documents (to make it more interesting, each containing a command to create a table of contents), XML Boiler would extract every XHTML document automatically, process them independently, and then assemble them (with the added tables of contents) back into the XML container. So XML Boiler has good support for embedding one XML document into another (distinguished by XML namespaces).
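To make the idea of subdocuments concrete, here is a minimal sketch, assuming a subdocument simply starts wherever an element's namespace differs from its parent's. This rule, the embedded_subdocuments helper and the urn:example:container vocabulary are illustrative assumptions; the specification defines how XML Boiler actually splits and reassembles documents.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"


def namespace_of(tag):
    """Return the namespace URI of an ElementTree tag such as '{uri}local'."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def embedded_subdocuments(root):
    """Yield (namespace, element) for every element whose namespace
    differs from the namespace of its parent element."""
    stack = [root]
    while stack:
        parent = stack.pop()
        parent_ns = namespace_of(parent.tag)
        for child in parent:
            if not isinstance(child.tag, str):
                continue  # skip comments and processing instructions
            child_ns = namespace_of(child.tag)
            if child_ns != parent_ns:
                yield child_ns, child
            stack.append(child)  # keep descending to find nested embeddings


# Hypothetical container holding two XHTML documents.
doc = f"""<c:container xmlns:c="urn:example:container">
  <c:item><html xmlns="{XHTML}"><body><p>First</p></body></html></c:item>
  <c:item><html xmlns="{XHTML}"><body><p>Second</p></body></html></c:item>
</c:container>"""

for ns, element in embedded_subdocuments(ET.fromstring(doc)):
    print(ns, element.findtext(f".//{{{XHTML}}}p"))
```

Each yielded element could then be serialized, processed on its own, and put back in place of the original subtree.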

Scripts vs transformations

The main things that assets define are transformations.

A transformation may, for example, define how to transform from one XML namespace to another.

You can (provided that the asset defining the transformation is loaded) call a transformation like this:

boiler --preload http://portonvictor.org/ns/base transform 'http://portonvictor.org/ns/base#xml-format'

A transformation can provide several alternative scripts. Instead of letting XML Boiler choose a script for a transformation automatically, you can run a particular script directly:

boiler --preload http://portonvictor.org/ns/base script 'http://portonvictor.org/ns/base#xml-format-script'
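The relationship between a transformation and its alternative scripts can be pictured with a small data model. Everything in this sketch is hypothetical: the Script and Transformation classes, their fields, the urn:example script URL, and the selection rule (take the first script whose required executable is found in PATH, loosely echoing the --software executable option) are assumptions for illustration, not XML Boiler's actual classes or selection algorithm.

```python
import shutil
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Script:
    url: str
    required_executable: str  # hypothetical: a program the script needs in PATH


@dataclass
class Transformation:
    url: str
    scripts: List[Script] = field(default_factory=list)  # alternatives, in preference order

    def choose_script(self) -> Optional[Script]:
        """Return the first alternative whose executable is found in PATH, if any."""
        for script in self.scripts:
            if shutil.which(script.required_executable) is not None:
                return script
        return None


xml_format = Transformation(
    "http://portonvictor.org/ns/base#xml-format",
    [
        Script("urn:example:format-with-missing-tool", "no-such-program-xyz"),
        Script("http://portonvictor.org/ns/base#xml-format-script", "python3"),
    ],
)
chosen = xml_format.choose_script()
print(chosen.url if chosen else "no usable script found")
```

Running the transform command corresponds to calling choose_script(); running the script command corresponds to picking one Script by its URL yourself.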

Command line

Common options

This list is not complete. Use --help for the full list of options.

-h, --help: Help message
-i INPUT, --input INPUT: input file (defaults to stdin)
-o OUTPUT, --output OUTPUT: output file (defaults to stdout)
-l {DEBUG,INFO,WARNING,ERROR,CRITICAL}, --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}: Set log level.
-p NAMESPACE, --preload NAMESPACE: Load the specified asset before the main loop.
-r {none,breadth,depth}, --recursive {none,breadth,depth}: recursive download mode (none, breadth-first, depth-first)
-y NAME=DIR, --directory NAME=DIR: additional directory with assets
--software {package,executable,both}: determine installed software by package manager and/or executables in PATH. 'package' is currently supported only on Debian-based systems. Defaults to 'both' on Debian-based systems and 'executable' on others.
-d DOWNLOADERS, --downloaders DOWNLOADERS: assets to be loaded before the main loop; a plus-separated list of comma-separated lists of "builtin" or DIR entries (DIR is given by the --directory option; "builtin" denotes the assets distributed with XML Boiler)
-s {precedence,doc}, --next-script {precedence,doc}: next script algorithm (precedence is not supported)
-n {ignore,remove,error}, --not-in-target {ignore,remove,error}: What to do if the result XML file contains namespaces not in the target. remove is not supported.
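The --downloaders and --directory values have their own small syntax. Assuming they are split exactly as the help text describes (--downloaders on "+" and then on ",", --directory at the "=" between NAME and DIR), the sketch below shows how a value breaks down. parse_downloaders and parse_directory are hypothetical helpers, not functions exported by XML Boiler, and the sketch says nothing about how the program interprets the resulting groups.

```python
from typing import Dict, List


def parse_downloaders(value: str) -> List[List[str]]:
    """Split a --downloaders value: '+' separates groups, ',' separates items."""
    return [[item.strip() for item in group.split(",")] for group in value.split("+")]


def parse_directory(value: str) -> Dict[str, str]:
    """Split a --directory NAME=DIR value at the first '='."""
    name, sep, directory = value.partition("=")
    if not sep or not name:
        raise ValueError(f"expected NAME=DIR, got {value!r}")
    return {name: directory}


print(parse_directory("mine=/home/user/assets"))  # {'mine': '/home/user/assets'}
print(parse_downloaders("builtin,mine+mine"))  # [['builtin', 'mine'], ['mine']]
```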

Chain

The chain (or c) command (boiler chain ...) runs an automatic transformation pipeline (see the specification). It accepts the name of the input file (none or - for stdin) and the following options:

-t NAMESPACE, --target NAMESPACE: target namespace (often the XHTML namespace http://www.w3.org/1999/xhtml)
-u URL, --universal-precedence URL: universal precedence (see the specification)

Supported namespaces and scripts

We currently have implicit support for the following namespaces:

`http://www.w3.org/2001/XInclude` (XInclude)

The XInclude standard allows one XML document to be included inside another.

See Wikipedia.

Extensible Modular Markup

The following command transforms EMM to XHTML2:

PYTHONPATH=. ./bin/boiler -l DEBUG -i xmlboiler/tests/core/data/xml/emm.xml --preload http://portonvictor.org/ns/base --preload 'http://portonvictor.org/ns/EMM' \
pipe 'c -t https://www.w3.org/2002/06/xhtml2/ -n ignore + t http://portonvictor.org/ns/base#NSClean'

`http://portonvictor.org/ns/comment` (Comment)

c:comment elements of this namespace are simply removed from the XML.
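The effect of the Comment namespace can be sketched with ElementTree: every c:comment element is dropped together with its content, while the text that follows it is kept so the surrounding prose is not lost. This is an illustrative reimplementation, not XML Boiler's script, and keeping the following text is an assumption about the intended behaviour.

```python
import xml.etree.ElementTree as ET

COMMENT = "{http://portonvictor.org/ns/comment}comment"


def _drop_keeping_tail(parent, child):
    """Remove child from parent, moving its tail text to the preceding node."""
    tail = child.tail or ""
    index = list(parent).index(child)
    if index == 0:
        parent.text = (parent.text or "") + tail
    else:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + tail
    parent.remove(child)


def remove_comments(source: str) -> str:
    """Delete every c:comment element (and everything inside it)."""
    root = ET.fromstring(source)
    # ElementTree elements do not know their parent, so find parents first.
    parents = [p for p in root.iter() if any(c.tag == COMMENT for c in p)]
    for parent in parents:
        for child in list(parent):
            if child.tag == COMMENT:
                _drop_keeping_tail(parent, child)
    return ET.tostring(root, encoding="unicode")


print(remove_comments(
    '<p xmlns:c="http://portonvictor.org/ns/comment">'
    'Keep <c:comment>drop me</c:comment>this.</p>'
))
```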

`http://portonvictor.org/ns/EMM/sections` (Structure)

This transforms an EMM module into an XHTML1 document.

See the source of this document for an example. <h?> tags with the correct nesting level are generated automatically.

<struct:toc/> automatically generates a table of contents.
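As an illustration of how correctly nested headings and a table of contents can be derived from section nesting, here is a minimal sketch. The element names section and title, the conversion of sections to XHTML div elements, and the output shape are assumptions for illustration only; the real vocabulary is defined by the Structure asset and shown in the source of this document.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: the real element names are defined by the Structure asset.
STRUCT = "{http://portonvictor.org/ns/EMM/sections}"
XHTML = "{http://www.w3.org/1999/xhtml}"


def number_headings(element, depth=0, toc=None):
    """Rename each struct:title to an XHTML hN, where N is the number of enclosing
    struct:section elements (clamped to 1..6); return (level, text) pairs for a TOC."""
    if toc is None:
        toc = []
    for child in element:
        if child.tag == STRUCT + "section":
            child.tag = XHTML + "div"  # keep the grouping as a plain XHTML div
            number_headings(child, depth + 1, toc)
        elif child.tag == STRUCT + "title":
            level = min(max(depth, 1), 6)
            child.tag = f"{XHTML}h{level}"
            toc.append((level, child.text or ""))
        else:
            number_headings(child, depth, toc)
    return toc


doc = """<body xmlns="http://www.w3.org/1999/xhtml"
      xmlns:struct="http://portonvictor.org/ns/EMM/sections">
  <struct:section><struct:title>Intro</struct:title>
    <struct:section><struct:title>Details</struct:title></struct:section>
  </struct:section>
</body>"""
root = ET.fromstring(doc)
toc = number_headings(root)
for level, text in toc:
    print("  " * (level - 1) + text)  # indented table of contents
```

Because the heading level is computed from nesting rather than written by hand, moving a section deeper or shallower automatically fixes its heading.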

`http://portonvictor.org/ns/syntax` (Syntax highlighting)

<pre syntax:format="JavaScript">function() { return 123 }</pre>

produces a syntax-highlighted listing:

function() { return 123 }

We also support the following scripts and transformations:

http://portonvictor.org/ns/base#xml-format transformation (or http://portonvictor.org/ns/base#xml-format-script) from http://portonvictor.org/ns/base asset indents XML code.
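For comparison, the effect of indenting XML can be reproduced with the standard library. This is not the xml-format script itself, and ET.indent requires Python 3.9 or later, whereas XML Boiler itself needs only 3.7.

```python
import xml.etree.ElementTree as ET


def format_xml(source: str, space: str = "  ") -> str:
    """Re-indent an XML document; whitespace-only text between elements is rewritten."""
    root = ET.fromstring(source)
    ET.indent(root, space=space)  # available since Python 3.9
    return ET.tostring(root, encoding="unicode")


print(format_xml("<a><b><c/></b><d>text</d></a>"))
```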

http://portonvictor.org/ns/base#NSClean transformation (or http://portonvictor.org/ns/base#NSClean-script) from http://portonvictor.org/ns/base asset removes unused XML namespaces.
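What "removing unused namespaces" means can be sketched with xml.dom.minidom, which keeps namespace declarations as ordinary xmlns:PREFIX attributes. This is an illustration, not the NSClean script: it matches prefixes by name only, ignores scoping (the same prefix bound to different URIs in different places), and does not notice prefixes used inside attribute values or text, such as QName values like xsi:type.

```python
from xml.dom import minidom


def _used_prefixes(node, used):
    """Collect the prefixes that appear in element and attribute names."""
    if node.nodeType == node.ELEMENT_NODE:
        if ":" in node.tagName:
            used.add(node.tagName.split(":", 1)[0])
        for name in node.attributes.keys():
            if ":" in name and not name.startswith("xmlns:"):
                used.add(name.split(":", 1)[0])
    for child in node.childNodes:
        _used_prefixes(child, used)
    return used


def _drop_unused_declarations(node, used):
    """Remove xmlns:PREFIX attributes whose PREFIX is not in used."""
    if node.nodeType == node.ELEMENT_NODE:
        for name in list(node.attributes.keys()):
            if name.startswith("xmlns:") and name[len("xmlns:"):] not in used:
                node.removeAttribute(name)
    for child in node.childNodes:
        _drop_unused_declarations(child, used)


def ns_clean(source: str) -> str:
    root = minidom.parseString(source).documentElement
    _drop_unused_declarations(root, _used_prefixes(root, set()))
    return root.toxml()


print(ns_clean('<y xmlns:xi="http://www.w3.org/2001/XInclude">\n'
               '    <simple>Hello</simple>\n</y>'))
```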

Your own transformations

You can create your own transformations (and scripts) after reading the specification. Currently, transformations in Python and XSLT are supported, but adding support for a new language is not very difficult. A script is simply a program that receives XML on stdin and prints the transformed XML on stdout.
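Since a script is just a program that reads XML on stdin and writes XML on stdout, a Python script can be as small as the sketch below. The urn:example:greeting namespace and its greeting element are hypothetical, and treating empty input as empty output is a choice of this sketch. It only shows the stdin-to-stdout contract; registering the script in an asset is described in the specification and is not shown here.

```python
import sys
import xml.etree.ElementTree as ET

GREETING = "{urn:example:greeting}greeting"  # hypothetical source vocabulary
XHTML_P = "{http://www.w3.org/1999/xhtml}p"


def transform(source: str) -> str:
    """Replace every <g:greeting name="..."/> with an XHTML <p>Hello, ...!</p>."""
    root = ET.fromstring(source)
    for element in list(root.iter(GREETING)):
        name = element.get("name", "world")
        element.attrib.clear()
        element.tag = XHTML_P
        element.text = f"Hello, {name}!"
    return ET.tostring(root, encoding="unicode")


def main() -> None:
    source = sys.stdin.read() if sys.stdin is not None else ""
    if source.strip():  # empty input produces empty output
        sys.stdout.write(transform(source))


if __name__ == "__main__":
    main()
```

Saved as, say, greeting.py, it can be tried by hand with python3 greeting.py < input.xml before wiring it into an asset.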

You can put your own assets (with your transformations, for example) into a directory, naming each asset file after the percent-encoded URI of the asset.

Then use the --directory NAME=DIR option to instruct XML Boiler to read assets from this directory.
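Assuming "percent-encoded" means encoding every character outside the unreserved set (so that ":" and "/" become %3A and %2F), the file name for an asset can be computed with urllib.parse.quote. Whether XML Boiler expects exactly this encoding, or a file extension, should be checked against the specification and the bundled assets; the asset_file_name and asset_path helpers, the example URI and the directory path are hypothetical.

```python
from pathlib import Path
from urllib.parse import quote, unquote


def asset_file_name(asset_uri: str) -> str:
    """Percent-encode an asset URI into a single file name (no '/' left)."""
    return quote(asset_uri, safe="")


def asset_path(directory: str, asset_uri: str) -> Path:
    """Where the asset for asset_uri would live inside a --directory DIR."""
    return Path(directory) / asset_file_name(asset_uri)


name = asset_file_name("http://example.org/ns/hello")
print(name)  # http%3A%2F%2Fexample.org%2Fns%2Fhello
print(unquote(name))  # back to the original URI
print(asset_path("/home/user/assets", "http://example.org/ns/hello"))
```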

Future plans

The key opportunities this project opens:

freely intermix tag sets with different semantics (using XML namespaces) without them disturbing each other (such as by name clashes) in the global world
add your new tags to HTML (and other XML-based formats)
- get rid of HTML in the future Web, switching to proper semantic XML formats
  - make XSL-format based browsers with automatic generation of XSL from other XML formats
- make automatic coloring of source listings (for example)
add macros and include other files in XML (such as by XInclude)
intermix different XML formats, with intelligent automatic processing of the mix
- embed one XML format in another one
- automatically choose the order of different XML converters applied to your mixed XML file
make browsers show your XML in arbitrary formats
make processing XML intelligent (with your custom scripts)
integrate XML conversion and validation scripts written in multiple programming languages
associate semantics (such as relations with other namespaces and validation rules) with a namespace
- semantics can be described as an RDF resource at a namespace URL (or a related URL)
file transformation with an automatically found “chain” of several conversions between different formats
many more opportunities
integrate all of the above in a single command
(in the future) also make it an HTTP(S) proxy server

Support/donate

Support this project by money:

PayPal porton@narod.ru or this link
BitCoin 1BdUaP3uRuUC1TXcLgxKXdWWfQKXL2tmqa
Ether 0x36A0356d43EE4168ED24EFA1CAe3198708667ac0
Buy tokens at this page

If you know Python 3, you can also contribute code.