HTML Conventions and support arrangements

Hypermanual on Logic and Change (HYMALAC)
Division authoring/html, main page, 29.8.1995

General structuring conventions

Article segments use headings starting from H1; non-article segments use only headings of type H2, H3, and H4. This serves to accentuate the greater weight of article segments.

Support arrangements

The remainder of this division has to a large extent been superseded by the decision not to use any technique that requires user-side changes (9.9.1995).

Formulae

Standard HTML does not provide any support for formulae. The currently published proposals for HTML-3 seem to provide only a slight improvement in this respect. We therefore have to find some practical solution to this matter, and preferably a strategy which we can "grow with", that is, a solution that works at short notice, but which can be extended in nice ways in the future. Generally speaking, one first has to decide by what hack formulae are to be displayed in the browser, and then decide how to accomplish that particular display, using author discipline and/or preprocessors.

The following account of the situation and of the available options is based on extensive discussions with Michael Ericsson at IDA (thank you!), as well as with Lars Karlsson and other members of RKLLAB (thank you too!).

Starting at the display end, the following possibilities come to mind:

1. Formulae are included as pictures in the HTML file that is displayed by the browser. "As pictures" means, in practice, as so-called gif files. With this solution, formulae may be directly included in the document shown by the browser.
2. Whole sections of text containing formulae are represented as postscript files. Such files must then be accessed with a separate link (one more click), and the browser displays them by invoking an external viewer (e.g. Pageview or Ghostview). This appears to be part of the standard configuration of WWW browsers, at least on Unix systems.
3. Whole sections of text containing formulae are represented as Latex source files. An external viewer is constructed which runs these files through Latex, and displays them using whatever viewer system is used on the platform in question. (This solves one problem with the previous solution, namely that postscript is not compatible across Unix/PC/Apple platforms. The proposal is due to Lars Karlsson.)
4. Formulae are written in HTML, and use as much as possible of the HTML facilities, such as the repertoire of fonts (roman, italic, boldface, tt). Special mathematical symbols are included as picture objects (.gif objects) in the file to be displayed.
5. A variant of the previous approach is to hack the fonts. HTML assumes six fonts (variable-width and fixed-width, and then ordinary, italic, and boldface variants of each of those two). Each font contains a glyph for each of almost 256 positions. In a quick test (made by Michael Ericsson) it was possible to redefine specific ones of these 6*256 positions, and obtain something reasonable on the screen.
   Thus, for example, we could redefine tt boldface to contain exponent characters, tt italic to contain index (lowered) characters, and still have room for a considerable number of special mathematical symbols in the upper-7-bit combinations. (It might be sufficient to save the letters with diacritics that we need for names in just one or two of the fonts.)
6. Use of HTML-3 (a certain formula capability, but a weak one) as soon as it becomes available.
7. Use of newer developments, for example pdf (an interactive successor of postscript) or of Hot Java. Disadvantage: risky; we may bet on the wrong new technology, and it may not be available in time for the present project.

In all of these cases it becomes necessary to have a preprocessor that translates a form of the document that is readable for the author to the form that is given to the browser, HTML or whatever else. For example, the font-hacking approach (alternative 5) will require the displayed HTML file to contain constructs such as <tt>Ö</tt> for a single mathematical character. This is obviously not user-friendly.

Before a preprocessor is designed and built for alternatives 4 and 5, it is also important to know whether the resulting files can be expected to be readable on screens and also on paper. The file [authoring/html/fonttest] contains the results of a manual translation of [topic] (originally in Latex) into HTML format. Its purpose is to illustrate the layout quality obtained with alternatives 4 and 5 above. This file has, however, been constructed without the special characters, so set-algebra and other similar signs have been improvised. It will presumably look a bit nicer when these symbols have been put in; the reader will have to imagine what it can be like.

A quick look at the source HTML file of [authoring/html/fonttest] will also show the necessity of a preprocessor. The formula part of the HTML is basically unreadable.

The crucial question is then where to locate the preprocessor, along the chain from the Hypermanual author, via the server computer system, to the browsing system and the end reader. There are at least the following possibilities (stated with a preliminary opinion about each):

1. Manual preprocessing by the author: he or she writes a source file in some language other than HTML. After each edit, the author invokes a translator that converts it to the target form chosen from the previous list.
2. Automation of alternative 1 using a softbot.
3. The preprocessing as in alternative 1 is performed by the server. If the server does this operation each time, it may eventually become a drag on resources. If the results are cached (always check whether the source file has been updated, and if so redo the transformation), then the load problem will presumably not be serious. -- This alternative has an obvious advantage in terms of convenience, but *may* be less robust for errors that surface during the preprocessing step.
4. Links from one HTML file to another file (HTML or not) are written in such a way that they force the browser to invoke a transformation program. (A solution along these lines was suggested by Lars Karlsson.)
5. Files containing formulae, and other files that need preprocessing, are equipped with a special extension which forces them to be sent to a particular viewer. This viewer may e.g. convert Latex to Postscript, and invoke a postscript viewer. It may possibly also convert to HTML containing .gif pictures, and send the result back to the web viewer. Or, finally, it may use its own, non-standard viewer.
6. Use of next-generation browsers which allow a section of HTML text to be written in an alternative sub-language. The established syntax is along the following lines (ref. Michael Ericsson):
```
    <APP Class="Formula" Param2="..." Param3="..." ... >
```
where the Class parameter specifies what kind of "applet" this is, and later parameters specify the appropriate preprocessor as well as the actual contents of the applet (for example, the formula in Latex formula notation). This requires the browser to obtain the preprocessor from the appropriate arguments (pointing to a file at an arbitrary web server), apply it to the whole applet expression, and substitute in the results.
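
The caching idea in the server-side alternative (check whether the source file has been updated, and only then redo the transformation) can be sketched roughly as follows. This is a minimal illustration in Python, not a description of any existing server: the function name `cached_transform` and the `transform` callback are hypothetical, and it assumes that comparing file modification times is an adequate freshness check.

```python
from pathlib import Path
from typing import Callable


def cached_transform(source: Path, target: Path,
                     transform: Callable[[str], str]) -> bool:
    """Regenerate `target` from `source` only if the source is newer.

    Returns True if the transformation was run, and False if the
    previously generated target could be reused.
    """
    # Reuse the cached result when it is at least as recent as the source.
    if target.exists() and target.stat().st_mtime >= source.stat().st_mtime:
        return False

    # Otherwise redo the transformation and overwrite the cached result.
    target.write_text(transform(source.read_text()))
    return True
```

With this arrangement the cost of preprocessing is paid once per edit of the source file rather than once per request. Errors raised by `transform` would surface at request time, which is the robustness concern mentioned for this alternative.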

Proposed solution. Faced with this somewhat bewildering combinatorics of opportunities, I propose the following strategy and immediate solution:

We give authors two options: either write the text in HTML with embedded formulae, or write it in Latex and display it via postscript. Each author chooses between these alternatives.
For the clean Latex/postscript choice, we offer style files etc. that produce a uniform and professional look.
For the alternative of HTML-embedding, we decide on a notation which looks as follows:
```
    <FORMAL expression >
```
where FORMAL stands for "FORMula in Applied Logic", and the **expression** is written in a suitable notation.
Two alternatives are foreseen for the display of FORMAL expressions. (1, short term): By hacking math characters into the six available HTML fonts as discussed above; (2, longer term): By using the mechanisms 4 or 6 (the latter is the applet mechanism) for browser-side generation of pictures containing the formula, which then go into the text.
The short term solution requires the participating researchers to obtain their copy of these hacked fonts. (It may also impose certain constraints on font size etc.) It also requires FORMAL expressions to be translated to HTML expressions referring to these hacked fonts. We should consider two alternatives for this translation, namely alternatives 3 and 5 above. Alternative 5 is better if it can be made to work. As an initial makeshift, when nothing at all works, alternative 1 (run the transformation manually) is OK too.
The long term solution should be that alternative 3 (server hack) is used to transform FORMAL expressions to ordinary applet expressions, that is,
```
    <APP Class="Formula" Param2="..." Param3="..." ... >
```
combined with setting up an applet program that can be fetched by the browser, and which further transforms the applet expression to a picture.
The reason for this two-step process is of course (1) that it would be clumsy to write the entire APP expression each time one has to create a formula, and (2) that it is a smooth extension of the short term solution.
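
As a rough sketch of the short-term translation step, the following Python fragment expands FORMAL expressions into HTML that refers to a hacked font. Everything specific in it is an assumption made for illustration: the table `HACKED_FONT_TABLE`, the choice of `\subseteq` as the symbol stored at the position of `Ö` in the tt font, and the function names. It also assumes that tokens are separated by spaces and that the expression itself contains no `>` character.

```python
import html
import re

# Hypothetical table: formula token -> (HTML font tag, character whose
# position in that font has been redefined to show the math symbol).
HACKED_FONT_TABLE = {
    r"\subseteq": ("tt", "Ö"),
}

# Matches <FORMAL expression >; assumes no ">" inside the expression.
FORMAL_TAG = re.compile(r"<FORMAL\s+(.*?)\s*>", re.DOTALL)


def translate_expression(expression: str) -> str:
    """Translate one space-separated formula into hacked-font HTML."""
    parts = []
    for token in expression.split():
        if token in HACKED_FONT_TABLE:
            tag, char = HACKED_FONT_TABLE[token]
            parts.append(f"<{tag}>{char}</{tag}>")
        else:
            # Ordinary tokens are passed through, escaped for HTML.
            parts.append(html.escape(token))
    return " ".join(parts)


def expand_formal(html_text: str) -> str:
    """Replace every FORMAL expression in a page by its translation."""
    return FORMAL_TAG.sub(lambda m: translate_expression(m.group(1)), html_text)
```

For example, `expand_formal(r"We have <FORMAL A \subseteq B > here.")` yields `We have A <tt>Ö</tt> B here.` The same function could be run manually (alternative 1) or by the server (alternative 3) without change.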

If this strategy is accepted, it remains to decide on the formula language that is to be used in the FORMAL expressions. Two possibilities:

Latex formula expressions, with appropriate restrictions. For example, no pictures, no tables, no \mbox expressions except maybe some very restricted cases, and so on. Advantage: compatibility -- expressions can be directly moved over to be used in ordinary Latex source files. Disadvantage: Latex formula notation is actually a bit inconvenient for writing formulae of the type we have in C.Sc. variants of logic, where functions, variables, and constants are very often expressed using a sequence of letters, and not just a single letter as in ordinary math.
A specialized notation that is introduced for this purpose. For example, one might choose the essentials of Latex formula expressions, but modify them so that spaces are significant, an "identifier" is rendered as a coherent word in the display, and certain special characters preceding an "identifier" shift it to another font. For example, one could let Holds, $alive, @turkey, and #Shoot be rendered as Holds, alive, turkey, and Shoot respectively, each in its own font. Everything else would be the same, for example \subseteq for the subset-or-equal sign.

The advantage of the specialized notation would be more readable source formulae. The disadvantage is that one cannot so easily lift them over to the Latex environment (unless someone can do the corresponding Latex hacking...).
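
One way to ease the move to the Latex environment would be a small translator from the specialized notation to ordinary Latex. The Python sketch below illustrates the idea under stated assumptions: the mapping from prefix characters to font commands (`\mathrm`, `\mathit`, `\mathsf`, `\mathbf`) is invented here, since no particular fonts have been decided, and the names `FONT_BY_PREFIX` and `to_latex` are hypothetical. Unprefixed single letters are left alone so that ordinary math variables keep their usual appearance.

```python
import re

# Assumed prefix-to-font mapping; which fonts to use is still undecided.
FONT_BY_PREFIX = {
    "": r"\mathrm",
    "$": r"\mathit",
    "@": r"\mathsf",
    "#": r"\mathbf",
}

# An optional prefix character followed by an identifier. The lookbehind
# keeps Latex commands such as \subseteq from being treated as identifiers.
IDENTIFIER = re.compile(r"(?<![\\A-Za-z])([$@#]?)([A-Za-z][A-Za-z0-9]*)")


def _render(match: re.Match) -> str:
    prefix, name = match.groups()
    if not prefix and len(name) == 1:
        return name  # plain single-letter variable: leave unchanged
    return f"{FONT_BY_PREFIX[prefix]}{{{name}}}"


def to_latex(formula: str) -> str:
    """Rewrite identifiers as coherent words in the font given by the prefix."""
    return IDENTIFIER.sub(_render, formula)
```

For example, `to_latex(r"Holds($alive, @turkey) \subseteq #Shoot")` gives `\mathrm{Holds}(\mathit{alive}, \mathsf{turkey}) \subseteq \mathbf{Shoot}`, which can be pasted into an ordinary Latex math environment.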

In summary: I have outlined a number of options and possibilities -- now it is necessary to choose between them.