Hypermanual on Logic and Change (HYMALAC)
Division authoring/html, main page, 29.8.1995
HTML Conventions and support arrangements
General structuring conventions
Article segments use headings starting from H1; non-article
segments use only headings of type H2, H3, and H4.
This serves to accentuate the greater weight of
article segments.
Support arrangements
The remainder of this division has, to a large extent,
been superseded by the decision not to use any technique
that requires changes on the user side (9.9.1995).
Formulae
Standard HTML does not provide any support for formulae. The
currently published proposals for HTML-3 seem to provide only
a slight improvement in this respect. We therefore have to
find some practical solution to this matter, preferably
a strategy which we can "grow with", that is, a solution that
works at short notice but which can be extended in nice ways
in the future. Generally speaking, one first has to decide by
what hack formulae are to be displayed in the browser,
and then decide how to accomplish that particular display,
using author discipline and/or preprocessors.
The following account of the situation and of the available options
is based on extensive discussions with Michael Ericsson at IDA
(thank you!), as well as with Lars Karlsson and other members of
RKLLAB (thank you too!).
Starting at the display end, the following possibilities come to
mind:
1. Formulae are included as pictures in the HTML file that is
displayed by the browser. In practice, "as pictures" means as
so-called gif files. With this solution, formulae
may be included directly in the document shown by the browser.
2. Whole sections of text containing formulae are represented
as postscript files. Such files must then be accessed through a
separate link (one more click), and the browser displays them by
invoking an external viewer (e.g. Pageview or Ghostview).
This appears to be part of the standard configuration of
WWW browsers, at least on Unix systems.
3. Whole sections of text containing formulae are represented
as Latex source files. An external viewer is constructed which
runs these files through Latex and displays the result using
whatever viewer system is used on the platform in question.
(This solves one problem with the previous solution, namely that
postscript is not compatible across Unix/PC/Apple platforms. The
proposal is due to Lars Karlsson.)
4. Formulae are written in HTML and use as much as possible of
the HTML facilities, such as the repertoire of fonts (roman,
italic, boldface, tt). Special mathematical symbols are included
as picture objects (.gif objects) in the file to be
displayed.
5. A variant of the previous approach is to hack the fonts.
HTML assumes six fonts (variable-width and fixed-width, and
then ordinary, italic, and boldface variants of each of those two).
Each font contains a glyph for each of almost 256 code
positions. In a quick test (made by Michael Ericsson) it was
possible to redefine specific ones of these 6*256 positions and
obtain something reasonable on the screen.
Thus, for example, we could redefine tt boldface to contain
exponent (raised) characters and tt italic to contain index (lowered)
characters, and still have room for a considerable number of
special mathematical symbols in the upper half of the code table
(the positions with the eighth bit set).
(It might be sufficient to retain the accented letters that we
need for names in just one or two of the fonts.)
6. Use of HTML-3 (which has a certain formula capability, but a weak one)
as soon as it becomes available.
7. Use of newer developments, for example pdf (an
interactive successor of postscript) or Hot Java.
Disadvantage: risky; we may bet on the wrong new technology,
and it may not be available in time for the present project.
In all of these cases it becomes necessary to have a preprocessor
that translates a form of the document that is readable for the author
into the form that is given to the browser, HTML or whatever else.
For example, the font-hacking approach (alternative 5) will require the displayed
HTML file to contain constructs such as
<tt>Ö</tt>
for a single mathematical character. This is obviously not
user-friendly.
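To make the font-hacking idea (alternative 5) more concrete, the following is a minimal Python sketch of the kind of translator it would need. It assumes, purely for illustration, that tt boldface has been redefined to draw raised (exponent) characters and tt italic to draw lowered (index) characters, as suggested above, and that a few mathematical symbols occupy upper-half code positions in the tt font. The symbol table, its code positions, and the function name `to_hacked_html` are hypothetical, not an agreed assignment.

```python
# Hypothetical assignment of math symbols to upper-half (128-255) code
# positions in a hacked fixed-width (tt) font; not an agreed table.
SYMBOLS = {
    "\\subseteq": chr(0xC0),
    "\\cup": chr(0xC1),
    "\\cap": chr(0xC2),
    "\\neg": chr(0xC3),
}


def to_hacked_html(src: str) -> str:
    """Translate a tiny formula notation into HTML relying on hacked fonts.

    Hypothetical notation handled here:
      ^c     -> c in tt boldface (assumed redefined to draw raised chars)
      _c     -> c in tt italic (assumed redefined to draw lowered chars)
      \\name  -> the symbol stored for \\name in SYMBOLS, in plain tt
    Everything else is copied through unchanged (no HTML escaping).
    """
    out = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch in "^_" and i + 1 < len(src):
            tag = "b" if ch == "^" else "i"
            out.append(f"<tt><{tag}>{src[i + 1]}</{tag}></tt>")
            i += 2
        elif ch == "\\":
            # Read the command name: a backslash followed by letters.
            j = i + 1
            while j < len(src) and src[j].isalpha():
                j += 1
            name = src[i:j]
            if name not in SYMBOLS:
                raise ValueError(f"unknown symbol {name!r}")
            out.append(f"<tt>{SYMBOLS[name]}</tt>")
            i = j
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(to_hacked_html("x^2 \\subseteq y_1"))
```

Applied to the source `x^2 \subseteq y_1`, the exponent, the index, and the subset sign each end up inside a `<tt>` construct. The source notation stays readable while the generated HTML does not, which is the point made above about the need for a preprocessor.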
Before a preprocessor is designed and built for alternatives 4 and 5,
it is also important to know whether the resulting files can be
expected to be readable both on screen and on paper. The file
[authoring/html/fonttest]
contains the results of a manual translation of
[topic]
(originally in Latex) into HTML format. Its purpose is
to illustrate the layout quality that can be obtained with
alternatives 4 and 5 above. However, this file has been constructed
without the special characters, so set-algebra signs and other similar
symbols have been improvised. It will presumably look somewhat nicer
once these symbols have been put in; for now, the reader will have to
imagine what it can look like.
A quick look at the source HTML file of
[authoring/html/fonttest]
will also show the necessity of a preprocessor. The formula part
of the HTML is basically unreadable.
The crucial question is then where to locate the
preprocessor, along the chain from the Hypermanual author, via the
server computer system, to the browsing system and the end reader.
There are at least the following possibilities (stated with a
preliminary opinion about each):
1. Manual preprocessing by the author: he or she writes a source
file in some language other than HTML and, after each edit, invokes
a translator that takes it to the target form chosen from
the previous list.
2. Automation of alternative 1 using a softbot.
3. The preprocessing as in alternative 1 is performed by the
server. If the server does this operation on every request, it may
eventually become a drag on resources. If the results are cached
(always check whether the source file has been updated, and if so redo
the transformation), then the load problem will presumably not be
serious. -- This alternative has an obvious advantage in terms of
convenience, but *may* be less robust with respect to errors that surface
during the preprocessing step.
4. Links from one HTML file to another file (HTML or not) are
written in such a way that they force the browser to invoke a
transformation program. (A solution along these lines was
suggested by Lars Karlsson.)
5. Files containing formulae, and other files that need
preprocessing, are given a special extension which causes
them to be sent to a particular viewer. This viewer may e.g.
convert Latex to postscript and invoke a postscript viewer. It may
possibly also convert to HTML containing .gif pictures
and send the result back to the web browser. Or, finally, it
may use its own, non-standard viewer.
6. Use of next-generation browsers which allow a section of HTML
text to be written in an alternative sub-language. The established
syntax is along the following lines (ref. Michael Ericsson):
<APP Class="Formula" Param2="..." Param3="..." ... >
where the Class parameter specifies what kind of "applet" this is,
and later parameters specify the appropriate preprocessor as well as
the actual contents of the applet (for example, the formula in
Latex formula notation). This requires the browser to obtain the
preprocessor from the appropriate arguments (pointing to a file
at an arbitrary web server), apply it to the whole applet expression,
and substitute the result into the displayed text.
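The caching variant of server-side preprocessing (alternative 3) can be sketched in a few lines. This is a minimal Python illustration, assuming the server stores each translated result next to its source file and compares file modification times; the function name `serve_translated`, the `.html` cache-file naming, and the `translate` callback are all hypothetical.

```python
import os
from typing import Callable


def serve_translated(src_path: str, translate: Callable[[str], str]) -> str:
    """Return the translated form of src_path, redoing the translation
    only when the source file is newer than the cached result.

    The cache file is src_path + ".html" (a hypothetical convention).
    """
    cache_path = src_path + ".html"
    stale = (not os.path.exists(cache_path)
             or os.path.getmtime(src_path) > os.path.getmtime(cache_path))
    if stale:
        with open(src_path, encoding="utf-8") as f:
            source = f.read()
        # translate() runs before the cache is written, so if it raises,
        # nothing new is cached and the error surfaces on this request.
        result = translate(source)
        with open(cache_path, "w", encoding="utf-8") as f:
            f.write(result)
        return result
    with open(cache_path, encoding="utf-8") as f:
        return f.read()
```

The modification-time check is the "always check whether the source file has been updated" rule from alternative 3; translation cost is paid once per edit rather than once per request.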
Proposed solution. Faced with this somewhat bewildering
combinatorics of opportunities, I propose the following strategy and
immediate solution:
- We give authors two options: either write the text in HTML with
embedded formulae, or write it in Latex and display it via
postscript. Each author chooses between these
alternatives.
- For the clean Latex/postscript choice, we offer style files etc.
that produce a uniform and professional look.
- For the alternative of HTML-embedding, we decide on a
notation which looks as follows
<FORMAL **expression** >
where FORMAL stands for "FORMula in Applied Logic", and the
**expression** is written in a suitable notation.
- Two alternatives are foreseen for the display of
FORMAL expressions. (1, short term): By hacking math characters
into the six available HTML fonts as discussed above;
(2, longer term): By using mechanism 4 or 6 (the latter is
the applet mechanism) for browser-side generation
of pictures containing the formula, which then go into the text.
- The short term solution requires the participating researchers
to obtain their own copy of these hacked fonts. (It may also impose
certain constraints on font size etc.) It also requires FORMAL
expressions to be translated to HTML expressions referring to these
hacked fonts. We should consider two alternatives for this
translation, namely alternatives 3 and 5 above. Alternative 5 is
better if it can be made to work. As an initial makeshift, while
nothing else works yet, alternative 1 (run the transformation
manually) is acceptable too.
- The long term solution should be that alternative 3 (server
hack) is used to transform FORMAL expressions to ordinary
applet expressions, that is,
<APP Class="Formula" Param2="..." Param3="..." ... >
combined with setting up an applet program that can be fetched by
the browser, and which further transforms the applet expression
to a picture.
The reason for this two-step process is of course (1) that it would
be clumsy to write the entire APP expression each time one has
to create a formula, and (2) that it is a smooth extension of
the short term solution.
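The first of the two steps, the server-side expansion of FORMAL expressions into applet expressions, is essentially a textual substitution. A minimal Python sketch follows, assuming that a FORMAL expression never contains a `>` character. Since the actual APP parameters are left open above, the attribute name `Expr` is a hypothetical stand-in; a real version would also have to supply the parameter that names the applet program.

```python
import html
import re

# Matches <FORMAL expression >; assumes the expression contains no '>'.
FORMAL_RE = re.compile(r"<FORMAL\s+([^>]*?)\s*>")


def expand_formal(page: str) -> str:
    """Replace each FORMAL expression with an applet expression.

    'Expr' is a hypothetical parameter name standing in for the
    still-undecided APP parameters.
    """
    def to_app(m):
        # Escape quotes etc. so the formula is safe inside an attribute.
        expr = html.escape(m.group(1), quote=True)
        return f'<APP Class="Formula" Expr="{expr}">'

    return FORMAL_RE.sub(to_app, page)


print(expand_formal("If <FORMAL Holds(t, $alive) > then"))
```

Authors only ever write the short FORMAL form; the longer APP form is generated, so its parameters can change later without touching the source files.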
If this strategy is accepted, it remains to decide on the formula
language that is to be used in the FORMAL expressions. Two
possibilities:
- Latex formula expressions, with appropriate restrictions.
For example, no pictures, no tables, no \mbox expressions except
maybe some very restricted cases, and so on. Advantage:
compatibility -- expressions can be directly moved over to
be used in ordinary Latex source files. Disadvantage: Latex
formula notation is actually a bit inconvenient for writing
formulae of the type we have in C.Sc. variants of logic, where
functions, variables, and constants are very often expressed
using a sequence of letters, and not just a single letter as in
ordinary math.
- A specialized notation that is introduced for this
purpose. For example, one might keep the essentials of Latex
formula expressions, but modify them so that spaces are significant,
an "identifier" is rendered as a coherent word in the display, and
certain special characters preceding an "identifier" shift it
to another font. For example, one could let
Holds be rendered as Holds,
$alive as alive,
@turkey as turkey, and
#Shoot as Shoot in the
formulae (the prefix character is not displayed; it only selects the
font). Everything else would be the same, for example
\subseteq for the subset-or-equal sign.
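A minimal Python sketch of how a preprocessor might tokenize this specialized notation: a run of letters (and digits) is kept together as one identifier, a leading $, @ or # selects another font and is dropped, spaces and other characters are copied as they are, and backslash commands are passed through for a later translation step. Which font each prefix selects is not specified above, so the mapping to HTML italic, boldface, and tt is a hypothetical choice, as is the function name `render_identifiers`.

```python
import re

# Hypothetical choice of HTML font for each prefix character.
PREFIX_TAGS = {"$": "i", "@": "b", "#": "tt"}

TOKEN_RE = re.compile(
    r"(?P<cmd>\\[A-Za-z]+)"                     # Latex-style command
    r"|(?P<ident>[$@#]?[A-Za-z][A-Za-z0-9]*)"   # optionally prefixed identifier
    r"|(?P<other>.)",                           # any other char, incl. space
    re.DOTALL,
)


def render_identifiers(src: str) -> str:
    """Render identifiers as whole words, switching font on a prefix.

    Unprefixed identifiers (e.g. Holds) and commands (e.g. \\subseteq)
    are left unchanged.
    """
    out = []
    for m in TOKEN_RE.finditer(src):
        ident = m.group("ident")
        if ident and ident[0] in PREFIX_TAGS:
            tag = PREFIX_TAGS[ident[0]]
            out.append(f"<{tag}>{ident[1:]}</{tag}>")
        else:
            out.append(m.group(0))
    return "".join(out)


print(render_identifiers("Holds(t, $alive) \\subseteq @turkey #Shoot"))
```

Because whole letter sequences are matched as single tokens, an identifier such as `alive` is set as one word rather than as a product of five single-letter variables, which is the inconvenience of plain Latex notation mentioned above.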
The advantage of the specialized notation would be more readable
source formulae. The disadvantage is that one cannot so easily move
them over to the Latex environment (unless someone can do the
corresponding Latex hacking...).
In summary: I have outlined a number of options and possibilities --
now it is necessary to choose between them.