FITZ DISPLAY TREE DESIGN DOCUMENT

--------------------------------------------------------------------

The Fitz world is a set of resources. There are many kinds of resources.
Resources can depend on other resources, but no circular dependencies.
The resource types are: tree, font, image, shade, and colorspace.
A document is a sequence of tree resources that define the contents
of its pages.

A front-end is a producer of Fitz worlds, that reads a file and
creates a Fitz world from its contents. Abstracting this into
a high-level interface is useful primarily for viewers and
converters.

A back-end is a consumer of Fitz worlds. The default rasterizer
is one back-end. PDF writers and other output drivers too.
I don't think there should be a special interface for these.
They are just functions or programs that take the Fitz world
and do something unspecified with it.

The resource API is what I need help fleshing out. Keep in mind
that a Fitz world should be able to be serialized to disk, so having
callbacks and other hooks into the front-end is a no no.

--------------------------------------------------------------------

TREE

The Fitz tree resource is the primary data structure.
A tree consists of nodes. There are leaf nodes and branch
nodes. Leaf nodes produce images, branch nodes combine
or change images. An image here is a two-dimensional
region of color and shape (alpha).

The display tree consists of three basic node types.
Some nodes provide shape, other nodes provide color,
and some nodes combine other nodes.

LEAF NODES

Leaf nodes provide only shape and/or color.

SOLID.
A constant color or shape that stretches out to infinity.

IMAGE.
A rectangular region of color and or shape derived
a from a grid of samples.
Outside this rectangle is transparent.
References an image resource.

SHADE.
A mesh of patches that defines a region of interpolated colors.
Outside the mesh is transparent.
References a shade resource.

PATH.
A path defines only shape.
Moveto, lineto, curveto and closepath.
Stroke, fill, even-odd-fill.
Stroke parameters (line width, line caps, line joins)
Dash pattern.

TEXT.
A text node is an optimization, efficiently combining
a transform matrix and references to glyph shapes.
Text nodes define only shape.

Text nodes have a b c d matrix coefficients and a reference to a font,
then an array of tuples with a glyph index and e and f coefficients.

Text nodes may also have a separate unicode array,
associating the (glyph,e,f) tuples with unicode character codes.
One-to-many and many-to-one mappings are allowed.
This is for text extractions, search and copy.

BRANCH NODES

TRANSFORM.
Transform nodes apply an affine transform matrix to
its one and only child.

OVER.
An over node stacks its children on top of each other,
combining their colors with a blending mode.

MASK.
A mask node has two children. The second child is masked
with the first child, multiplying their shapes.
This causes the effect of clipping.

(mask
	(path ...)
	(solid 'devicegray 0))

BLEND.
This does the magic of PDF 1.4 transparency.
The isolated and non-isolated and
knockout group stuff happens here.
It also sets the blend mode for its children.

LINK.
This is a dummy node that grafts in another
tree resource. The effect is as if the other
tree was copied here instead. This is used
for XObject forms and similar cases.
References a tree resource.

TILE.
Similar to a link node, but this node
will tile its child to cover the page.

--------------------------------------------------------------------

COLORSPACES.

A colorspace needs the capability to transform colors into
and out of any other colorspace. All colorspaces resources
referenced from the display tree are in the
form of ICC input profiles.

SHADES.

This is fairly simple. A mesh of patches as in PDF.
Three levels of detail: with full tensors, with only patches,
with linear quads. Axial and radial shadings are trivially
converted. Type 1 (functional) shadings need to be sampled.

If the backends cannot cope, it can either convert to
linear shaded triangles (clip an axial shading with a triangular
path) or render to an image.

FONTS.

There need to be a few types of font resources.
For now I am going to use FreeType, but that should not be a necessity.
The resource format for fonts should be independent.

 * Substitute fonts.
	Refer to another fall-back font resource and override the metrics.

 * Type 3 fonts.
	Each glyph is represented as a Fitz tree resource.

 * CFF fonts
	Type 1 and Type 1 CID fonts are losslessly convertible to CFF.
	OpenType fonts can have CFF glyph data.

 * TrueType fonts

An option would be to just use an FT_Face for normal font files and
not worry about the exact format.

IMAGES.

Image data could be chopped into tiles to allow for independent
and random access and more CPU-cache friendly image scaling and
color transforms.

 * JPEG encoded images.
	Save byte+bit offsets to allow random access to groups of eight scanlines.
 * Monochrome images.
	Chop into tiles. flate encoding.
 * Contone images.
	Chop into tiles. flate encoding.

--------------------------------------------------------------------

Future considerations

Reduce the number of node types. Over, mask and blend can be combined
into one node type. Link and tile nodes can be combined. Remove the
transform node type and put a transform matrix in the leaf nodes.

If we remove the transform node, the display tree is in essence
a very flat display list which only branches for clipping
and masking parts of the display list.

Representation

The display tree can be represented ...

As an in-memory tree.
As a serialized list of objects, with the structure implicit.
As a serialized list of objects, with the structure explicit by push/pop commands.

On disk, as a serialized stream of resources, which have to be
defined before they are referenced.

