Principles and bibliography
We have formed some theoretical foundations for the standard, but they are far
from complete, so you are invited to share your ideas. This document is required reading if you want to contribute to the standard.
The NDI standard covers two major areas:
- data indexing and navigation (meta-views)
- views on data (data transformations) and data representation
Although data indexing and navigation are huge tasks in themselves, many theoretical and practical developments already exist in these areas, so the standard will cover particular techniques or implementations only briefly. It will list the features required of a compliant implementation, as well as recommended features, along with pointers to relevant projects and bibliography.
The section on data transformations and representation will provide a
theoretical background by laying out the basic notions and principles behind them.
Data Indexing and Navigation
- The standard will require that each datum be indexed by a set of keyword phrases, and leave the details of the storage of those indexes to the implementors
- The standard will also require that the data space be navigable, i.e. it will
specify data space navigation functionality to be provided by the implementors. Initially, the standard will point to possible data space navigation R&D initiatives, such as the MIT one. Later, the implementors themselves will drive the standard's requirements in this regard
- The standard will require remote repositories to provide certain features
for purposes of remote navigation. This is an example of how it would look.
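As a rough illustration of the keyword-phrase requirement, the sketch below keeps an in-memory inverted index from phrases to datum identifiers. The class name `PhraseIndex`, the lowercase/whitespace normalization rule and the use of string datum ids are assumptions made for this example only; the standard deliberately leaves index storage to the implementors.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def normalize(phrase: str) -> str:
    """Case-fold and collapse whitespace so equivalent phrases match (assumed rule)."""
    return " ".join(phrase.lower().split())


class PhraseIndex:
    """Hypothetical in-memory index: keyword phrase -> ids of data tagged with it."""

    def __init__(self) -> None:
        self._by_phrase: Dict[str, Set[str]] = defaultdict(set)

    def add(self, datum_id: str, phrases: Iterable[str]) -> None:
        """Index one datum under each of its keyword phrases."""
        for phrase in phrases:
            self._by_phrase[normalize(phrase)].add(datum_id)

    def lookup(self, phrase: str) -> Set[str]:
        """Ids of all data indexed under `phrase` (a copy, so the index stays intact)."""
        return set(self._by_phrase.get(normalize(phrase), set()))
```

For example, after `idx.add("d1", ["Brick House"])` and `idx.add("d2", ["brick  house"])`, `idx.lookup("brick house")` finds both data. A persistent implementation would swap the dictionary for whatever storage the implementor prefers.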
Views on data, data transformations and representation:
The standard will rely on a data-centric philosophy, in which data are both the focus and the result of transformations. Data also provide certain services, so each datum is in fact a "micro-server".
With regard to life span, the standard will define two types of data: permanent, indexed data and temporary data. A temporary datum can only be the result of a view; it is not indexed and can only be used for presentation or as the base for another view. A temporary datum is never stored in the system.
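The two life spans can be sketched as a flag on the datum plus a store that refuses temporary data. The names `Datum`, `Store` and `apply_view` are hypothetical; in particular, modelling a view as a plain Python callable over datum values is an assumption made only to show that view results are always temporary yet can feed further views.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Datum:
    value: Any
    permanent: bool  # False: temporary result of a view, never stored or indexed


class Store:
    """Hypothetical system store: only permanent data may enter it."""

    def __init__(self) -> None:
        self._data: Dict[str, Datum] = {}

    def put(self, datum_id: str, datum: Datum) -> None:
        if not datum.permanent:
            raise ValueError("temporary data are never stored")
        self._data[datum_id] = datum

    def get(self, datum_id: str) -> Datum:
        return self._data[datum_id]


def apply_view(view: Callable[..., Any], *inputs: Datum) -> Datum:
    """Apply a view to one or more data; the result is always temporary."""
    return Datum(view(*(d.value for d in inputs)), permanent=False)
```

A temporary result such as `apply_view(str.upper, store.get("a"))` can be passed straight into another `apply_view` call, but `store.put` rejects it.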
Data security and access rights
The standard will define a security framework for data, probably similar to Unix file access rights, with users and groups. What about per-view access rights?
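Since the framework is expected to resemble Unix file permissions, here is a minimal sketch of an owner/group/other check on a datum. The bit values mirror Unix `rwx` octal modes; the `Acl` and `User` types are hypothetical, and per-view rights are not modelled because the text leaves them open.

```python
from dataclasses import dataclass
from typing import FrozenSet

READ, WRITE = 4, 2  # same bit values as Unix r and w


@dataclass(frozen=True)
class Acl:
    owner: str
    group: str
    mode: int  # octal, e.g. 0o640


@dataclass(frozen=True)
class User:
    name: str
    groups: FrozenSet[str]


def allowed(user: User, acl: Acl, want: int) -> bool:
    """Pick the owner, group or other bits as Unix does, then test the requested ones."""
    if user.name == acl.owner:
        bits = (acl.mode >> 6) & 0o7
    elif acl.group in user.groups:
        bits = (acl.mode >> 3) & 0o7
    else:
        bits = acl.mode & 0o7
    return (bits & want) == want
```

With mode `0o640`, the owner may read and write, members of the datum's group may only read, and everyone else gets nothing.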
Primitive data types
The standard will define primitive data types that will serve as bricks for more
complex data. The only criterion for deeming a data type primitive is that
data interoperability requires such a type. Primitive data types must
be stored in a royalty-free (preferably public-domain)
format. As far as the client is concerned, primitive data types are simple and
self-sufficient; however, even primitive data
can be cut into pieces. Also, each primitive type will have a list of required
views, which can only result in primitive data or presentation data.
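The rule that required views of a primitive type may only yield primitive or presentation data can be enforced when a type is registered. This is a sketch under assumed names (`ResultKind`, `ViewSpec`, `register_primitive`); only the view names `PART` and `HTML_out` come from this document.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class ResultKind(Enum):
    PRIMITIVE = "primitive"
    PRESENTATION = "presentation"
    COMPLEX = "complex"


@dataclass(frozen=True)
class ViewSpec:
    name: str
    result: ResultKind  # kind of datum the view produces


ALLOWED = {ResultKind.PRIMITIVE, ResultKind.PRESENTATION}


def register_primitive(registry: Dict[str, List[ViewSpec]], type_name: str,
                       required_views: List[ViewSpec]) -> None:
    """Record a primitive type, rejecting required views that yield complex data."""
    bad = [v.name for v in required_views if v.result not in ALLOWED]
    if bad:
        raise ValueError(f"{type_name}: views {bad} must yield primitive or presentation data")
    registry[type_name] = list(required_views)
```

Registering "Text" with `PART` (primitive result) and `HTML_out` (presentation result) succeeds, while a required view producing complex data is refused and the type is not recorded.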
List of primitive data types:
- Text. A collection of words in an ISO or UTF encoding. May possess internal structure, such as paragraphs, columns, pages.
- Formatted text. Text with associated formatting data (RTF?), ready for display. This will ensure a lowest common denominator for formatted text.
- Lossless raster image. This will provide the lowest common denominator for images stored in a lossless format.
- Lossy raster image. Lossy equivalent of the above.
- Vector image. The lowest common denominator for images stored in a vector format.
- Sampled data frame. Can hold sampled sound and other sampled data.
- Lossy sampled data frame. Lossy equivalent of the above (Ogg, MP3 or similar).
- Link. The ubiquitous URL, extended with a 'datum://' scheme for local data.
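A Link datum has to tell local 'datum://' references apart from ordinary URLs. The sketch below uses the standard `urllib.parse.urlparse`; the function name `resolve_link` and the assumption that everything after `datum://` names the local datum are illustrative only, since the text does not define the identifier layout.

```python
from typing import Tuple
from urllib.parse import urlparse


def resolve_link(link: str) -> Tuple[str, str]:
    """Classify a Link datum as local ('datum://...') or remote (any other URL)."""
    parts = urlparse(link)
    if parts.scheme == "datum":
        # Assumed layout: host and path together name the local datum.
        return "local", parts.netloc + parts.path
    return "remote", link
```

So `resolve_link("datum://notes/todo")` yields `("local", "notes/todo")`, while an `https` URL is passed through as remote.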
Useful complex data:
The standard will specify a list of useful complex types, along with a list of required views for each. This list will be kept minimal.
Views on data:
Views are what make manipulating data easy. A view can be applied to
one or more data, resulting in a new datum.
PART(OFFSET[,OFFSET,..], SIZE[,SIZE,..], SIZE_UNIT)
This view should return a sub-unit of the same data type, starting at OFFSET and spanning SIZE units of type SIZE_UNIT.
Multi-dimensional data (e.g. images) accept one OFFSET and one SIZE per axis.
SIZE_UNIT can be letters, words, paragraphs, columns, pages or pagesets. Dots are allowed in OFFSET and SIZE, where each successive dot steps down one level of granularity (here: column, then paragraph, then word), e.g. PART( OFFSET(.3.1), SIZE(..2), 'COLUMN' ) means fetch two words starting with the first word of the third paragraph.
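The dotted notation can be decoded by mapping each component onto the granularity hierarchy, starting at SIZE_UNIT. In this sketch the level order (pageset, page, column, paragraph, word, letter) is taken from the SIZE_UNIT list above; `parse_dotted` and `part_text` are hypothetical names, `part_text` only handles single-column text stored as paragraphs of words, and indices are 1-based to match "the third paragraph" in the example.

```python
from typing import List, Tuple

# Granularity levels, coarsest first, from the SIZE_UNIT list.
LEVELS = ["PAGESET", "PAGE", "COLUMN", "PARAGRAPH", "WORD", "LETTER"]


def parse_dotted(spec: str, unit: str) -> List[Tuple[str, int]]:
    """Map a dotted OFFSET/SIZE spec onto granularity levels.

    The first component is counted in `unit`; each further dot steps one
    level finer. Empty components leave that level unconstrained.
    """
    start = LEVELS.index(unit.upper())
    parts = spec.split(".")
    if start + len(parts) > len(LEVELS):
        raise ValueError(f"{spec!r} goes finer than {LEVELS[-1]}")
    return [(LEVELS[start + depth], int(part))
            for depth, part in enumerate(parts) if part]


def part_text(paragraphs: List[List[str]], offset: str, size: str, unit: str) -> List[str]:
    """PART on single-column text; only the paragraph/word case is handled."""
    off = dict(parse_dotted(offset, unit))
    count = dict(parse_dotted(size, unit))["WORD"]
    words = paragraphs[off.get("PARAGRAPH", 1) - 1]
    first = off.get("WORD", 1) - 1
    return words[first:first + count]
```

Here `parse_dotted(".3.1", "COLUMN")` gives `[("PARAGRAPH", 3), ("WORD", 1)]` and `parse_dotted("..2", "COLUMN")` gives `[("WORD", 2)]`, which is exactly the reading of the example above.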
Presentation views are special views that result in a presentation on a device
(such as a screen, printer, PDA or sound card). They should contain as
little information as possible beyond what formatting and presentation require.
To avoid confusion: a presentation view is not the datum; it is the process that generates the datum in a format fit
for presentation. HTML_out would be a presentation view. Presentation views are
usually written in a programming language for reasons of efficiency.
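To make the distinction concrete, HTML_out can be sketched as a function: it is the process, and the HTML string it returns is the presentation datum. Representing a text datum as paragraphs of words is an assumption of this example; the escaping uses the standard `html.escape`.

```python
import html
from typing import List


def html_out(paragraphs: List[List[str]]) -> str:
    """Presentation view: render a text datum (paragraphs of words) as HTML.

    Only what display needs is added (<p> markup and escaping); the text
    datum itself is left untouched.
    """
    return "\n".join(
        "<p>" + html.escape(" ".join(words)) + "</p>" for words in paragraphs
    )
```

For instance, `html_out([["Bricks", "&", "mortar"], ["Walls"]])` produces two `<p>` elements with the ampersand escaped.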
It should be possible to create aggregate views and save them.
An aggregate view specifies only limited information about presentation. Ideally, an aggregate view will specify structural information about how other data fit into
the view. Decisions on how to display something are made only in
presentation views. Aggregate views are also processes; however, users can
create and edit them visually or by writing commands in a
specific data-manipulation language. Aggregate views are stored internally as compiled code.
Keeping a clear separation between presentation views and aggregate views makes it possible to have multiple presentations of the same aggregate data, and to combine non-presentation aggregate views into more complex views without mixing in lower-level presentation where that is not desirable.
Example of an aggregate view: if I had defined a cubic metre of uniform bricks
and wanted to build a house, the aggregate view of the house would be the
instructions on where, and in which order, to lay each brick (NOT the house!).
Executing the "house" view would produce the house. Note that the house could
not yet be presented, as it lacks a presentation view on top of it. Now I
could put this house in a 3D scene, add some lights and a camera, and generate a 3D rendering of
it: THAT would be the presentation!
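The house example maps onto three separate steps: instructions, execution and presentation. In this sketch bricks are unit cubes on an integer grid, and the names `Lay`, `house_view`, `execute` and `floor_plan` are hypothetical; an ASCII floor plan stands in for the 3D rendering.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

Cell = Tuple[int, int, int]


@dataclass(frozen=True)
class Lay:
    """One instruction of the aggregate view: put a unit brick at (x, y, z)."""
    x: int
    y: int
    z: int


def house_view(width: int, depth: int, height: int) -> List[Lay]:
    """Aggregate view: ordered instructions for hollow walls, lowest layer first."""
    return [Lay(x, y, z)
            for z in range(height)
            for x in range(width)
            for y in range(depth)
            if x in (0, width - 1) or y in (0, depth - 1)]


def execute(steps: List[Lay]) -> FrozenSet[Cell]:
    """Executing the view yields the datum (the house): the set of occupied cells."""
    return frozenset((s.x, s.y, s.z) for s in steps)


def floor_plan(house: FrozenSet[Cell], z: int = 0) -> str:
    """Presentation view: draw one layer as ASCII, '#' for a brick, '.' for empty."""
    xs = [x for x, _, _ in house]
    ys = [y for _, y, _ in house]
    return "\n".join(
        "".join("#" if (x, y, z) in house else "." for x in range(min(xs), max(xs) + 1))
        for y in range(min(ys), max(ys) + 1))
```

A 3 x 3 x 2 house needs 16 bricks, and its ground-floor plan is a ring of `#` around one empty cell. Because `floor_plan` only reads the executed datum, another presentation view could render the same house differently.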
Sometimes it might be impractical to store a complex datum as an aggregate when
it is intended to be viewed frequently. For example,
consider storing a typical GIMP layered image, with a history of cuts, fills
and other actions resulting in the final image. In such cases, the system should
generate snapshots containing the result of an aggregate, in order to reduce the
time needed to build it. For time-consuming presentation views, the system will likewise create a
snapshot of the presentation for quick display (e.g. a raster image). Ideally, such a
presentation snapshot will be a primitive datum. The snapshot would be marked
invalid when the component data have changed, and would be rebuilt the next time it is needed.
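The invalidate-then-rebuild-lazily behaviour can be sketched with components that notify their snapshots. `Datum` and `Snapshot` are hypothetical names, and the expensive view is modelled as any Python callable over the component values.

```python
from typing import Any, Callable, List, Optional


class Datum:
    """A component datum that marks dependent snapshots invalid when it changes."""

    def __init__(self, value: Any) -> None:
        self.value = value
        self.dependents: List["Snapshot"] = []

    def set(self, value: Any) -> None:
        self.value = value
        for snap in self.dependents:
            snap.invalidate()


class Snapshot:
    """Cached result of an expensive aggregate or presentation view."""

    def __init__(self, build: Callable[..., Any], components: List[Datum]) -> None:
        self._build = build
        self._components = components
        self._cached: Optional[Any] = None
        self._valid = False
        self.rebuilds = 0
        for c in components:
            c.dependents.append(self)

    def invalidate(self) -> None:
        self._valid = False  # only marked; rebuilding waits for the next get()

    def get(self) -> Any:
        if not self._valid:
            self._cached = self._build(*(c.value for c in self._components))
            self._valid = True
            self.rebuilds += 1
        return self._cached
```

Changing a component costs nothing until the snapshot is actually requested again, which suits data that change often but are displayed less often.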
Views on complex data:
ELEMENT( NAME [,INDEX] )
Returns the datum known by NAME in the target aggregate. If NAME holds an
array of data, INDEX specifies which one to retrieve. Note that no presentation is attached to the returned datum (unless it is a presentation datum itself); a presentation view must be applied to it for that.
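A minimal sketch of ELEMENT over an aggregate modelled as a Python mapping. Two details are assumptions because the text does not fix them: INDEX is zero-based here, and without INDEX the whole entry (array or not) is returned.

```python
from typing import Any, Mapping, Optional


def element(aggregate: Mapping[str, Any], name: str, index: Optional[int] = None) -> Any:
    """ELEMENT(NAME [,INDEX]): fetch a named component, with no presentation attached."""
    value = aggregate[name]
    if index is None:
        return value  # assumed: whole entry when INDEX is omitted
    if not isinstance(value, (list, tuple)):
        raise TypeError(f"{name!r} does not hold an array of data")
    return value[index]  # zero-based (assumption)
```

For `{"title": "House", "walls": ["north", "south"]}`, `element(agg, "walls", 1)` returns `"south"`; displaying it would still require a presentation view.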
Queries on data:
Every datum should implement a lookup interface accepting an atomic datum as its parameter. The interface should support incremental search forward and backward (a transparent cursor) and report result totals.
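One way to satisfy both the cursor and the totals requirement is sketched below, assuming a datum exposes its atoms (e.g. the words of a text) as a sequence. `LookupCursor`, `forward` and `backward` are hypothetical names; hits are collected up front so the total is known immediately, which a lazy scan could not report without reading the whole datum anyway.

```python
from typing import Any, List, Optional, Sequence


class LookupCursor:
    """Lookup of an atomic datum with a cursor that moves forward or backward."""

    def __init__(self, atoms: Sequence[Any], needle: Any) -> None:
        self._hits: List[int] = [i for i, atom in enumerate(atoms) if atom == needle]
        self._pos = -1  # before the first hit

    @property
    def total(self) -> int:
        return len(self._hits)

    def forward(self) -> Optional[int]:
        """Position of the next hit, or None past the last one."""
        if self._pos + 1 >= len(self._hits):
            return None
        self._pos += 1
        return self._hits[self._pos]

    def backward(self) -> Optional[int]:
        """Position of the previous hit, or None before the first one."""
        if self._pos <= 0:
            return None
        self._pos -= 1
        return self._hits[self._pos]
```

Searching the words `a b a c a` for `a` reports a total of 3 and visits positions 0, 2 and 4 going forward.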
Complex queries on data:
It should be possible to express complex queries with inclusion and span relationships, boolean expressions, etc.
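The boolean part of such queries can be sketched as a small expression tree evaluated against a phrase index (phrase to set of datum ids). The node names and the index shape are assumptions; inclusion and span relationships would need positional information and are not covered here.

```python
from dataclasses import dataclass
from typing import AbstractSet, Mapping, Set, Union


@dataclass(frozen=True)
class Term:
    phrase: str


@dataclass(frozen=True)
class And:
    left: "Query"
    right: "Query"


@dataclass(frozen=True)
class Or:
    left: "Query"
    right: "Query"


@dataclass(frozen=True)
class Not:
    operand: "Query"


Query = Union[Term, And, Or, Not]


def evaluate(q: Query, index: Mapping[str, AbstractSet[str]],
             universe: AbstractSet[str]) -> Set[str]:
    """Ids of data matching `q`; Not is taken relative to `universe`."""
    if isinstance(q, Term):
        return set(index.get(q.phrase, set()))
    if isinstance(q, And):
        return evaluate(q.left, index, universe) & evaluate(q.right, index, universe)
    if isinstance(q, Or):
        return evaluate(q.left, index, universe) | evaluate(q.right, index, universe)
    if isinstance(q, Not):
        return set(universe) - evaluate(q.operand, index, universe)
    raise TypeError(f"unknown query node: {q!r}")
```

For instance, `And(Term("house"), Not(Term("brick")))` selects the data tagged "house" but not "brick".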
Timeline and Data changes
A compliant library will support change history for data that require it, via transparent storage of diffs.
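A diff-based history can keep the latest version in full and earlier versions as reverse diffs, so old versions are rebuilt by undoing later changes. This sketch treats a datum as a list of lines and uses the standard `difflib.SequenceMatcher.get_opcodes`; the names `make_patch`, `apply_patch` and `History` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Sequence, Tuple

Patch = List[Tuple[int, int, List[str]]]  # (start, end, replacement) on the source


def make_patch(src: Sequence[str], dst: Sequence[str]) -> Patch:
    """Edits turning `src` into `dst`; only the changed slices of `dst` are kept."""
    sm = SequenceMatcher(a=src, b=dst, autojunk=False)
    return [(i1, i2, list(dst[j1:j2]))
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]


def apply_patch(src: Sequence[str], patch: Patch) -> List[str]:
    out: List[str] = []
    pos = 0
    for i1, i2, replacement in patch:
        out.extend(src[pos:i1])  # unchanged run before this edit
        out.extend(replacement)  # empty for a pure deletion
        pos = i2
    out.extend(src[pos:])
    return out


class History:
    """Latest version stored in full; earlier versions stored as reverse diffs."""

    def __init__(self, lines: Sequence[str]) -> None:
        self.current = list(lines)
        self._back: List[Patch] = []  # _back[k] turns version k+1 into version k

    def commit(self, lines: Sequence[str]) -> None:
        new = list(lines)
        self._back.append(make_patch(new, self.current))
        self.current = new

    def version(self, k: int) -> List[str]:
        """Rebuild version k (0 is the first) by undoing later changes."""
        if not 0 <= k <= len(self._back):
            raise IndexError(k)
        lines = list(self.current)
        for patch in reversed(self._back[k:]):
            lines = apply_patch(lines, patch)
        return lines
```

Keeping the newest version whole makes the common case (reading the current datum) free, at the cost of replaying more diffs for older versions.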
A compliant library will support automatic updating of compound data when their components change.
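Automatic updating can be sketched as recomputing dependent compounds in dependency order whenever a component is set. This example uses the standard `graphlib.TopologicalSorter` (Python 3.9+); the `DataSpace` class and the modelling of each compound as a function of named components are assumptions.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Tuple


class DataSpace:
    """Compound data are recomputed whenever one of their components changes."""

    def __init__(self) -> None:
        self.values: Dict[str, Any] = {}
        self._rules: Dict[str, Tuple[Callable[..., Any], List[str]]] = {}

    def set(self, name: str, value: Any) -> None:
        self.values[name] = value
        self._propagate(name)

    def define(self, name: str, fn: Callable[..., Any], components: List[str]) -> None:
        self._rules[name] = (fn, list(components))
        self._recompute(name)

    def _recompute(self, name: str) -> None:
        fn, components = self._rules[name]
        self.values[name] = fn(*(self.values[c] for c in components))

    def _propagate(self, changed: str) -> None:
        # Map each compound to its components, so components are visited first.
        graph = {name: set(comps) for name, (_, comps) in self._rules.items()}
        dirty = {changed}
        for node in TopologicalSorter(graph).static_order():
            if node in self._rules and dirty.intersection(self._rules[node][1]):
                self._recompute(node)
                dirty.add(node)
```

With `area` defined from `w` and `h`, and `label` defined from `area`, setting `w` updates both compounds in one pass. Unlike the lazy snapshot approach, this recomputes eagerly.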
Appendix A. Possible SIZE_UNIT values