tEXPR — Tuple Expressions
| Version: | Working Draft 0.5 |
| Author: | Christian Neukirchen <chneukirchen@yahoo.de> |
| Created: | 03dec2003 +chris+ |
| Last change: | 03may2004 +chris+ |
Changes:
- 03may2004 +chris+
- Forked tEXPR out of tRPC.
- "<" and ">" are no longer special.
- 11jan2004 +chris+
- Dropped special Base64Strings, introduced string flags.
- 07jan2004 +chris+
- Changed object syntax. Objects are now called "Typed Tuples".
- 02jan2004 +chris+
- Added object handling; hashes are no longer special.
- 03dec2003 +chris+
- First draft.
Introduction
tEXPR is a general purpose data transmission and storage format.
tEXPR is not a replacement for [XML]. Do not use it for documents; that is what XML is made for. If you are developing an Internet protocol, however, tEXPR is a good choice, much better than XML.
Origin and Goals
tEXPR is being developed by Christian Neukirchen. Its design goals are (list inspired by [XML]):
- tEXPR shall be straightforwardly usable over the Internet.
- tEXPR shall support a wide variety of applications.
- It shall be easy to write programs which process tEXPR documents.
- The number of optional features in tEXPR is to be kept to the absolute minimum, ideally zero.
- tEXPR data should be human-legible.
- tEXPR data should be easy for computers to parse: it is totally unambiguous and requires at most one character of lookahead.
- The design of tEXPR shall be formal and concise.
- tEXPR documents shall be easy to create.
- Terseness in tEXPR markup is important, but not overvalued; that is, space should not be wasted, but the syntax is not as compact (due to readability and ASCII encoding) as, e.g. ASN.1.
Terminology
In a text browser, literal text in the format descriptions is marked with double quotes (""").
In a CSS-capable browser, literal text is boxed.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [ReqLevel].
Examples
| Description | Data |
|---|---|
| A tuple of three integers, a float, a string and some Base64-data | {1 2 3 5.0e-7 'foo' 4,base64~Zm9v~} |
| A tuple containing two hashes of three integers identified by symbols | {{Hash :a 1 :b 2 :c 3} {Hash :foo 42 :bar 69 :baz 666}} |
Data Types
There are two kinds of values in tEXPR: scalars and (possibly typed) tuples. Scalars are atomic, while tuples are compound, that is, they consist of scalars and other tuples.
Implementations are REQUIRED to support all described kinds of data types.
Scalars
A scalar is one of these:
- Integer:
-
An integer is encoded as a decimal number. Its size is unspecified, but implementations MUST support at least 32 bits.
- Double:
-
A double-precision signed floating point number is encoded in the usual C-like format; that is, it matches the regular expression /^[+-]?((\d*\.\d+)([eE][-+]?\d+)?)$/.
- Boolean:
-
A boolean value is either "#t" (true) or "#f" (false).
- Nil:
-
Unset, unspecified or invalid values, such as "nil" in Smalltalk and Ruby, "null" in Java, "undef" in Perl or "NULL" in C, are encoded as "#n".
- Ordinary string:
-
An ordinary string is written between single quotes ("'"). A single quote inside the string is escaped by doubling it, that is, it is written as "''".
- Sized string:
-
Sized strings make transmitting big chunks of data (more than 128 bytes) easier: since their length is known in advance, the parser need not scan byte by byte and can read larger blocks at once.
They are encoded as
Length ("," Flag)* "~" String "~"
where Length is the length of String, encoded as a valid tEXPR integer, and String is the data. No assumptions are made about the content of String.
Note: The tildes MUST enclose the data! This is a safety check.
There may be several Flags to change the handling of the data. Unsupported flags MAY be ignored. Currently, there is only one flag defined:
- base64
-
The data is assumed to be encoded in Base64 (see [Base64]).
This flag can be used to transmit data using methods which don't fully support 8-bit transmission.
All whitespace (that is: space, tab, line feed, form feed and carriage return) in String is stripped. Characters other than letters, digits, "+", "/" and "=" are invalid.
- Symbol:
-
This data type is much like a string, but has different semantics: It is used to refer to items of an enumeration or keys of structures. Use Symbols instead of strings if you have a fixed set of possible values.
It looks like
":"SymbolName
where SymbolName may consist of any characters except whitespace, "{" and "}". Only alphanumeric characters, "-" and "_" SHOULD be used, to simplify mappings to languages with limited symbol support.
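The scalar encodings above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the `Symbol` class, all function names, the choice of UTF-8 for ordinary strings and the decision to send every `bytes` value as a base64-flagged sized string are assumptions made for the example and are not prescribed by this document.

```python
import base64
import math
import re

# Pattern for Doubles, copied from the Double definition above.
DOUBLE_RE = re.compile(r"^[+-]?((\d*\.\d+)([eE][-+]?\d+)?)$")


class Symbol(str):
    """Hypothetical marker type for values to be written as :SymbolName."""


def encode_double(x):
    if math.isnan(x) or math.isinf(x):
        raise ValueError("NaN and infinity cannot be written as a Double")
    mantissa, sep, exponent = repr(x).partition("e")
    if "." not in mantissa:          # the pattern requires a fractional part
        mantissa += ".0"
    text = mantissa + sep + exponent
    if not DOUBLE_RE.match(text):
        raise ValueError("cannot encode %r" % x)
    return text


def encode_sized(data, use_base64=False):
    flags = b""
    if use_base64:
        data = base64.b64encode(data)
        flags = b",base64"
    # Length counts the bytes between the tildes, i.e. the transmitted form.
    return str(len(data)).encode("ascii") + flags + b"~" + data + b"~"


def decode_sized(buf):
    """Read one sized string from the start of buf; return (data, rest)."""
    header, tilde, rest = buf.partition(b"~")
    if not tilde:
        raise ValueError("opening tilde missing")
    length, *flags = [part.strip() for part in header.split(b",")]
    n = int(length.decode("ascii"))
    data, closing = rest[:n], rest[n:n + 1]
    if len(data) != n or closing != b"~":
        raise ValueError("closing tilde not found where Length says it is")
    if b"base64" in flags:
        data = base64.b64decode(data)   # ignores embedded whitespace
    return data, rest[n + 1:]


def encode_scalar(value):
    if value is None:
        return b"#n"
    if value is True:
        return b"#t"
    if value is False:
        return b"#f"
    if isinstance(value, Symbol):
        if any(c in " \t\n\f\r{}" for c in value):
            raise ValueError("invalid character in symbol name")
        return b":" + value.encode("ascii")
    if isinstance(value, int):
        return str(value).encode("ascii")
    if isinstance(value, float):
        return encode_double(value).encode("ascii")
    if isinstance(value, str):
        # Assumption: text is sent as UTF-8; quotes are escaped by doubling.
        return b"'" + value.replace("'", "''").encode("utf-8") + b"'"
    if isinstance(value, bytes):
        return encode_sized(value, use_base64=True)
    raise TypeError("not a scalar: %r" % (value,))
```

With these definitions, `encode_scalar(b"foo")` yields `b"4,base64~Zm9v~"`, the Base64 item from the Examples table, and `decode_sized(b"3~a~b~")` returns the data `b"a~b"`, which shows why knowing the length in advance lets the data itself contain tildes.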
Tuples
Tuples have the following format:
"{" Value* "}"
Where Value may be any tEXPR value. All Values are separated by whitespace.
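As a rough illustration of the tuple format and of the one-character lookahead mentioned in the design goals, the following Python sketch reads nested tuples. It is deliberately incomplete: it only understands integers, "#t", "#f", "#n", symbols (kept as plain strings with their leading colon) and untyped nested tuples. The names `parse_tuple`, `atom` and `CONSTANTS` are hypothetical.

```python
WHITESPACE = " \t\n\f\r"
CONSTANTS = {"#t": True, "#f": False, "#n": None}


def atom(token):
    """Convert one bare token into a Python value (subset only)."""
    if token in CONSTANTS:
        return CONSTANTS[token]
    if token.startswith(":"):
        return token                  # symbols stay as ":name" strings
    return int(token)                 # raises ValueError for anything else


def parse_tuple(text, pos=0):
    """Parse one tuple starting at text[pos]; return (list, next_pos)."""
    if text[pos] != "{":
        raise ValueError("expected '{' at offset %d" % pos)
    pos += 1
    items = []
    while True:
        while pos < len(text) and text[pos] in WHITESPACE:
            pos += 1
        if pos >= len(text):
            raise ValueError("unterminated tuple")
        ch = text[pos]                # the single character of lookahead
        if ch == "}":
            return items, pos + 1
        if ch == "{":
            value, pos = parse_tuple(text, pos)
        else:
            start = pos
            while pos < len(text) and text[pos] not in WHITESPACE + "{}":
                pos += 1
            value = atom(text[start:pos])
        items.append(value)
```

For example, `parse_tuple("{1 {2 3} #t :a {}}")[0]` evaluates to `[1, [2, 3], True, ":a", []]`. Deciding what to do next never requires looking further than the current character.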
Typed Tuples (Objects)
Typed Tuples (also known as Objects) look and work like ordinary Tuples, with one important difference: They include type information. This is important if tEXPR data is stored in files or used for RPC.
Example: You have this tuple:
{{1 2} {2 4} {5 8} {3 6}}
What does that mean? It is impossible to tell if you only have the bare data. If you however include type information, that is, use Typed Tuples, you could say:
{Polygon {Point 1 2} {Point 2 4}
{Point 5 8} {Point 3 6}}
And everything would be alright.
Typed Tuples are encoded as:
"{" Type Value* "}"
Here Type is the type (class) of the Tuple; it starts with a letter and may consist of any characters except whitespace, "{" and "}". The Values are the values of the Tuple.
Value may be any tEXPR value. All Values are separated by whitespace.
The handling of Typed Tuples (Objects) is left to the implementor.
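Since the handling is left to the implementor, the following Python sketch shows only one possible approach: a registry that maps Type names to constructors. It assumes a parser has already produced `Typed` nodes; `Typed`, `Point`, `Polygon`, `CONSTRUCTORS` and `build` are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Typed:
    """A parsed Typed Tuple: its Type name and its list of Values."""
    type_name: str
    values: list


@dataclass
class Point:
    x: int
    y: int


@dataclass
class Polygon:
    points: list


# Hypothetical registry: Type name -> function taking the list of Values.
CONSTRUCTORS = {
    "Point": lambda values: Point(*values),
    "Polygon": lambda values: Polygon(list(values)),
}


def build(node):
    """Recursively turn Typed nodes into application objects."""
    if isinstance(node, Typed):
        values = [build(v) for v in node.values]
        make = CONSTRUCTORS.get(node.type_name)
        # Unknown types are kept as Typed nodes so no information is lost.
        return make(values) if make else Typed(node.type_name, values)
    if isinstance(node, list):
        return [build(v) for v in node]
    return node
```

Applied to the Polygon example above, `build` returns a `Polygon` holding four `Point` objects. Keeping unknown types as `Typed` nodes is a design choice of this sketch; an implementation could equally reject them.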
Standard Tuple Types
Currently, there is only one predefined Tuple type, the Hash.
Later versions of this standard MAY introduce new Tuple types. Everyone MAY (and SHOULD) introduce new Tuple types for their own protocols.
Hash
A Hash is a collection of key-value pairs. Indexing is done via arbitrary keys of any type.
Value format:
(Key Value)*
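A minimal Python sketch of mapping a Hash to a native dictionary follows. It assumes that the Values of a Hash alternate between key and value, as in the Hash items of the Examples table, and that a parser has already turned them into a flat Python list. The function names are hypothetical.

```python
def hash_to_dict(values):
    """Pair up the flat Values of a Hash tuple: key, value, key, value, ..."""
    if len(values) % 2 != 0:
        raise ValueError("Hash needs an even number of Values")
    # Keys must be hashable in Python; a tuple used as a key would first
    # have to be converted to an immutable form.
    return dict(zip(values[0::2], values[1::2]))


def dict_to_hash(mapping):
    """Flatten a dict back into the Values of a Hash tuple."""
    values = []
    for key, value in mapping.items():
        values.append(key)
        values.append(value)
    return values
```

For the first Hash in the Examples table, with symbols kept as strings, `hash_to_dict([":a", 1, ":b", 2, ":c", 3])` gives `{":a": 1, ":b": 2, ":c": 3}`.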
References
- [Base64]
- RFC3548: The Base16, Base32, and Base64 Data Encodings by S. Josefsson
- [ReqLevel]
- RFC2119: Key words for use in RFCs to Indicate Requirement Levels by S. Bradner
- [XML]
- Extensible Markup Language (XML) 1.0 (Second Edition) by Tim Bray, Jean Paoli, C. M. Sperberg-McQueen and Eve Maler
Copyright © 2003 Christian Neukirchen
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation. See http://www.fsf.org/copyleft/fdl.html for more information.