Topical

Preliminary draft, 17oct2004 +chris+

ChangeLog:

Introduction

Topical is a lightweight, straightforward way to mark occurrences of topics in documents.

This document will mainly discuss and describe the Topical core. Information about how to add Topical meta-data to documents can be found in the appendices.

The Basics of Topical

Topical is based on a very simple, text-based syntax that resembles Unix path syntax. Each Topical description follows this syntax:

description ::= topic [";" topic]*

That is, a description is essentially a semicolon-separated list of topics.

A topic is a list of names joined by separators, optionally followed by a label in parentheses (its human-readable description):

topic ::= separator? name [separator name]* label?

separator ::= "/" | "//"

name ::= ["a".."z" | "A".."Z" | "0".."9" | "-" | "_"]+

label ::= " "+ "(" text* ")"

text ::= " ".."'" | "*".."~"
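The grammar above can be transcribed fairly directly into regular expressions. The following Python sketch is one way to do that; the names Topic, parse_topic and parse_description are invented for this illustration and are not part of Topical. It scans a description topic by topic rather than splitting on ";", because the text rule allows ";" inside the parentheses.

```python
import re
from typing import NamedTuple, Optional

NAME = r"[A-Za-z0-9_-]+"
SEPARATOR = r"(?://|/)"
TEXT = r"[ -'*-~]"  # " ".."'" | "*".."~": printable ASCII without "(" and ")"

TOPIC_RE = re.compile(
    rf"(?P<path>{SEPARATOR}?{NAME}(?:{SEPARATOR}{NAME})*)"
    rf"(?: +\((?P<label>{TEXT}*)\))?"
)
TOKEN_RE = re.compile(rf"{SEPARATOR}|{NAME}")


class Topic(NamedTuple):
    steps: list           # [(separator, name), ...]; "" means no leading separator
    label: Optional[str]  # text between the parentheses, if any


def _topic_from_match(match):
    tokens = TOKEN_RE.findall(match.group("path"))
    if tokens[0] not in ("/", "//"):
        tokens.insert(0, "")  # the optional leading separator was omitted
    return Topic(list(zip(tokens[0::2], tokens[1::2])), match.group("label"))


def parse_topic(source):
    """Parse exactly one topic, raising ValueError on anything else."""
    match = TOPIC_RE.fullmatch(source)
    if match is None:
        raise ValueError(f"not a valid topic: {source!r}")
    return _topic_from_match(match)


def parse_description(source):
    """Parse a semicolon-separated list of topics."""
    topics, position = [], 0
    while True:
        match = TOPIC_RE.match(source, position)
        if match is None:
            raise ValueError(f"expected a topic at offset {position}")
        topics.append(_topic_from_match(match))
        position = match.end()
        if position == len(source):
            return topics
        if source[position] != ";":
            raise ValueError(f"expected ';' at offset {position}")
        position += 1


if __name__ == "__main__":
    for topic in parse_description("/programming//ruby;/things-to-do//diving (Diving)"):
        print(topic.steps, topic.label)
```

Keeping the separator next to each name preserves the difference between "/" and "//", which is needed later when topics are interpreted.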

Valid topics could be, for example:

/programming//ruby

/programming/languages/scripting/ruby (The Ruby programming language)

object-oriented/ruby

ruby

/things-to-do//diving (Diving)

These have the following meanings:

/programming//ruby: The marked data is about all ruby subtopics of the topic programming.

/programming/languages/scripting/ruby (The Ruby programming language): The marked data is about the ruby subtopic (with the description "The Ruby programming language") of the subtopic scripting of the subtopic languages of the topic programming.

object-oriented/ruby: The marked data is about the subtopic ruby of any (sub)topic object-oriented.

ruby: The marked data is about all (sub)topics named ruby.

/things-to-do//diving (Diving): The marked data is about any subtopic diving of the topic things-to-do, and labeled "Diving".
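One way to make these meanings concrete is to test a topic against an absolute path in a topic tree. The Python sketch below translates a topic into a regular expression. It rests on assumptions this draft does not spell out: "//" is taken to allow zero or more intermediate levels, a topic without a leading "/" (or with a leading "//") may start at any depth, and the last name must be the last element of the path. The function names are hypothetical.

```python
import re

TOKEN_RE = re.compile(r"//|/|[A-Za-z0-9_-]+")


def to_regex(topic):
    """Translate a topic into a regex over absolute paths such as "/a/b/c"."""
    path = topic.split(" ", 1)[0]  # drop the optional "(...)" part
    tokens = TOKEN_RE.findall(path)
    if tokens[0] not in ("/", "//"):
        tokens.insert(0, "")  # no leading separator: not anchored at the root
    pattern = ""
    for separator, name in zip(tokens[0::2], tokens[1::2]):
        if separator != "/":
            # "//" or a missing leading separator: allow any levels in between
            # (assumption: zero levels is allowed as well).
            pattern += "(?:/[^/]+)*"
        pattern += "/" + re.escape(name)
    return re.compile(pattern)


def matches(topic, absolute_path):
    """Return True if the topic refers to the node at absolute_path."""
    return to_regex(topic).fullmatch(absolute_path) is not None


if __name__ == "__main__":
    path = "/programming/languages/scripting/ruby"
    for topic in ("/programming//ruby", "object-oriented/ruby", "ruby"):
        print(topic, matches(topic, path))
```

Under these assumptions, /programming//ruby and ruby both refer to /programming/languages/scripting/ruby, while object-oriented/ruby does not.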

In Topical, there is no need for a central specification of topics; all descriptions are merged into a topic tree (see section "Merging"). However, implementations are encouraged to support description files.

Description files

Description files enable users to predefine a certain taxonomy. A description file is structured like this:

description-file ::= [topic "\n"]*

That is, a description file contains one topic per line.

The topics of a description file do not specify any occurrences of topics; they just declare that certain topics exist. For example, to create a taxonomy of programming languages, you may want to use a description file like this:

/programming (Programming)
/programming/languages (Programming languages)
/programming/languages/scripting (Scripting languages)
/programming/languages/compiled (Compiled languages)
/programming/languages/object-oriented (OO languages)
/programming/languages/scripting/ruby (The Ruby programming language)
/programming/languages/object-oriented/ruby (The Ruby programming language)
/programming/languages/scripting/perl (The Perl programming language)
/programming/languages/compiled/c (The C programming language)

Note that to a Topical implementation, /programming/languages/scripting/ruby and /programming/languages/object-oriented/ruby are not the same! However, /programming/languages//ruby can (and should) be used to specify that the section is about both topics.
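Reading such a file is straightforward. As a rough Python sketch (load_description_file is a hypothetical name, and skipping blank lines is an assumption, since the grammar does not mention them), each line can be stored as a path with its optional parenthesized text:

```python
import re

# One topic per line: a path, optionally followed by "(text)".
LINE_RE = re.compile(
    r"(?P<path>(?://|/)?[A-Za-z0-9_-]+(?:(?://|/)[A-Za-z0-9_-]+)*)"
    r"(?: +\((?P<label>[ -'*-~]*)\))?"
)


def load_description_file(text):
    """Return a dict mapping each declared path to its label (or None)."""
    declared = {}
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are tolerated
        match = LINE_RE.fullmatch(line.strip())
        if match is None:
            raise ValueError(f"line {number}: not a valid topic: {line!r}")
        declared[match.group("path")] = match.group("label")
    return declared


if __name__ == "__main__":
    sample = (
        "/programming/languages/scripting/ruby (The Ruby programming language)\n"
        "/programming/languages/object-oriented/ruby (The Ruby programming language)\n"
    )
    for path, label in load_description_file(sample).items():
        print(path, "->", label)
```

Because the full path is the key, the two ruby entries stay distinct even though they carry the same text in parentheses.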

Merging

Merging is the process that takes several topics and builds a topic tree from them. Merging all topics of the description file above yields this topic tree:

programming (Programming)
  languages (Programming languages)
    compiled (Compiled languages)
      c (The C programming language)
    object-oriented (OO languages)
      ruby (The Ruby programming language)
    scripting (Scripting languages)
      ruby (The Ruby programming language)
      perl (The Perl programming language)
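For topics that use only "/" and start at the root, merging amounts to inserting paths into a nested dictionary. The Python sketch below shows this simple case; merge and render are hypothetical names, and printing siblings in alphabetical order is an arbitrary choice of this sketch (the tree above lists ruby before perl).

```python
def merge(topics):
    """Build a nested tree from (path, label) pairs with "/"-only paths."""
    root = {"label": None, "children": {}}
    for path, label in topics:
        node = root
        for name in path.strip("/").split("/"):
            node = node["children"].setdefault(
                name, {"label": None, "children": {}}
            )
        if label is not None:
            node["label"] = label
    return root


def render(node, depth=0):
    """Return the tree as indented lines, siblings sorted by name."""
    lines = []
    for name, child in sorted(node["children"].items()):
        title = name if child["label"] is None else f"{name} ({child['label']})"
        lines.append("  " * depth + title)
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    tree = merge([
        ("/programming", "Programming"),
        ("/programming/languages", "Programming languages"),
        ("/programming/languages/scripting", "Scripting languages"),
        ("/programming/languages/compiled", "Compiled languages"),
        ("/programming/languages/compiled/c", "The C programming language"),
        ("/programming/languages/scripting/ruby", "The Ruby programming language"),
    ])
    print("\n".join(render(tree)))
```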

Merging gets harder if we add a few topics. Suppose the book "Programming Ruby" is marked with these topics:

programming//ruby
/book//ruby (Ruby books)
/book/publisher//addison-wesley

Now, we have this topic tree:

programming (Programming)
  languages (Programming languages)
    compiled (Compiled languages)
      c (The C programming language)
    object-oriented (OO languages)
      ruby (The Ruby programming language)
        * Programming Ruby
    scripting (Scripting languages)
      ruby (The Ruby programming language)
        * Programming Ruby
      perl (The Perl programming language)
book
  ruby
    * Programming Ruby
  publisher
    addison-wesley
      * Programming Ruby

The tree would look quite different if the description file additionally contained these topics:

/book (Books)
/book/programming (Programming Books)
/book/programming/ruby (Ruby books)
/book/publisher (Book Publishers)
/programming/languages/favorite (Favorite programming languages)
/programming/languages/favorite/ruby (The Ruby programming language)

The topic tree would now be:

programming (Programming)
  languages (Programming languages)
    compiled (Compiled languages)
      c (The C programming language)
    favorite (Favorite programming languages)
      ruby (The Ruby programming language)
        * Programming Ruby
    object-oriented (OO languages)
      ruby (The Ruby programming language)
        * Programming Ruby
    scripting (Scripting languages)
      ruby (The Ruby programming language)
        * Programming Ruby
      perl (The Perl programming language)
book (Books)
  programming (Programming Books)
    ruby (Ruby books)
      * Programming Ruby
  publisher (Book Publishers)
    addison-wesley
      * Programming Ruby

[FIXME: describe (and implement ;-)) the exact merging algorithm: first add all exact topics, then insert the inexact ones (sorted by?)]
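Since the algorithm is still undefined, the following Python sketch shows only one possible reading of the two-phase idea, not the Topical algorithm. Its assumptions: a topic is "exact" if it starts with "/" and uses no "//"; an inexact topic is attached to every existing node it matches; if it matches none, a node is created by reading every separator as "/". Inexact topics are processed in input order, which leaves the "sorted by?" question open. The text in parentheses is ignored here, and all function names are hypothetical.

```python
import re

TOKEN_RE = re.compile(r"//|/|[A-Za-z0-9_-]+")


def steps(topic):
    """Split a topic into (separator, name) pairs, ignoring any "(...)" part."""
    tokens = TOKEN_RE.findall(topic.split(" ", 1)[0])
    if tokens[0] not in ("/", "//"):
        tokens.insert(0, "")  # no leading separator: not anchored at the root
    return list(zip(tokens[0::2], tokens[1::2]))


def is_exact(topic):
    return all(separator == "/" for separator, _ in steps(topic))


def direct_path(topic):
    """Read every separator as "/" (assumed fallback for unmatched topics)."""
    return "".join("/" + name for _, name in steps(topic))


def to_regex(topic):
    return re.compile("".join(
        ("" if separator == "/" else "(?:/[^/]+)*") + "/" + re.escape(name)
        for separator, name in steps(topic)
    ))


def merge(declared, marked):
    """declared: topics of a description file; marked: {document: [topics]}."""
    tree = {}  # absolute path -> set of documents marked with it

    def node(path):
        parts = path.strip("/").split("/")
        for end in range(1, len(parts) + 1):  # create missing ancestors too
            tree.setdefault("/" + "/".join(parts[:end]), set())
        return tree[path]

    for topic in declared:
        node(direct_path(topic))
    for document, topics in marked.items():  # phase 1: exact topics
        for topic in topics:
            if is_exact(topic):
                node(direct_path(topic)).add(document)
    for document, topics in marked.items():  # phase 2: inexact topics
        for topic in topics:
            if is_exact(topic):
                continue
            pattern = to_regex(topic)
            hits = [path for path in tree if pattern.fullmatch(path)]
            for path in hits or [direct_path(topic)]:
                node(path).add(document)
    return tree


if __name__ == "__main__":
    declared = ["/programming/languages/scripting/ruby",
                "/programming/languages/object-oriented/ruby"]
    marked = {"Programming Ruby": ["programming//ruby", "/book//ruby (Ruby books)",
                                   "/book/publisher//addison-wesley"]}
    for path, documents in sorted(merge(declared, marked).items()):
        print(path, sorted(documents))
```

With the topics of the two examples above, this reading places "Programming Ruby" on the same nodes as the trees shown, but other orderings or fallback rules are equally conceivable.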

Appendix: Adding Topical descriptions to documents

E-Mails

The recommended way to mark RFC 2822 messages is to use an unofficial header field, for example:

X-Topical: /programming//ruby
X-Topical: /author//chneukirchen (Christian Neukirchen)
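Such fields can be read with any RFC 2822 parser. As an illustration, the Python sketch below uses the standard email package; the sample message, its addresses, and the function name topical_fields are made up for this example.

```python
from email import message_from_string

# A made-up message carrying two X-Topical fields.
RAW_MESSAGE = """\
From: someone@example.org
To: list@example.org
Subject: Programming Ruby
X-Topical: /programming//ruby
X-Topical: /author//chneukirchen (Christian Neukirchen)

Body text.
"""


def topical_fields(raw):
    """Return the values of all X-Topical fields, in order of appearance."""
    message = message_from_string(raw)
    return message.get_all("X-Topical", [])


if __name__ == "__main__":
    for value in topical_fields(RAW_MESSAGE):
        print(value)
```

Header field names are matched case-insensitively, and a message without any X-Topical field simply yields an empty list.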

(X)HTML

In (X)HTML, you can mark either the whole document or individual sections.

For whole documents, you can simply use <meta> tags in the head:

<head>
  ...
  <meta name="topical" content="/programming//ruby" />
  <meta name="topical" content="/author//chneukirchen" />
</head>
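A consumer could collect these descriptions with an ordinary HTML parser. The Python sketch below uses html.parser from the standard library; the class name TopicalMetaParser is invented for this example.

```python
from html.parser import HTMLParser


class TopicalMetaParser(HTMLParser):
    """Collect the content of every <meta name="topical"> element."""

    def __init__(self):
        super().__init__()
        self.descriptions = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <meta ... />.
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("name") == "topical":
            self.descriptions.append(attributes.get("content") or "")


if __name__ == "__main__":
    parser = TopicalMetaParser()
    parser.feed(
        "<head>"
        '<meta name="topical" content="/programming//ruby" />'
        '<meta name="topical" content="/author//chneukirchen" />'
        "</head>"
    )
    print(parser.descriptions)
```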

[FIXME: define a profile?]

Alternatively, you can use the class attribute to mark only individual sections of the document:

<p class="topical /programming//ruby;/author//chneukirchen">
  ...
</p>

It is recommended to prefix the description with the class name topical so that marked sections are easier to find.
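With the topical prefix in place, marked sections can be found by looking at class tokens. The Python sketch below assumes that the description is the single token right after topical; since class values are split on whitespace, a description containing spaces would not survive this. The class name TopicalClassParser is hypothetical.

```python
from html.parser import HTMLParser


class TopicalClassParser(HTMLParser):
    """Collect (tag, topics) for elements whose class contains "topical"."""

    def __init__(self):
        super().__init__()
        self.sections = []

    def handle_starttag(self, tag, attrs):
        tokens = (dict(attrs).get("class") or "").split()
        if "topical" not in tokens:
            return
        index = tokens.index("topical")
        if index + 1 < len(tokens):
            # Assumption: the token after "topical" is the whole description.
            self.sections.append((tag, tokens[index + 1].split(";")))


if __name__ == "__main__":
    parser = TopicalClassParser()
    parser.feed('<p class="topical /programming//ruby;/author//chneukirchen">...</p>')
    print(parser.sections)
```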

[FIXME: define alternative syntax for better CSS matching?]