Notes on Data Format Description Framework

This blog collects notes on the development of DFDF at

Monday, November 26, 2007

The Zen and the Love of Information Resource

During the discussion of my article on URI identity and web architecture, I was told that the Non-InformationResource, i.e., the complement set of InformationResource, is not defined by AWWW. Yet, can still be treated as an instance of it and made false per the bylaw of httpRange-14. I find this very amusing because it reminds me of this Buddhist proverb.

Q: What is Buddha?
A: Can't say, can't say. Wrong once said. (不可说, 不可说, 一说即是错)

The best interpretation of the answer (but not of the Buddha) is that Buddha is such an infinitude that no language is capable of describing it. Buddha is the unspeakable and, therefore, the unteachable. Everyone must comprehend Buddha by themselves. Buddha is the truth that is neither endorsed nor realized by any truth. Therefore, once Buddha becomes speakable, some truth must be associated with the Buddha, making the Buddha no longer the true Buddha, because the true Buddha doesn't have any truth. So, the Buddha that I am trying to explain here is not the true Buddha either. It is just one form of the Buddha. The true Buddha doesn't have any form but may manifest in any form.

Sanmao (三毛), a Taiwanese author, was once asked: what is love? She gave the same answer. "Can't say, can't say. Wrong once said."

I am amused because I wonder: do we want to make the web a love affair, so that we have to fall in love with InformationResource in order to understand it? Or do we want to make it a religion, i.e., to make the web a quest for the true InformationResource?

Thursday, November 15, 2007

URI Identity and Web Architecture Revisited

Out of the discussions on W3C's TAG mailing list, I have written down my thoughts on the URI identity issue and my personal viewpoint on the architecture of the web. The article is published here.

Here is a brief summary.

(1) The current definition of resource ignores the nature of a URI as an interface to the web. There are three different kinds of resources; the one that we care about should be defined as "abstract entities that have a dereferenceable URI in the web".
(2) The current definition of "information resource" in the AWWW document is not well thought out. The debate about what an information resource is does not solve any real issues.
(3) TAG's httpRange-14 is incorrectly phrased. The HTTP response code should indicate whether a URI is informational, not whether a resource is.
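
For readers who have not followed the debate, the rule criticized in point (3) can be sketched roughly as below. This is only a reading of the TAG's httpRange-14 resolution, not code from the article; the function name and the returned labels are made up for illustration.

```python
def classify_by_http_range_14(status_code):
    """Rough reading of the httpRange-14 resolution for a GET on a URI.

    The function name and the returned strings are illustrative only.
    """
    if 200 <= status_code < 300:
        # A 2xx response: the URI identifies an information resource.
        return "information resource"
    if status_code == 303:
        # A 303 See Other response: the URI could identify any resource.
        return "any resource"
    if 400 <= status_code < 500:
        # A 4xx response: the nature of the resource is unknown.
        return "unknown"
    # Other status codes are not covered by this sketch.
    return "not covered"


print(classify_by_http_range_14(200))
print(classify_by_http_range_14(303))
```

Note that the rule talks about the resource behind the URI, which is exactly the phrasing that point (3) objects to.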

Tuesday, October 2, 2007

Best practice for using fragment identifiers under content negotiation

HTTP content negotiation is a double-edged sword. On one hand, it can logically bind different "representations" under the same URI. On the other hand, it poses challenges to the fundamental concepts of web architecture, because it no longer seems clear what a URI denotes. For instance, what is Because the URI is designed to be an ontology used in the semantic web, we would expect to obtain an RDF document upon an HTTP GET. But such an operation can, in fact, get back two different types of representation, either an RDF/XML or an HTML document, depending on how the content is negotiated. The question is: would this be a good practice? I think so. The reason is that what you get back from an HTTP GET is just one, but not the, representation of what the URI denotes. For an information resource (let's leave the non-information resource out of this text, because a non-IR does not have representations), what its URI denotes is the union of all possible representations. Therefore, denotes an ontology, and the returned RDF/XML document is only one particular representation of this ontology, in the RDF formalism. There can be representations in other ontology formalisms. And, as a matter of fact, an HTML document is just one of those representations, one that is formulated in HTML to facilitate human understanding.
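
To make the "one URI, several representations" idea concrete, here is a minimal sketch of server-side negotiation on the Accept header. It is a simplification under stated assumptions: the REPRESENTATIONS table, its placeholder bodies, and the function names are hypothetical, and partial wildcards such as text/* are ignored.

```python
# Hypothetical representations of one ontology, keyed by media type.
REPRESENTATIONS = {
    "application/rdf+xml": "<rdf:RDF>...</rdf:RDF>",
    "text/html": "<html>...</html>",
}


def parse_accept(header):
    """Return the media types of an Accept header, most preferred first."""
    entries = []
    for position, part in enumerate(header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        quality = 1.0  # default q-value when none is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        entries.append((quality, position, media_type))
    # Sort by descending q-value, keeping header order for ties.
    ordered = sorted(entries, key=lambda entry: (-entry[0], entry[1]))
    # q=0 means "not acceptable", so such entries are dropped.
    return [media_type for quality, _, media_type in ordered if quality > 0]


def negotiate(accept_header, available=REPRESENTATIONS,
              default="application/rdf+xml"):
    """Pick one representation of the same URI, or None if none is acceptable."""
    for media_type in parse_accept(accept_header):
        if media_type in available:
            return media_type, available[media_type]
        if media_type == "*/*":
            return default, available[default]
    return None  # a real server would answer 406 Not Acceptable


# A browser-like client and an RDF client asking for the same URI.
print(negotiate("text/html,application/xhtml+xml;q=0.9,*/*;q=0.8")[0])
print(negotiate("application/rdf+xml")[0])
```

Both calls address the same URI; only the returned representation differs, which is the sense in which each response is one, but not the, representation.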

With the above understanding, let's now consider whether it is a good practice to use the same fragment identifier in different representations. According to the URI specification, the semantics of a fragment identifier depend on the MIME type of the document. For most content types, such as HTML and XML, a fragment identifier usually denotes a part of the document. But in RDF, fragment identifiers are mostly used to denote external entities. For instance, in DFDF, the URI is designed to denote a one-dimensional homogeneous information space. The question now is: what is the right thing to do for the above URI in its HTML representation? There are two options: either the HTML uses the fragment id or it does not.

Of course, not using the same fragment id in two different representations avoids, but does not solve, the issue, because what is not there is left unidentified. The problem with this approach shows up when a client is given a URI and then tries to GET a particular representation of it. If the fragment identifier has no corresponding entity in that representation, the client cannot tell whether the given URI does not exist at all or exists only in some other kind of representation. Hence, in DFDF, the same identifier is used in both the RDF and the HTML representations of an ontology. But is that wrong? In the HTML representation, a fragment id, such as refers to an anchor or an HTML element, so that "a df:Stream is also an anchor" seems wrong. But the problem again lies in how we understand a URI under content negotiation. If a URI without a fragment identifier is the union of all its representations, a URI with a fragment identifier should also be considered the union of its interpretations under all MIME types. Hence, "df:Stream is an anchor" is not correctly phrased. If we say that "df:Stream refers to an anchor element in its HTML representation", it is correct. Hence, as long as the usages of the same fragment identifier in different representations are consistent with each other, I think it is a better practice than avoiding them altogether.
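
The point that one fragment identifier gets a different reading per MIME type can be sketched as follows. The URI http://example.org/dfdf#Stream, the FRAGMENT_RULES table, and the wording of each rule are assumptions made up for illustration; they are not the real DFDF URIs.

```python
from urllib.parse import urldefrag

# Hypothetical per-media-type readings of a fragment identifier.
FRAGMENT_RULES = {
    "text/html": "the anchor or element named '{fragment}' in the HTML document",
    "application/rdf+xml": "the entity that the RDF graph describes as <{uri}>",
}


def interpret(uri, media_type):
    """Split off the fragment and describe how this media type reads it."""
    base, fragment = urldefrag(uri)
    if not fragment:
        return base, None
    rule = FRAGMENT_RULES.get(media_type)
    if rule is None:
        return base, "no reading defined in this sketch for " + media_type
    return base, rule.format(fragment=fragment, uri=uri)


uri = "http://example.org/dfdf#Stream"  # hypothetical stand-in for df:Stream
for media_type in ("text/html", "application/rdf+xml"):
    base, reading = interpret(uri, media_type)
    # Only the base URI is sent in the GET; the fragment is resolved by the
    # client against whichever representation comes back.
    print(media_type, "->", reading)
```

Under the "union of interpretations" view, the URI with its fragment covers both readings at once, and the two are consistent as long as the HTML anchor is where a human reader finds the description of the same thing the RDF graph talks about.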

Monday, October 1, 2007

DFDF started

Just started putting up the DFDF website last week. Put in some essential documents, but a lot of documents still need to be filled in. Will work hard on that soon.