This blog collects notes on the development of DFDF at http://dfdf.inesc-id.pt

Tuesday, October 2, 2007

Best practice for using fragment identifiers under content negotiation

HTTP content negotiation is a double-edged sword. On the one hand, it can logically bind different "representations" to the same URI. On the other hand, it poses challenges to fundamental concepts of the web architecture, because it is no longer clear what a URI denotes. For instance, what is http://dfdf.inesc-id.pt/ont/voc? Because the URI is designed to identify an ontology used on the semantic web, we would expect to obtain an RDF document upon an HTTP GET. But such an operation can, in fact, return two different types of representation, either an RDF/XML document or an HTML document, depending on how content is negotiated. The question is: is this good practice? I think so. The reason is that what you get back from an HTTP GET is just one representation of what the URI denotes, not the representation. For an information resource (let's leave non-information resources out of this text, because a non-IR does not have representations), what its URI denotes is the union of all its possible representations. Therefore, http://dfdf.inesc-id.pt/ont/voc denotes an ontology, and the returned RDF/XML document is only one particular representation of this ontology, expressed in the RDF formalism. There can be representations in other ontology formalisms. As a matter of fact, the HTML document is just another of those representations, formulated in HTML to facilitate human understanding.
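To make the negotiation step concrete, here is a minimal Python sketch of how a server might choose between the two representations of http://dfdf.inesc-id.pt/ont/voc based on a client's Accept header. It assumes only the two media types discussed above (`application/rdf+xml` and `text/html`); the function names are hypothetical, media-type parameters other than `q` are ignored, and this is not a description of how the DFDF server is actually implemented.

```python
# Media types this hypothetical server can produce for the ontology URI,
# listed in server preference order (used only to break ties).
AVAILABLE = ["application/rdf+xml", "text/html"]


def parse_accept(header):
    """Split an Accept header into (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges.append((fields[0].lower(), q))
    return ranges


def specificity(media_range, media_type):
    """Return 2 for an exact match, 1 for type/*, 0 for */*, -1 for no match."""
    if media_range == media_type:
        return 2
    if media_range == "*/*":
        return 0
    rtype, _, rsub = media_range.partition("/")
    if rsub == "*" and rtype == media_type.split("/")[0]:
        return 1
    return -1


def negotiate(accept_header, available=AVAILABLE):
    """Pick the available media type with the highest q-value, or None."""
    ranges = parse_accept(accept_header or "*/*")
    best, best_q = None, 0.0
    for media_type in available:
        # The most specific matching range decides this type's q-value.
        matched = [(specificity(r, media_type), q) for r, q in ranges]
        matched = [m for m in matched if m[0] >= 0]
        if not matched:
            continue
        q = max(matched)[1]
        if q > best_q:  # strict: q=0 never wins, ties keep server preference
            best, best_q = media_type, q
    return best


if __name__ == "__main__":
    browser = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
    print(negotiate(browser))
    print(negotiate("application/rdf+xml"))
```

Under these assumptions, a browser sending its usual Accept header gets the HTML page, while a semantic web agent asking for `application/rdf+xml` gets the RDF/XML, even though both dereference the same URI.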

With this understanding, let's now consider whether it is good practice to use the same fragment identifier in different representations. According to the URI specification, the semantics of a fragment identifier depend on the MIME type of the retrieved document. For most content types, such as HTML and XML, a fragment identifier usually denotes a part of the document. But in RDF, fragment identifiers are mostly used to denote entities outside the document. For instance, in DFDF, the URI http://dfdf.inesc-id.pt/ont/voc#Stream is designed to denote a one-dimensional homogeneous information space. The question now is: what is the right thing to do with this URI in the HTML representation? There are two options: either the HTML uses the fragment id or it does not.
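The dependence on media type can be sketched as follows. Per RFC 3986, the fragment is separated from the URI before dereferencing, so the server only ever sees http://dfdf.inesc-id.pt/ont/voc; the fragment is interpreted afterwards, according to the media type of whatever representation came back. The `resolve` function and the wording of its results are illustrative assumptions, not part of any standard API.

```python
from urllib.parse import urldefrag

# Media types handled by this sketch; anything else is left uninterpreted.
HTML_TYPES = {"text/html", "application/xhtml+xml"}
RDF_TYPES = {"application/rdf+xml"}


def resolve(uri, media_type):
    """Return (uri_to_GET, meaning_of_fragment) for a URI under a media type."""
    # The fragment never goes over the wire: strip it before the GET.
    request_uri, fragment = urldefrag(uri)
    if not fragment:
        return request_uri, "the resource itself (no fragment)"
    if media_type in HTML_TYPES:
        meaning = f"the element whose id (or anchor name) is '{fragment}'"
    elif media_type in RDF_TYPES:
        meaning = f"whatever the RDF graph says <{uri}> denotes"
    else:
        meaning = f"undefined here: no fragment rule assumed for {media_type}"
    return request_uri, meaning


if __name__ == "__main__":
    stream = "http://dfdf.inesc-id.pt/ont/voc#Stream"
    for mt in ("application/rdf+xml", "text/html"):
        print(mt, "->", resolve(stream, mt))
```

The same GET request is issued in both cases; only the interpretation of `Stream` changes with the media type of the response.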

Of course, not using the same fragment id in two different representations avoids the issue but does not solve it, because whatever is not there is simply left unidentified. The problem with this approach arises when a client is given a URI with a fragment and then tries to GET a particular representation of it: if the fragment does not exist in that representation, the client cannot tell whether the URI identifies nothing at all or exists only in some other kind of representation. Hence, in DFDF, the same identifier is used in both the RDF and the HTML representation of an ontology. But is this wrong? In the HTML representation, a fragment id such as http://dfdf.inesc-id.pt/ont/voc#Stream refers to an anchor or an HTML element, so "a df:Stream is also an anchor" seems wrong. But the problem again lies in how we understand URIs under content negotiation. If a URI without a fragment identifier denotes the union of all its representations, a URI with a fragment identifier should likewise be considered to denote the union of its interpretations under all the MIME types. Hence, "df:Stream is an anchor" is not correctly phrased; if we say that "df:Stream refers to an anchor element in its HTML representation", it is correct. Therefore, as long as the usages of the same fragment identifier in different representations are consistent with each other, I think reusing it is better practice than avoiding it altogether.
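One way to keep the usages consistent is to check mechanically that every fragment id naming a resource in the RDF/XML also exists as an element id (or `<a name>` anchor) in the HTML page. The sketch below does this with Python's standard library only. It assumes that the RDF/XML names resources via `rdf:about` or `rdf:ID`, that `xml:base` equals the ontology URI, and that the `Channel` class in the sample data is purely hypothetical. Real HTML pages usually carry extra ids for layout, so the second set it returns is better read as a hint than as an error list.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin

RDF_NS = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
ONTOLOGY = "http://dfdf.inesc-id.pt/ont/voc"


def rdf_fragments(ontology_uri, rdf_xml):
    """Fragment ids the RDF/XML uses to name resources under ontology_uri."""
    frags = set()
    for el in ET.fromstring(rdf_xml.strip()).iter():
        # rdf:ID="X" names <xml:base>#X; assumed here that xml:base == ontology_uri.
        rdf_id = el.get(RDF_NS + "ID")
        if rdf_id:
            frags.add(rdf_id)
        # rdf:about may be absolute or relative ("#X"); resolve it first.
        about = el.get(RDF_NS + "about")
        if about:
            base, frag = urldefrag(urljoin(ontology_uri, about))
            if base == ontology_uri and frag:
                frags.add(frag)
    return frags


class _IdCollector(HTMLParser):
    """Collects every fragment target (id="..." or <a name="...">) in HTML."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value and (name == "id" or (tag == "a" and name == "name")):
                self.ids.add(value)


def html_fragments(html):
    collector = _IdCollector()
    collector.feed(html)
    collector.close()
    return collector.ids


def check_consistency(ontology_uri, rdf_xml, html):
    """Return (ids in RDF but not in HTML, ids in HTML but not in RDF)."""
    rdf, htm = rdf_fragments(ontology_uri, rdf_xml), html_fragments(html)
    return rdf - htm, htm - rdf


# Hypothetical sample data: "Channel" is invented for illustration only.
SAMPLE_RDF = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xml:base="http://dfdf.inesc-id.pt/ont/voc">
  <owl:Class rdf:about="http://dfdf.inesc-id.pt/ont/voc#Stream"/>
  <owl:Class rdf:ID="Channel"/>
</rdf:RDF>
"""
SAMPLE_HTML = '<html><body><h2 id="Stream">Stream</h2></body></html>'

if __name__ == "__main__":
    print(check_consistency(ONTOLOGY, SAMPLE_RDF, SAMPLE_HTML))
```

Here the check would flag `Channel` as defined in the RDF but missing from the HTML, which is exactly the situation where a client following `voc#Channel` to the HTML page could not tell whether the URI identifies anything.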
