this is aaronland

Welcome to the Terroirdôme

Next steps

I've released, or updated, a series of tools for working with, and converting between, Eatdrinkfeelgood 1.1 and (still experimental) 2.0 documents.

So...

Converting from 1.1 to 2.0

#!/bin/sh

EXEC_XSLTPROC=/usr/bin/xsltproc
XSL_ERDFG="/home/asc/lib/xsl/eatdrinkfeelgood/eatdrinkfeelgood-1.1-to-2.0a.xsl"
EDFG=$1
ERDFG=$2
${EXEC_XSLTPROC} -o ${ERDFG} ${XSL_ERDFG} ${EDFG}
  1. Galatoire's Sweet Potato Cheesecake (1.1)
  2. Galatoire's Sweet Potato Cheesecake (2.0)

Converting from 2.0 to text

#!/bin/sh

EXEC_PYTHON=/usr/bin/python
EXEC_TOTEXT="${EXEC_PYTHON} /home/asc/lib/python/erdfg/bin/text.py"
ERDFG=$1
TXT=$2
${EXEC_TOTEXT} ${ERDFG} > ${TXT}
  1. Galatoire's Sweet Potato Cheesecake (2.0)
  2. Galatoire's Sweet Potato Cheesecake (text)

Converting from 2.0 to XHTML, by way of 1.1

#!/bin/sh

EXEC_PYTHON=/usr/bin/python
EXEC_TOXML="${EXEC_PYTHON} /home/asc/lib/python/erdfg/bin/as_xml.py"
EXEC_XSLTPROC=/usr/bin/xsltproc
XSL_XHTML="/home/asc/lib/xsl/eatdrinkfeelgood/eatdrinkfeelgood-1.1-to-xhtml.xsl"
ERDFG=$1
XHTML=$2
TMP_XML="${ERDFG}.xml"
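# Serialize the 2.0 document as 1.1 XML, then transform that to XHTML.
${EXEC_TOXML} ${ERDFG} > ${TMP_XML}
${EXEC_XSLTPROC} -o ${XHTML} ${XSL_XHTML} ${TMP_XML}
# Clean up the intermediate 1.1 file.
rm ${TMP_XML}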
  1. Galatoire's Sweet Potato Cheesecake (2.0)
  2. Galatoire's Sweet Potato Cheesecake (XHTML)
  3. Galatoire's Sweet Potato Cheesecake (XHTML+CSS)

Converting from 2.0 to PDF, by way of 1.1

#!/bin/sh

EXEC_PYTHON=/usr/bin/python
EXEC_TOXML="${EXEC_PYTHON} /home/asc/lib/python/erdfg/bin/as_xml.py"
EXEC_XSLTPROC=/usr/bin/xsltproc
EXEC_FOP=/Installers/fop-0.20.5/fop.sh
XSL_XSLFO="/home/asc/lib/xsl/eatdrinkfeelgood/eatdrinkfeelgood-1.1-to-indexcard-fo.xsl"
ERDFG=$1
PDF=$2
TMP_XML="${ERDFG}.xml"
TMP_FO="${ERDFG}.fo"
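# Serialize the 2.0 document as 1.1 XML, transform that to XSL-FO,
# then hand the XSL-FO to FOP to render the PDF.
${EXEC_TOXML} ${ERDFG} > ${TMP_XML}
${EXEC_XSLTPROC} -o ${TMP_FO} ${XSL_XSLFO} ${TMP_XML}
${EXEC_FOP} -fo ${TMP_FO} -pdf ${PDF}
# Clean up the intermediate files.
rm ${TMP_XML}
rm ${TMP_FO}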
  1. Galatoire's Sweet Potato Cheesecake (2.0)
  2. Galatoire's Sweet Potato Cheesecake (PDF)

First impressions

The good and the bad are mostly where I expected them to be found.

I really, really, miss being able to use XPath and XInclude. They are like magic and make combining and transforming recipes fantastically easy. Doing the same by iterating over graphs, particularly when they are sometimes ambiguous by design, is not so much fun.

However, writing XML (or, by extension, writing and maintaining tools to make writing XML easy) is a pain in the ass. Writing e(r)dfg using this funny hybrid of plain text and N3 is not.

Next steps

In no particular order :

That is all.

<you> :a "what you eat" .

Independent of the actual markup format, I've been trying to work out the relationship between the various elements in an Eatdrinkfeelgood recipe. Here's the current working model of the Eatdrinkfeelgood Markup Language, affectionately referred to as e(r)dfg for short.

I've given up trying to find proper names for things right now. For the purpose of this discussion, here are four names that, really, all mean the same thing. But remember : They don't, okay?

Classes

Classes are the highest level elements of an e(r)dfg document and are a conversational shortcut, never explicitly named, that denote how often an item is likely to appear in a document.

Classes typically contain all the properties contained in a facet but this is not always the case, so it is better to think of a property, rather than a facet, as being of a certain type of class.

Facets

Like classes, facets are just a conceptual device to group parts of a document but are not named as such. (At least not explicitly. The RDF peanuts in the gallery would be correct in assuming that classes are treated like (cue drumroll) RDF classes.)

Facets may be singletons or repeatables and, in the case of annotations, both. It is probably better to think of a property, rather than a facet, as being a member of a particular class.

Properties

Properties are the actual named parts in a document. They are like keys in a dictionary and may point to simple string values or child data-structures referred to, here, as attributes.

There are also sets, which aren't listed because it's another dumb name and I'm not sure where they fall in the hierarchy of things. They are singletons that contain repeatables. A document may only contain one collection of sets, though it may have many parts. For example, a recipe for pie may be divided into two parts : the crust and the filling. Each part may contain more than the minimum set of repeatables : ingredients and directions followed by notes, equipment and other specifics.

Similarly, there is a framework for using an XInclude-like syntax to pull in pieces of another e(r)dfg document using SPARQL.

These last two are by no means required and will probably only ever enjoy limited use. But in the spirit of making easy things easy and hard things possible they still seem worth the effort.
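
As a rough illustration of the pie example, here is one way the set of parts might look once parsed into Python. The key names (`title`, `parts`, `name`, `ingredients`, `directions`, `notes`, `dstuff`) and every value are invented for this sketch and are not taken from the spec; `amount` and `measure` are borrowed from the attributes described below.

```python
# One document, one collection of sets, two parts.
# All names and values here are hypothetical, for illustration only.
pie = {
    "title": "Pie",                      # singleton property, plain string value
    "parts": [                           # the single collection of sets
        {
            "name": "crust",
            "ingredients": [             # repeatable; each entry holds attributes
                {"dstuff": "flour", "amount": 2, "measure": "cup"},
                {"dstuff": "butter", "amount": 1, "measure": "cup"},
            ],
            "directions": ["Cut the butter into the flour."],
        },
        {
            "name": "filling",
            "ingredients": [
                {"dstuff": "apples", "amount": 5, "measure": "cup"},
            ],
            "directions": ["Pile the apples into the crust."],
            "notes": ["Tart apples work best."],   # optional extras
        },
    ],
}


def shopping_list(recipe):
    """Flatten the ingredients of every part into one list of strings."""
    return [
        f'{item["amount"]} {item["measure"]} {item["dstuff"]}'
        for part in recipe["parts"]
        for item in part["ingredients"]
    ]


print(shopping_list(pie))
```

The point of the nesting is that each part carries its own repeatables, so a tool can treat the crust and the filling separately or flatten them, as `shopping_list` does.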

Attributes

These are the guts of a given document (amount=5, measure=cup, etc.) and I'm not going to get into them here. Part of the reason for this whole exercise has been to try and articulate a context in which attributes live to better identify what is and isn't necessary.

But how does it taste?

The reasons for spelling it all out like this are two-fold (if you exclude the obvious part where you want to figure out what the core elements of a recipe document are) :

  1. How to go about recording it as a written document and how much trouble it will be for a human being to do so.
  2. How much trouble it will be for a machine to parse that document into something useful and how much pain it will bring to a human who has to teach the computer what to do.

Did I mention how much I hate using non-XML formats for data exchange? I don't really have a pithy comment for that except to say : Yes. On measure, native XML just doesn't make as much sense as it once did. It is hard to read by eye, hard to write by hand and involves a sufficiently complicated setup to do anything otherwise that it's not really attractive to non-dorks and lazy dorks alike.

In the same vein, I prefer to not think of e(r)dfg, despite its clever name, as RDF either. More likely it will be presented as a plain-text format, with a formal set of markup rules and relationships, that conveniently happens to be RDF. In that regard, it will be more XML-ish in nature, meaning that, from a processor's point of view, the core set of elements will have a fixed set of combinations. (And if you really need more variations on a document title than those described above maybe you need to be asking some larger questions about your life.)

I think I am comfortable with this. Or, at least, I know that the sacrifices demanded by any of the available extremes are not.

Waiter, there's RDF in my soup!

The other day, Ed posted a recipe for chicken soup and I decided to use it as a test case for everything I've been working on. Would it be easy enough to enter by hand? How much of Ed's original recipe could I simply copy and paste? Could I read the formatted version easily afterwards? Could I write a quick and dirty computer program to dump a version of the recipe that read like the recipes we scribble on index cards?

Here's what I had to work with :

For the most part, it went well. The ingredients list, while still a bit of a nuisance, is actually an improvement. It is easier to read than the XML-ified Eatdrinkfeelgood 1.x documents. It is easier to write, and to remember what to write, than 1.x documents. It both avoids the natural language processing rabbit-hole and provides for richer semantics than earlier versions of the spec, allowing users to define the values for measures and dstuffs as resources themselves.

The directions and notes were copied and pasted and while, in my version, I've lost the explicit semantics of the paragraphs I could have also just copied the raw HTML had I been so inclined. The Atom content model is handy that way.

The biggest problem, so far, is in processing the recipe as a traditional RDF document. Specifically, there's nothing that says an ingredients list can't have a list of anonymous nodes (the individual ingredient descriptions) but it does make them hard(er) to find. The first problem is the way that lists, in RDF, are interpreted as being a series of micro-lists consisting of two elements : The first item, followed by a pointer to the rest of the list. The second problem is that the un-name-edness of individual ingredients means you have to test each element in the ingredients list first to see whether it's a node and then whether it contains something ingredient-ish like a dstuff.
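
To make the two problems concrete, here is a minimal sketch in plain Python, assuming a graph is nothing more than a set of (subject, predicate, object) tuples. The `rdf:first`, `rdf:rest` and `rdf:nil` terms are standard RDF; the `_:` blank-node labels, the `e:ingredients` spelling, the sample values and the helper names are placeholders for the example and not the actual e(r)dfg tools.

```python
# A toy graph: (subject, predicate, object) triples held in a plain set.
# Blank-node labels ("_:...") and the e: names are illustrative only.
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

GRAPH = {
    ("_:recipe", "e:ingredients", "_:l1"),
    # An RDF list is a chain of cells: each has a first item and a rest.
    ("_:l1", RDF_FIRST, "_:i1"), ("_:l1", RDF_REST, "_:l2"),
    ("_:l2", RDF_FIRST, "_:i2"), ("_:l2", RDF_REST, RDF_NIL),
    # The ingredients themselves are anonymous and untyped.
    ("_:i1", "e:dstuff", "butter"),
    ("_:i2", "e:dstuff", "chicken"),
}


def objects(graph, subject, predicate):
    """Return every object for the given subject and predicate."""
    return [o for (s, p, o) in graph if s == subject and p == predicate]


def walk_list(graph, head):
    """Yield the items of an RDF list by following rdf:rest pointers."""
    while head != RDF_NIL:
        yield objects(graph, head, RDF_FIRST)[0]
        head = objects(graph, head, RDF_REST)[0]


def looks_like_ingredient(graph, node):
    """Guess: a blank node carrying an e:dstuff is probably an ingredient."""
    return node.startswith("_:") and bool(objects(graph, node, "e:dstuff"))


def ingredients(graph, recipe):
    """Collect the dstuff of every ingredient-ish node in the recipe's list."""
    (head,) = objects(graph, recipe, "e:ingredients")
    return [
        objects(graph, node, "e:dstuff")[0]
        for node in walk_list(graph, head)
        if looks_like_ingredient(graph, node)
    ]


print(ingredients(GRAPH, "_:recipe"))
```

Both annoyances show up as separate functions : `walk_list` exists only to unpick the chain of cells, and `looks_like_ingredient` is a guess based on what a node happens to contain.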

Really, not so hot.

The obvious solution to this would be to assign a type to each ingredient description. Instead of : [e:dstuff "butter"] you'd write [e:dstuff "butter"; :a e:ingredient]. Or e:ingredients [ e:ingredient ( [...] [...] ) ]. This makes the computer happy but also makes the baby Jesus cry. Seriously, if I'd wanted to get into that kind of markup soup I would have just recast the whole language as a microformat.

Writing a document should be possible with nothing more than a text editor and a little bit of patience, a straightforward, albeit boring affair. Hiding the details of the markup behind a user-friendly graphical interface would be a welcome improvement for many but it should not be a requirement.

I wrote that and I still stand by it. It is the primary motivation behind the current iteration of the format and the pull between making it easy to write one-liners versus being able, or willing, to write the recipes at all is what I'm still trying to sort out.

I love to spend time thinking about fancy GUI apps, using sexy tricks like XForms and auto-completion-y scrumjax, to read and write recipes but I honestly don't think that's how it's going to happen most of the time. More often than not I will be reading recipes from something like the 770 which won't be connected to the Internet or writing them in a copy of Notepad and emailing the finished version to myself from someone else's computer.

If I can use the same short-hand (format) for both the reading and the writing then maybe the extra time, and hassle, it will take to teach a computer program to DWIM is worth it.

Another option is to define strict and casual modes for a document where the former would mandate the use of [:a e:thingy] attributes (careful readers will note that the same issues surrounding ingredients apply to both notes and directions) and the latter would not. Glue-code to map the latter to the former would be easy enough to write and maintain. To that end, you'd have a pipeline that looked something like this :

casual -> strict -> XML -> HTML, XSL-FO (PDF), etc.
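
The first arrow in that pipeline could be sketched as follows, assuming a recipe has already been parsed into plain Python structures. The dictionary layout, the `to_strict` name and the `"a"` key standing in for `:a` are all assumptions made for the example; only `e:dstuff` and `e:ingredient` come from the discussion above.

```python
def to_strict(recipe):
    """Return a copy of a casual recipe with every ingredient explicitly typed.

    The input layout (a dict with an "e:ingredients" list of dicts) is a
    stand-in for whatever a real parser would produce.
    """
    strict = dict(recipe)
    strict["e:ingredients"] = [
        # The default type comes first so an explicit type on the node wins.
        {"a": "e:ingredient", **node}
        for node in recipe.get("e:ingredients", [])
    ]
    return strict


casual = {
    "e:ingredients": [
        {"e:dstuff": "butter"},                      # casual: no type given
        {"e:dstuff": "salt", "a": "e:ingredient"},   # already strict
    ]
}

strict = to_strict(casual)
print(strict["e:ingredients"])
```

Once every node carries a type, finding the ingredients is a filter on `a` rather than a guess based on what the node contains, and the casual document is left untouched for the human who wrote it.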

Next steps

Or something like that.