A rose is a rose, except when you call it XHTML
For a variety of reasons, I've started baking all the web pages for a few sites. Any bells and whistles are just that; all the really important stuff is right there in the source file, on the assumption that sooner or later the magic server tools will bork out. This means that things like copyright notices need to be updated by 'hand'. I knew this when I started, and today I sat down to write a little tool to do the heavy lifting for me. This is, after all, the magic lala-land we call XML/XHTML, right? I am generally a SAX guy but decided this was a good opportunity to introduce myself to the DOM-ishness of XML::LibXML. You know, something simple like:

    use XML::LibXML;

    my $parser = XML::LibXML->new();
    my $doc    = $parser->parse_html_file("/path/to/file");
    my ($span) = $doc->findnodes("/xpath/to/span/with/curyear");

    # I happen to know that the first child
    # is a text node containing the year
    my $oldyear = $span->firstChild();
    $oldyear->replaceNode(XML::LibXML::Text->new("2003"));

    # Write changes to disk

Simple, right? Keen observers will have already noticed that I had to read in the document using the
parse_html_file
method. If your root element says "html", regardless of
whether or not the DOCTYPE says XHTML, it is seemingly
impossible to parse a document using the standard
parse_*
methods. Which means that by the time you get around to
writing your DOM to disk you can't do things
like...drumroll...include an XML declaration. And if you
decide to simply write the declaration out by hand, don't
bother trying to call any of the encoding methods. Maybe
there is some deeper magic I have yet to learn but all I was
able to do was make the Perl interpreter dump core. Still
with me? Okay, so we're going to write the declaration by
hand and just assume we know what we're dealing with when it
comes to encodings. Now remember the bit about being in an
HTML context? Logically, the thing to do is follow the docs,
call the
toStringHTML
method and hope for the best. The best in this case is a
terrible journey back to 1998 because all the singletons in
the well-formed documents you've laboured over are suddenly
left open and dangling. And don't bother throwing caution to the
wind and just calling
toString
instead, not if you use the clever C-style comments hack to hide the
<![CDATA[]]> blocks so that your XHTML can be
valid and still be understood by a web browser. Who the fuck knows
why, but libxml will turn it into this:

    <![CDATA[ /* <![CDATA[ */ @import url(some.css) /* ]]> */ ]]>

Oh yeah, and it will include an XML declaration for you. So long as you don't mind that it doesn't specify the encoding. Ugh.
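For what it's worth, the "write the declaration by hand" step can be sketched in plain Perl without touching any of the libxml encoding methods. This is only a rough sketch under a few assumptions: the helper names with_declaration and write_document are made up for illustration, the markup is assumed to already be a decoded Perl character string (not raw bytes), and you are assumed to know the encoding you want. It swaps out any encoding-less declaration the serializer may have emitted and writes the file through a PerlIO encoding layer so the bytes on disk match what the declaration claims.

```perl
use strict;
use warnings;

# Hypothetical helper: replace whatever XML declaration the serializer
# emitted (if any) with one that names the encoding explicitly.
# Assumes $markup is a decoded character string.
sub with_declaration {
    my ($markup, $encoding) = @_;

    # Drop a leading declaration such as <?xml version="1.0"?>.
    # The \s after "xml" keeps things like <?xml-stylesheet ...?> intact.
    $markup =~ s/\A\s*<\?xml\s[^?]*\?>\s*//;

    return qq{<?xml version="1.0" encoding="$encoding"?>\n} . $markup;
}

# Hypothetical helper: write the markup to disk through an encoding
# layer, so the file really is in the encoding the declaration names.
sub write_document {
    my ($path, $markup, $encoding) = @_;

    open(my $fh, ">:encoding($encoding)", $path)
        or die "Cannot open $path for writing: $!";
    print {$fh} with_declaration($markup, $encoding);
    close($fh) or die "Cannot close $path: $!";

    return;
}
```

Calling write_document("/path/to/file", $markup, "UTF-8") with whatever string the serializer handed back would then be the whole "write changes to disk" step. It does nothing about the dangling singletons or the mangled CDATA blocks, which still have to be dealt with separately.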