Sunday, September 08 2002

Subject: Glossaries - XPath, SAX and benchmarks

Date: Sun, 8 Sep 2002 15:35:43 -0400 (EDT)
From: Aaron Straup Cope 
To: Karl Dubost 
Cc: Steph
Subject: Glossaries: XPath, SAX and benchmarks

So, I sat down and did some tests this morning per our conversation about glossaries and XBEL and XPath.

It's a bit depressing, given the nature of the XPath query you need to pull stuff out of an XBEL document:

"/xbel//bookmark[title=\"$keyword\"]/\@href"

Since the <bookmark> element can sit either directly under the root <xbel> element or inside an arbitrary number of nested <folder> elements, there isn't much to do except sniff around every node until you find what you're looking for.

Which takes a long time. Longer than you'd normally want, anyway...

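To make that cost concrete, here is a rough core-Perl sketch of what a descendant search like //bookmark[...] implies. It runs over a hypothetical in-memory tree, not a real XBEL parse. Plain hashes stand in for <folder> and <bookmark> elements, and the example.com URLs are placeholders. The sketch also stops at the first hit, whereas an XPath engine collects every match. The only point is that a bookmark can sit at any depth, so every folder has to be opened, and a miss touches every node.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an XBEL tree (NOT real parser output):
# folders hold 'children', bookmarks hold 'title' and a placeholder 'href'.
my $xbel = [
    { type => 'bookmark', title => 'aaronland', href => 'http://example.com/aaronland' },
    { type => 'folder', children => [
        { type => 'folder', children => [
            { type  => 'bookmark',
              title => 'Schematron - XML Validation Language',
              href  => 'http://example.com/schematron' },
        ] },
    ] },
];

my $visited = 0;    # how many nodes the search has inspected

# Depth-first search: since a bookmark may be nested at any depth,
# every folder must be opened until the title turns up.
sub find_href {
    my ($nodes, $keyword) = @_;
    foreach my $node (@$nodes) {
        $visited++;
        if ($node->{type} eq 'bookmark') {
            return $node->{href} if $node->{title} eq $keyword;
        }
        else {
            my $href = find_href($node->{children}, $keyword);
            return $href if defined $href;
        }
    }
    return;    # not found anywhere in this subtree
}

print find_href($xbel, 'Schematron - XML Validation Language'), "\n";  # http://example.com/schematron
print "nodes visited: $visited\n";                                     # nodes visited: 4
```

A title that isn't in the tree also visits all four nodes. With a real bookmarks file, that means walking the whole tree on every failed lookup.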
On the other hand, if you just use a plain old SAX widget to find the keyword, it takes roughly 1/4 to 1/5 of the time to do a lookup.

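This checks the "1/4 to 1/5" figure against the totals in the benchmark output further down. Each run is 100 iterations of 5 lookups, so 500 lookups in all. The Expat run is used for the SAX side:

```latex
\[
\frac{659.38\,\mathrm{s}}{148.38\,\mathrm{s}} \approx 4.44 \ (\text{CPU}),
\qquad
\frac{765\,\mathrm{s}}{171\,\mathrm{s}} \approx 4.47 \ (\text{wall clock}),
\qquad
\frac{1}{4.44} \approx 0.225
\]
\[
\frac{765\,\mathrm{s}}{500} = 1.53\,\mathrm{s} \text{ per XPath lookup},
\qquad
\frac{171\,\mathrm{s}}{500} = 0.342\,\mathrm{s} \text{ per SAX lookup}
\]
```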
Below are benchmarks for 100 iterations of a subroutine that does 5 keyword lookups against an XBEL file (500 lookups per run).

Note that the XPath query doesn't even instantiate a new object; the same object is shared across all 500 calls to 'find'. The SAX query, on the other hand, instantiates a new filter and a new parser for each lookup, so every SAX lookup re-reads and re-parses the file from scratch.

Obviously, some clever caching of lookups would speed things up as well.

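Here is a minimal sketch of the caching idea, assuming a lookup can be keyed on the keyword alone. lookup_uncached is a hypothetical placeholder for either the SAX or the XPath query. It only counts how often it runs and returns a made-up string. The loop mirrors the benchmark's shape of 100 iterations over the same 5 keywords.

```perl
use strict;
use warnings;

# Hypothetical placeholder for one expensive lookup (a full SAX or
# XPath pass over the XBEL file); it just counts calls.
my $uncached_calls = 0;

sub lookup_uncached {
    my ($keyword) = @_;
    $uncached_calls++;
    return "href-for:$keyword";
}

# Results keyed on the keyword; exists() means misses (undef) get cached too.
my %cache;

sub lookup {
    my ($keyword) = @_;
    $cache{$keyword} = lookup_uncached($keyword) unless exists $cache{$keyword};
    return $cache{$keyword};
}

my @keywords = (
    'FilterProxy Home Page',
    'REX XML Shallow Parsing with Regular Expressions',
    'aaronland',
    'Schematron - XML Validation Language',
    '>RE ActivePerl mod_perl ppd available',
);

# Same workload shape as the benchmark: 100 iterations of 5 lookups.
for my $i (1 .. 100) {
    lookup($_) foreach @keywords;
}

print "uncached lookups: $uncached_calls\n";    # uncached lookups: 5
```

A real cache would also have to notice when the XBEL file changes. One way is to remember the file's modification time, (stat $file)[9], and empty %cache when it differs. The core Memoize module (memoize('lookup_uncached')) gives the same effect without the hand-rolled hash.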
****

101 ->./debug.xbel
Benchmark: timing 100 iterations of xpathquery...
    bquery: 765 wallclock secs (645.73 usr + 13.66 sys = 659.38 CPU) @  0.15/s (n=100)

101 ->./debug.xbel
Benchmark: timing 100 iterations of saxquery_pureperl...
saxquery_pureperl: 171 wallclock secs (148.23 usr +  0.62 sys = 148.86 CPU) @  0.67/s (n=100)

102 ->./debug.xbel
Benchmark: timing 100 iterations of saxquery_expat...
saxquery_expat: 171 wallclock secs (148.17 usr +  0.20 sys = 148.38 CPU) @  0.67/s (n=100)

****

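package Foo;

use strict;
use warnings;
use base qw(XML::SAX::Base);

# Remember the bookmark title we're looking for.
sub keyword {
  my $self = shift;
  $self->{'__keyword'} = $_[0];
}

# The href of the matching bookmark, or undef if nothing matched.
sub link {
  my $self = shift;
  return $self->{'__match'} ? $self->{'__link'} : undef;
}

sub start_element {
  my $self = shift;
  my $data = shift;

  # Once we've found a match there's nothing left to do.
  return if ($self->{'__match'});

  if ($data->{Name} eq "bookmark") {
    # Hang on to this bookmark's href in case its title matches.
    $self->{'__bookmark'} = 1;
    $self->{'__link'} = $data->{Attributes}->{'{}href'}->{Value};
    return;
  }

  # Folders have <title> elements too; only a bookmark's title counts.
  if (($self->{'__bookmark'}) && ($data->{Name} eq "title")) {
    $self->{'__title'} = 1;
    $self->{'__text'} = '';
  }
}

sub end_element {
  my $self = shift;
  my $data = shift;

  return if ($self->{'__match'});

  if (($data->{Name} eq "title") && ($self->{'__title'})) {
    $self->{'__title'} = 0;
    # Compare only once the whole title has arrived.
    $self->{'__match'} = 1 if ($self->{'__text'} eq $self->{'__keyword'});
  }

  if ($data->{Name} eq "bookmark") {
    $self->{'__bookmark'} = 0;
  }
}

sub characters {
  my $self = shift;
  my $data = shift;

  return if ($self->{'__match'});
  return if (! $self->{'__title'});

  # A parser may hand us one text node in several chunks (around entity
  # references like &gt;, for instance), so collect them rather than
  # comparing each chunk against the keyword.
  $self->{'__text'} .= $data->{Data};
}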

package main;

use XML::SAX::ParserFactory;
use Benchmark;

my $file = "/usr/home/asc/aaronland.net/asc/webdev.xbel";

# Have XML::SAX::ParserFactory hand back the Expat-based parser.
$XML::SAX::ParserPackage = "XML::SAX::Expat";

my $count = 100;
my @keywords = (
    'FilterProxy Home Page',
    "REX XML Shallow Parsing with Regular Expressions",
    "aaronland",
    "Schematron - XML Validation Language",
    ">RE ActivePerl mod_perl ppd available",
);

timethese($count, {
    saxquery_expat => sub {
        foreach my $kw (@keywords) {
            # A new filter and a new parser for every single lookup.
            my $filter = Foo->new();
            $filter->keyword($kw);
            my $parser = XML::SAX::ParserFactory->parser(Handler=>$filter);
            $parser->parse_uri($file);
        }
    },
});

****

use strict;
use warnings;

use XML::XPath;
use Benchmark;

my $file  = "/usr/home/asc/aaronland.net/asc/webdev.xbel";
my $count = 100;

# One XML::XPath object, shared by every query in the benchmark.
my $xbel  = XML::XPath->new(filename=>$file);

my @keywords = (
    'FilterProxy Home Page',
    "REX XML Shallow Parsing with Regular Expressions",
    "aaronland",
    "Schematron - XML Validation Language",
    ">RE ActivePerl mod_perl ppd available",
);

timethese($count, {
    xpathquery => sub {
        foreach my $title (@keywords) {
            # '//' allows bookmarks at any depth, so the whole tree gets searched.
            my $query = "/xbel//bookmark[title=\"$title\"]/\@href";
            my $r = $xbel->find($query);
        }
    },
});
