Libwhisker XML doc v1.0
-------------------------------------------------------------------------

This document details the various aspects of Libwhisker's XML reader, so
that you can better understand what you can and can't do with it.

This XML reader ('reader' is a much more accurate term than 'parser') is
capable of reading basic XML documents into a specialized Perl hash for
programmatic use.

Target uses are:
- Reading in XML-based application configuration files
- Parsing VulnXML checks
- Handling WebDAV responses

What the XML reader *DOESN'T* do:
- Parse/understand DTDs
- Validate a document (against a DTD or general XML style)
- Parse XSL stylesheets (anything between <xsl:stylesheet> tags is ignored)
- Fetch/incorporate any external data/elements
- Allow elements to have both child elements and PCDATA (once a child
	element is encountered, any/all PCDATA is ignored).
- Create/write/produce XML

In order to better understand how the reader works, we'll walk through a few
examples.

First, let's assume we have the following snippet of XML in $myxml:

<Configuration>
	<Option1>foo</Option1>
	<Option2>bar</Option2>
	<Option3 conditional="weekday" weekday="sunday">baz</Option3>
	<Option4></Option4>
	<Option5/>
</Configuration>

We can consider this to be a basic configuration file for our application.
First we need to read the data into a usable XML object:

$XML = xml_read_data(\$myxml);

The return value (in our case, $XML) is actually just a reference to an
anonymous hash.  In this hash all elements are stored using an 'XML path',
which is similar to a UNIX filesystem layout.  In our example, the elements
would have the following paths:

	/Configuration
	/Configuration/Option1
	/Configuration/Option2
	/Configuration/Option3
	...

To query an element, you give its path.  Note that everything in XML is
case sensitive.  To fetch the value of the Option1 element (which should be
'foo'), we would run:

$option1 = xml_get_element_value($XML, "/Configuration/Option1");

$option1 is now set to 'foo'.  We can continue to do this for all the Option*
elements.  Option2 will return a value of 'bar', and Option3 will get 'baz'.  
Option4 will get '' (an empty string), since there is no data between the tags.
Option5 will return undef (Perl's 'undefined' value) due to Option5 being
an 'empty element'.  This lets you tell the difference between <blah/> (an 
empty element with no real value), and <blah></blah> (an element explicitly
set to an empty string value).  Note that undef is also returned if the
named element doesn't exist, or if there is an error.  In this case, it is
wise to use the xml_if_exist() function to check whether the element exists:
if xml_if_exist() returns true, but xml_get_element_value() returns undef,
then the element is valid and does not contain a value.
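
The four cases above (missing, empty element, empty string, real value) can
be told apart with a small helper.  This is only a sketch: element_state()
is a hypothetical name, not part of Libwhisker, and it takes the two results
as plain arguments so that it runs without the library.  The argument order
shown for xml_if_exist() in the comment is an assumption.

```perl
use strict;
use warnings;

# Classify an element from the two results described above.
#   $exists: truth value, as returned by xml_if_exist()
#   $value:  result of xml_get_element_value()
sub element_state {
    my ($exists, $value) = @_;
    return 'missing'       unless $exists;
    return 'empty-element' unless defined $value;   # <blah/>
    return 'empty-string'  if $value eq '';          # <blah></blah>
    return 'has-value';
}

# Hypothetical wiring to the reader (xml_if_exist() argument order assumed):
#   my $state = element_state(
#       xml_if_exist($XML, $path),
#       xml_get_element_value($XML, $path));

print element_state(1, 'foo'), "\n";   # has-value      (Option1)
print element_state(1, ''),    "\n";   # empty-string   (Option4)
print element_state(1, undef), "\n";   # empty-element  (Option5)
print element_state(0, undef), "\n";   # missing
```

Testing with defined() rather than plain truth matters here, since both ''
and undef are false in Perl.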

Next we want to access the parameters to element Option3.  We do this by:

$return = xml_get_element_parameter_value($XML, "/Configuration/Option3",
	"conditional");

This fetches the parameter named 'conditional' from the Option3 element,
which is 'weekday'.  If we don't know the names of the parameters ahead of
time, we can get a full list of them:

@parameters = xml_get_element_parameters($XML, "/Configuration/Option3");
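
The two parameter functions can be combined to collect every parameter of
an element into one hash.  The sketch below assumes that
xml_get_element_parameters() returns parameter *names*; that is a guess
from the description above.  parameters_as_hash() is a hypothetical helper,
and it takes two code references so that it can run here against stand-in
data instead of the real reader.

```perl
use strict;
use warnings;

# Build a name => value hash from two callbacks:
#   $list_names->()      returns the parameter names
#   $get_value->($name)  returns the value for one name
sub parameters_as_hash {
    my ($list_names, $get_value) = @_;
    my %params;
    for my $name ($list_names->()) {
        $params{$name} = $get_value->($name);
    }
    return %params;
}

# Stand-in data for Option3, so the sketch runs without Libwhisker.
my %option3 = (conditional => 'weekday', weekday => 'sunday');
my %params  = parameters_as_hash(
    sub { sort keys %option3 },
    sub { $option3{ $_[0] } },
);

print "$_=$params{$_}\n" for sort keys %params;

# With the reader, the callbacks would wrap the calls shown above
# (assuming names are returned), e.g.
#   sub { xml_get_element_parameters($XML, $path) }
#   sub { xml_get_element_parameter_value($XML, $path, $_[0]) }
```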

Ok, so far so easy.  To make things a bit more complicated, we'll next try the
following XML:

<Configuration>
	<Option name="1">foo</Option>
	<Option name="2">bar</Option>
	<Option name="3">baz</Option>
</Configuration>

Things are a bit trickier now since we have multiple elements of the same name.
Normally we'd use the path "/Configuration/Option", but what happens when
there's more than one?  Well, the reader makes the following path entries:

	/Configuration
	/Configuration/Option
	/Configuration/Option[1]
	/Configuration/Option[2]

Further, "/Configuration/Option" is marked as 'multi', meaning it has multiple 
values.  Now, manually adding '[1]' and trying to resolve all of the proper 
paths is not your job--there's a function to handle everything:

@elements = xml_get_multi($XML, "/Configuration/Option");

This returns an array containing the path names of all of the 'Option'
elements.   Note: calling xml_get_multi on a non-multi element will safely
result in only the single element name being returned--so when in doubt,
use it liberally!

To see how this helps in practice, consider the more complete code snippet
below:

my @options = xml_get_multi($XML, "/Configuration/Option");
foreach my $option (@options){
	# do something with each option tag
	my $value = xml_get_element_value($XML, $option);
	my $name = xml_get_element_parameter_value($XML, $option, 'name');
}

Here the paths of all the Option elements are enumerated and handled
individually in a loop.  Let's look at a slightly more complex piece of XML:

<Config>
	<Foo name="one">
		<Bar>one</Bar>
		<Bar>two</Bar>
	</Foo>
	<Foo name="two">
		<Bar>one</Bar>
		<Bar>two</Bar>
	</Foo>
</Config>

Here the XML reader creates paths which resemble:

	/Config
	/Config/Foo
	/Config/Foo/Bar
	/Config/Foo/Bar[1]
	/Config/Foo[1]
	/Config/Foo[1]/Bar
	/Config/Foo[1]/Bar[1]
All three of "/Config/Foo", "/Config/Foo/Bar", and "/Config/Foo[1]/Bar" are
multi elements.  How do we access all the Bar values?

my @foo = xml_get_multi($XML, "/Config/Foo");
my @bar = (); # starts off empty
foreach my $foo (@foo){
	# 'Bar' is resolved relative to the root path in $foo
	push @bar, xml_get_multi($XML, "Bar", $foo);
}
# @bar now has the paths to the four 'Bar' elements

Here we make use of the optional 'root' value, which is available to many of
the XML functions.  By saying "xml_get_multi($XML,'Bar',$foo)", we are
requesting that the subpath 'Bar' be appended to $foo, and then all those
elements be returned.  This lets us access subelements based on a variable
parent path.

xml_get_multi() always returns the *absolute* element path value.  The values
in @bar will be of the full "/Config/Foo*/Bar*" sort, rather than just a
"/Bar*".
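
To make the path naming concrete, the sketch below reproduces the names
from the listings in this document: the first same-named sibling is
unindexed, and later ones get [1], [2], and so on.  sibling_paths() is a
hypothetical helper written for illustration only.  It mirrors the naming
shown above and is not the reader's own code.

```perl
use strict;
use warnings;

# Path names for $count same-named siblings under $parent.
# The first sibling has no index; the rest are numbered from [1].
sub sibling_paths {
    my ($parent, $name, $count) = @_;
    my @paths;
    for my $i (0 .. $count - 1) {
        push @paths, "$parent/$name" . ($i ? "[$i]" : '');
    }
    return @paths;
}

# Two Foo elements, each holding two Bar elements, as in the last example.
my @foo = sibling_paths('/Config', 'Foo', 2);
my @bar = map { sibling_paths($_, 'Bar', 2) } @foo;

print "$_\n" for @bar;
# /Config/Foo/Bar
# /Config/Foo/Bar[1]
# /Config/Foo[1]/Bar
# /Config/Foo[1]/Bar[1]
```

These are the absolute paths that the loop over xml_get_multi() above is
expected to collect in @bar.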
