XPath (XML Path Language) is a query language for precisely addressing the individual parts of an XML document, such as elements, attributes, text data, and other nodes. XPath is part of the Extensible Stylesheet Language (XSL) family, which is responsible for transforming and formatting XML documents. The World Wide Web Consortium (W3C) standardized XPath as version 1.0 in 1999. Version 2.0 followed in 2007; its most important extension is a modified data model. It also adds an extended function library, new data types and operators, and additional constructs for expressions. XPath is based on a tree structure that represents the information units contained in an XML document. The tree is drawn like a family tree, as is common in computer science: the root is at the top and the tree branches downward. XPath also serves as the basis for other standards such as XPointer, XQuery, and XSLT.
Analysis Tool for XML Documents
To make it possible to address and analyze parts of an XML document, XPath was defined as a dedicated query language alongside XML. XPath itself is not an XML application but a language of its own: its expressions are written as strings, which in turn address substructures of a document. The tree model used by XPath, the node tree, is similar to the DOM model but by no means identical to it. The node tree defines only an abstract view of a document; unlike the DOM, it is not a set of objects with methods and properties that are made available to an application. In this context, one also speaks of a tree model with behaviorless nodes.
XPath distinguishes between seven node types: root nodes, element nodes, attribute nodes, namespace nodes, processing instruction nodes, comment nodes, and text nodes. Nodes are assigned to a tree according to strict rules. A tree starts from a root node, which is a logical construct and is not the same as the document element. The document element, together with any processing instructions or comments that appear outside it, is a child of the root node. Text nodes contain the textual content of a document. Each text node holds as many consecutive characters as possible, so two text nodes are never immediate siblings.
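The difference between the root node and the document element can be sketched with Python's standard `xml.dom.minidom` module. This builds a DOM tree rather than an XPath node tree, so it is only an approximation of the model described above; the sample document and the helper name `describe_top_level` are assumptions made for illustration.

```python
from xml.dom import Node, minidom

# Sample document (invented for illustration): a processing instruction
# and a comment precede the document element <doc>.
SAMPLE = (
    '<?xml-stylesheet type="text/xsl" href="style.xsl"?>'
    "<!-- a comment outside the document element -->"
    "<doc><p>Hello</p></doc>"
)

# Readable names for the DOM node type constants used below.
KIND = {
    Node.PROCESSING_INSTRUCTION_NODE: "processing-instruction",
    Node.COMMENT_NODE: "comment",
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
}


def describe_top_level(xml_text):
    """Return the kinds of the nodes directly below the document (root) node."""
    document = minidom.parseString(xml_text)
    return [KIND[child.nodeType] for child in document.childNodes]


top_level = describe_top_level(SAMPLE)
document_element = minidom.parseString(SAMPLE).documentElement.tagName
print(top_level, document_element)
```

In this sketch the document element `doc` is only one of three children of the root; the processing instruction and the comment are its siblings.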
The basic construct of XPath is the expression. The most important kind of expression is the location path, which selects a set of nodes relative to a context node. A location path in turn consists of individual location steps, each of which selects a set of nodes from the document; the selected nodes may be of any of the node types mentioned above. Axes define the direction of navigation used to locate nodes. Starting from the context node, an axis selects a particular part of the document tree, for example the children, the parent, the ancestors, or the descendants of that node.
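Selecting nodes relative to a context node can be sketched with Python's standard `xml.etree.ElementTree` module. Note that ElementTree implements only a limited subset of XPath, so this is an illustration rather than a full XPath processor; the sample document and variable names are assumptions.

```python
import xml.etree.ElementTree as ET

# Sample document (invented for illustration).
XML = """<library>
  <book lang="en"><title>A</title></book>
  <book lang="de"><title>B</title></book>
  <journal><title>C</title></journal>
</library>"""

# The parsed <library> element serves as the context node.
context = ET.fromstring(XML)

# Two steps along the child axis: book children, then their title children.
book_titles = [t.text for t in context.findall("./book/title")]

# Descendant search: every title below the context node, at any depth.
all_titles = [t.text for t in context.findall(".//title")]

# A predicate on an attribute narrows the first step.
english_titles = [t.text for t in context.findall("./book[@lang='en']/title")]

print(book_titles, all_titles, english_titles)
```

Each `/` separates one step from the next, and the bracketed predicate filters the nodes selected by its step.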
The nodes of a tree are arranged in what is called document order. In this order, the nodes appear exactly as the character strings that correspond to them are arranged in the document. When a document is traversed, the children and further descendants of the current node are therefore always processed before the nodes on the same level. This is also known as a depth-first search. The figure below illustrates a possible document order.
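Document order can be sketched in Python with `xml.etree.ElementTree`, whose `iter()` method walks elements depth first. The sample tree and the helper `preorder` are assumptions for illustration; the helper spells out the same traversal by hand.

```python
import xml.etree.ElementTree as ET

# Sample tree (invented for illustration):
#        a
#       / \
#      b   c
#     / \   \
#    d   e   f
root = ET.fromstring("<a><b><d/><e/></b><c><f/></c></a>")

# iter() yields the element itself and all elements below it in document order.
document_order = [element.tag for element in root.iter()]


def preorder(element):
    """Visit a node first, then each child subtree in turn (depth first)."""
    tags = [element.tag]
    for child in element:
        tags.extend(preorder(child))
    return tags


print(document_order)
print(preorder(root))
```

The descendants `d` and `e` of `b` are visited before `c`, even though `b` and `c` are on the same level.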
Creating a document from a tree is called serialization. Conversely, creating a tree from a document is called deserialization. Serializing an ordered tree amounts to writing its nodes to a document in document order.
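The round trip between document and tree can be sketched with `xml.etree.ElementTree`. The sample document is an assumption, chosen without empty elements or insignificant whitespace so that the serialized text matches the input exactly; in general a serializer may normalize such details.

```python
import xml.etree.ElementTree as ET

# Sample document (invented for illustration).
SOURCE = "<a><b>x</b><c>y</c></a>"

# Deserialization: build a tree from the document text.
tree_root = ET.fromstring(SOURCE)

# Serialization: write the tree back out as text, nodes in document order.
serialized = ET.tostring(tree_root, encoding="unicode")

# Parsing the serialized text again yields the same element order.
reparsed_tags = [element.tag for element in ET.fromstring(serialized).iter()]

print(serialized)
print(reparsed_tags)
```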
With XPath 2.0, the functionality and data model of XPath were significantly extended. In addition to the tree model explained above, the data model now includes atomic values. These are data of various types, such as strings, numbers, Boolean values, names, URIs, or date and time information. Atomic values exist independently of any element or attribute of the XML document. In version 2.0, the data model also includes sequences, that is, ordered lists of items, which may not be nested. Because the data model now covers three areas (nodes, atomic values, and sequences), the corresponding functions and operators have been extended as well.
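The rule that sequences may not be nested can be illustrated by analogy in Python. This is not an XPath 2.0 implementation: Python tuples merely stand in for sequences, Python values stand in for atomic values, and the helper `make_sequence` is hypothetical.

```python
from datetime import date
from decimal import Decimal


def make_sequence(*items):
    """Build a flat sequence: nested sequences are spliced in, never kept nested."""
    flat = []
    for item in items:
        if isinstance(item, (list, tuple)):
            # An inner sequence contributes its items, not itself.
            flat.extend(make_sequence(*item))
        else:
            flat.append(item)
    return tuple(flat)


# Atomic values of different types; the date is an arbitrary example value.
mixed = make_sequence(1, ("two", (Decimal("3.0"), ())), True, date(2007, 1, 1))

print(mixed)
print(len(mixed))
```

The inner tuples disappear and the empty one contributes nothing, which mirrors how the XPath 2.0 expression `(1, (2, 3), ())` denotes the same sequence as `(1, 2, 3)`.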