ITWissen.info - Tech know how online

Simple API for XML (SAX)

Like the Document Object Model (DOM), the Simple API for XML (SAX) provides a way to access XML documents. In contrast to DOM, SAX is an application programming interface (API) that supports the sequential processing of XML documents. While the document is being parsed, events are triggered in predefined situations, and an application can react to these events. As a result, the entire XML document is never held in memory as it is with DOM; only the part currently being parsed is available. This makes SAX well suited, for example, to implementing very fast searches in documents. Because the programming models of SAX and DOM differ considerably, code written for one cannot be converted to the other without substantial effort.

The history of SAX

Development of SAX began in 1997 under the direction of David Megginson. SAX is available as public domain software; the current version is 2.0.2 (SAX2). This version supports Document Type Definitions (DTD) as well as namespaces and comment processing. SAX was originally designed as an interface for Java, but implementations now exist for other languages such as C++, Python and Perl. Unlike DOM, SAX is not based on a specification drafted by the World Wide Web Consortium (W3C), yet it is a widely used de facto standard.

In SAX, different program components work together:

  • An XML parser, which generates different events when parsing the XML document.
  • A handler, which reacts to the generated events through its methods.

SAX is therefore said to be based on an event model. The XML document is read serially, like a data stream, and an event is generated for each recognized construct. Each event is a signal that indicates a change in the markup state. Events can be triggered by element tags, processing instructions, the boundaries of the document itself, comments, or character data. The parser reports these events to the calling program through callbacks, and the handler interprets each event and acts accordingly. The drawback of this approach is that random access to an individual element is only possible if the data was buffered beforehand. The developer must therefore ensure that any data needed for later analysis is retained. With SAX, the complete XML document is never available at once.
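The interplay of parser and handler can be sketched with the xml.sax module from the Python standard library. This is only a minimal illustration, not a reference implementation: the class name TitleCollector, the sample document and the tag name "title" are assumptions made for this example. The handler records which callbacks fire and buffers the text it wants to keep, since the parser itself retains nothing.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Hypothetical handler: keeps the text of every <title> element."""

    def __init__(self):
        super().__init__()
        self.events = []     # callback names, in the order the parser fired them
        self.titles = []     # data the application decided to retain
        self._buffer = None  # text chunks of the <title> currently open, if any

    def startDocument(self):
        self.events.append("startDocument")

    def endDocument(self):
        self.events.append("endDocument")

    def startElement(self, name, attrs):
        self.events.append("start:" + name)
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # The parser may deliver one run of text in several chunks,
        # so the chunks are accumulated until the element closes.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        self.events.append("end:" + name)
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._buffer = None


# Sample document invented for this sketch.
SAMPLE = (
    b"<library>"
    b"<book><title>XML Basics</title></book>"
    b"<book><title>SAX in Practice</title></book>"
    b"</library>"
)

handler = TitleCollector()
xml.sax.parseString(SAMPLE, handler)
print(handler.titles)
print(handler.events[:4])
```

Only the titles survive the parse; everything else in the document is seen once and then discarded, which is the buffering responsibility described above.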

The further development of SAX

SAX2 is already part of the Java Development Kit (JDK), where it is implemented as part of the Java API for XML Processing (JAXP). This abstraction layer also supports DOM Level 3, StAX, XSLT and XPath. StAX defines a pull API: rather than working on the callback principle, it steps over the elements like a tokenizer, a concept familiar from compiler construction, and the application selects the ones it needs. XSLT (Extensible Stylesheet Language Transformations) is a W3C-standardized language, itself expressed in XML, for transforming XML documents.
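The difference between the callback principle and a pull API can be sketched as follows. StAX itself is a Java API; as a stand-in, this example uses xml.dom.pulldom from the Python standard library, which offers a comparable pull-style loop. The function name order_ids and the sample document are assumptions made for illustration.

```python
from xml.dom import pulldom

# Sample document invented for this sketch.
SAMPLE = "<orders><order id='1'>pen</order><order id='2'>ink</order></orders>"


def order_ids(xml_text):
    """Return the id attribute of every <order> element, in document order."""
    ids = []
    # The application drives the loop and requests the next event itself;
    # no handler object is registered and no callbacks are invoked.
    for event, node in pulldom.parseString(xml_text):
        if event == pulldom.START_ELEMENT and node.tagName == "order":
            ids.append(node.getAttribute("id"))
    return ids


print(order_ids(SAMPLE))
```

With a pull loop the control flow stays in the application, so the loop can simply stop early or skip events without the handler state that a callback design needs.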

When comparing DOM and SAX, each should be judged against the use case at hand. Searching for a particular character string, for example, does not require the entire document to be held in memory, so there is no need to build a complete node tree with DOM; SAX is the natural choice here. The two approaches are also frequently combined: the required information can first be filtered out of a document with SAX and then passed on using DOM. SAX is also supported by Microsoft XML Core Services (MSXML). JAXP integrates the parser implementations Xerces and Crimson. It offers a flexible way to choose between different parsers and XSLT transformers without having to modify the application code.
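The string-search use case can be sketched with Python's xml.sax module. The names TextSearch, Found and contains_text are hypothetical, and stopping the parse by raising an exception from the handler is one possible design choice for this sketch, not something prescribed by SAX. No node tree is built; only a few trailing characters are kept so that a match split across two text chunks is still found.

```python
import xml.sax


class Found(Exception):
    """Raised by the handler to stop parsing once the string is seen."""


class TextSearch(xml.sax.ContentHandler):
    """Hypothetical handler: looks for a string inside character data."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self._tail = ""  # end of the previous chunk within the same text run

    def _reset(self, *args):
        # In this sketch a match must lie within one text run,
        # so the carried-over tail is dropped at every tag.
        self._tail = ""

    startElement = _reset
    endElement = _reset

    def characters(self, content):
        window = self._tail + content
        if self.needle in window:
            raise Found()
        keep = len(self.needle) - 1
        self._tail = window[-keep:] if keep > 0 else ""


def contains_text(xml_bytes, needle):
    """Return True if needle occurs in the character data of the document."""
    try:
        xml.sax.parseString(xml_bytes, TextSearch(needle))
    except Found:
        return True
    return False


# Sample document invented for this sketch.
DOC = b"<doc><p>alpha</p><p>beta gamma</p></doc>"
print(contains_text(DOC, "gamma"))
print(contains_text(DOC, "delta"))
```

Memory use stays bounded by the size of one text chunk plus the search string, regardless of how large the document is.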

Information:
English: simple API for XML - SAX
Updated at: 29.10.2013

All rights reserved DATACOM Buchverlag GmbH © 2024