News Column

Patent Issued for Parallel XML Parsing Using meta-DFAs

July 30, 2014



By a News Reporter-Staff News Editor at Journal of Engineering -- From Alexandria, Virginia, VerticalNews journalists report that a patent by the inventor Chiu, Kenneth (Vestal, NY), filed on December 11, 2009, was published online on July 15, 2014.

The patent's assignee for patent number 8782514 is The Research Foundation for The State University of New York (Binghamton, NY).

News editors obtained the following quote from the background information supplied by the inventors: "By serving as a common language for loosely-coupled, message-oriented interactions, XML has greatly facilitated the development of large-scale, interoperable applications. At the same time, however, the very characteristics that lend themselves well to this purpose, such as XML's self-descriptive nature, have resulted in concerns about performance [3]. These concerns have led to work such as alternative encodings of XML to reduce serialization and deserialization costs [2], differential techniques [7, 12] to cache results from similar messages for reuse, schema-specific techniques to optimize parsing for specific schemas [4, 14], and hardware acceleration to speed up XML parsing [5].

"On the hardware front, recently manufacturers have increasingly opted to use chip real estate to provide multiple cores per CPU rather than faster CPUs. Parallel techniques will thus become increasingly important to exploit the march of Moore's Law [11], suggesting parallelism as another avenue for improving the performance of XML parsing.

"Parallelism can be introduced into an application at different levels of granularity. For example, under a naturally concurrent workload consisting of a stream of independent, incoming service requests, throughput can be improved by assigning a separate core to each incoming request. For some applications, however, the computation of a single job itself may be parallelized across all cores of a machine. Since typically the computation cannot proceed until the entire input has been read, most cores will be idle during input I/O if the input parsing is single-threaded. In this case, assigning multiple cores to parse the input in parallel may be beneficial, especially if the input contains a sizable amount of data. Thus, applying parallelism to such XML parsing could bring significant benefit.

"An XML document could be parsed in parallel by dividing the XML document into equal-sized chunks, and parsing each chunk in parallel, with one core per chunk. The term chunk is used here to refer to any contiguous sequence of characters. XML parsing is inherently sequential, however, because the state of an XML parser when reading a given character depends potentially on all preceding characters. Thus, each chunk cannot be unambiguously parsed independently, since the true state of an XML parser at the beginning of a chunk is unknown until preceding chunks are parsed.

"Previous work of the inventors [6, 9] addressed this problem by using a fast preparse scan to build an outline of the document that is called the skeleton. The preparse was then used to guide a full parse using libxml2 [16]. This preparse is a sequential stage, however, which fundamentally limits scalability per Amdahl's law [1].

"One approach to parallel XML parsing might be to pipeline the parsing by dividing it into stages. Each stage would then be executed by a different thread. This approach can provide speedup, but pipelines are usually designed specifically for a fixed number of execution units, and thus such an implementation could not easily adjust to varying numbers of cores.

"More promising is a data parallel approach. Here, the data, in this case the XML document, is divided into some number of chunks, which are then all parsed in parallel by separate threads, one for each chunk. As the chunks are finished, the results are merged back together to form the complete result. The difficulty with this scheme, however, is that dividing the document into chunks with equal numbers of characters would create chunks beginning at arbitrary locations within the XML document. Thus, all chunks except the first would begin at a character of unknown syntactic role, such as in a tag name, an attribute name or value, element content, etc. The parser would not know the state at which to begin the chunk.

"The area near the beginning of each chunk could be searched to attempt to unambiguously determine the syntactic state of the beginning of the chunk. For example, one could backtrack from the beginning of the chunk to find a < character, indicating the beginning of a tag. Unfortunately, the < character is also valid within a comment, where it does not indicate the beginning of a tag, so additional code logic would be needed to correctly handle all the possibilities. Thus, such ad hoc approaches are complex and error prone.

"Note that there may be application-level dependencies between elements within a single XML document. These dependencies may hinder application-level parallelism before the document is fully parsed, but do not preclude parser-level parallelism (especially DOM-based), which is an aspect addressed by the present invention.

"A number of approaches try to address the performance bottleneck of XML parsing. Software solutions include schema-specific parsing [4, 14] and lazy parsing [8]. Schema-specific parsing leverages XML schema information, from which a specific parser (automaton) is built to accelerate XML parsing. With the lazy parsing approach, a skeleton is also built from the XML document to indicate the basic tree structure; then, based on the user's access requirements, the required piece of the XML document is located through the skeleton and fully parsed. This approach seeks to perform partial parsing of the whole XML document on demand so that time can be saved. Though all these approaches aim to boost XML parsing speed, they differ fundamentally from parallel XML parsing, since their focus is to speed up the process of parsing, while the main point of parallel XML parsing is to understand how to parallelize the process.

"Hardware-based solutions [5, 15] are also promising, particularly in the industrial arena. These approaches, however, require specialized hardware, whereas the present approach can be applied to any machine with multiple cores.

"See, for example, U.S. Pat. No. 6,892,237, expressly incorporated herein by reference."

As a supplement to the background information on this patent, VerticalNews correspondents also obtained the inventor's summary information for this patent: "The present invention demonstrates how to parallelize the preparse itself, and thus greatly reduce the sequential stages of the parallel parsing process. libxml2 is leveraged for the full parse, and thus the technique supports all XML features that libxml2 does. The output DOM is preferably simply the output of libxml2, thus simplifying the integration of parallel XML parsing into existing applications that use libxml2. Of course, other parsers may be employed as appropriate.

"The preparser is parallelized by first modeling it as a finite state automaton with actions on each transition. The term DFA is used to denote deterministic automata with actions on transitions, which is somewhat different from that suggested in most textbooks. The automaton is then transformed to a new DFA, which is called a meta-DFA, whose transition function effectively runs multiple copies of the original automaton simultaneously. Each contained copy of the original automaton is called a sub-DFA. For each state q of the original automaton, the meta-DFA incorporates a complete copy of the automaton as a sub-DFA which starts at the state q. So the meta-DFA would start execution at the beginning of a chunk with its first sub-DFA in state 0 of the original preparser automaton, its second sub-DFA in state 1 of the original preparser automaton, etc. The meta-DFA thus avoids the problem of determining the state at the beginning of a chunk by essentially pursuing all possibilities simultaneously.
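As a toy illustration of this idea, the following sketch runs one sub-DFA per candidate start state over a single chunk. The two-state automaton here is an assumption for illustration only (it merely tracks whether the scan is inside a markup tag), not the patent's actual preparser automaton:

```python
def run_meta_dfa(delta, states, chunk):
    """Simulate the meta-DFA: one sub-DFA per candidate start state.

    Returns a map from each candidate start state to the state the
    corresponding sub-DFA reaches after consuming the whole chunk."""
    current = {q0: q0 for q0 in states}  # each sub-DFA starts in its own state
    for ch in chunk:
        current = {q0: delta(q, ch) for q0, q in current.items()}
    return current


# Toy two-state automaton: OUT = in character content, IN = inside a tag.
def delta(state, ch):
    if ch == "<":
        return "IN"
    if ch == ">":
        return "OUT"
    return state


# A chunk that begins mid-document, at a character of unknown syntactic role:
ends = run_meta_dfa(delta, ["OUT", "IN"], "b>Hello</a><c")
```

Once the preceding chunk has been processed and its true end state is known, the matching sub-DFA's result is selected and the others are discarded; since the meta-DFA is itself deterministic, this simulation can also run on a single core.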

"It is noted that, if information is available to make certain states more likely than others, then the algorithm may be modified to allocate more of its resources to the more likely states, to give a statistical speed advantage, if evaluating all parse paths simultaneously would strain resources. Thus, the full parsing can commence with partial results available, subject to a stall if a required evaluation is absent.

"Likewise, if some states are known or predicted to have greater complexity for evaluation, these may be distributed to computing cores which have a higher capability than cores to which predicted low-complexity evaluation states are assigned. For example, a current trend is to perform computational tasks on massively parallel computing arrays, as are found in modern computer graphics processors (GPUs). These systems, such as the nVidia Tesla, which operates under CUDA, permit common instructions to be executed by a series of processors, and thus while the raw computational ability is high, the independent control is limited. However, some preparsing tasks may be appropriately implemented on these devices. It is noted that a typical system has both a GPU and CPU, each of which may have multiple cores, providing an opportunity for discrimination in the task allocation process.

"It is further noted that even with identical processors, the clock speed may be modulated as required by a task, to control power dissipation. Indeed, forthcoming chip designs may have a thermal envelope which may be exceeded by unconstrained operation of the device, and therefore rationing power dissipation for various tasks may be a requirement.

"Since the meta-DFA is also a DFA, the simultaneity is conceptual only, and can be executed by a single core. Because this technique is a generic transformation applied to the original preparsing DFA, it is also easier to implement, debug, and maintain than ad hoc attempts at parallel XML parsing.

"Using the meta-DFA constructed from the preparser, the document is divided into equal-sized chunks, and the preparsing meta-DFA is executed in parallel on each chunk. Each sub-DFA generates a separate partial skeleton per chunk, all but one of which correspond to incorrect preparser states used to begin the chunk. When execution is complete, the partial skeletons are merged, and incorrect ones are discarded.
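A minimal end-to-end sketch of this divide/run/merge scheme follows. It again uses an illustrative two-state automaton rather than the actual preparser, and for brevity each chunk reports only end states (not full partial skeletons); the sequential merge then threads the true state through the chunks and discards the other candidates:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy automaton: OUT = in character content, IN = inside a markup tag.
STATES = ("OUT", "IN")


def step(state, ch):
    return "IN" if ch == "<" else "OUT" if ch == ">" else state


def run_from(state, chunk):
    for ch in chunk:
        state = step(state, ch)
    return state


def meta_run(chunk):
    """One sub-DFA per candidate start state; map start state -> end state."""
    return {q0: run_from(q0, chunk) for q0 in STATES}


def parallel_end_state(text, n_chunks):
    size = -(-len(text) // n_chunks)  # ceiling division
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    with ThreadPoolExecutor() as pool:
        maps = list(pool.map(meta_run, chunks))  # chunks run in parallel
    # Sequential merge: the document starts outside any tag ("OUT"), and
    # each chunk's true start state is the previous chunk's end state;
    # results from the other sub-DFAs are simply discarded.
    state = "OUT"
    for end_map in maps:
        state = end_map[state]
    return state
```

Note that the merge pass is cheap (one dictionary lookup per chunk), so the sequential portion shrinks to near nothing even as the number of chunks grows.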

"Tests on a 30 CPU Sun E6500 machine demonstrate good scalability at least to 30 CPUs. Indeed, while the present system does not attempt to exploit fast inter-core communications, there is no reason why this could not be implemented with associated expectation of improvement in performance.

"Operating systems usually provide access to multiple cores via kernel threads (or LWPs). It is generally assumed herein that threads are mapped to hardware threads to maximize throughput, using separate cores when possible. Further details of scheduling and affinity issues may be addressed using techniques known in the art.

"2.1 Preparsing

"In previous work of the inventors [6, 9], it was shown how parallelization issues could be addressed by a two-stage parsing of the XML document, namely a quick sequential scan of the document which is then used to guide a full parsing stage producing libxml2 DOM output. The initial scan is known as the preparse, and its purpose is to determine the tree structure of the elements in the document. With the structural information, the XML document can then be partitioned into well-balanced document fragments and then fully parsed in parallel. Well-balanced is used to mean that the XML fragment begins with a start-tag and ends with the corresponding end-tag. The full parse is performed using unmodified libxml2 [16], which provides APIs sufficient to parse well-balanced XML fragments. Finally, the parsing results of each parallel thread (the DOM fragments) are spliced together to form the final DOM object. Full details on PXP are given in [6, 9].

"2.2 Skeleton

"Many of the syntactic units defined by the XML specification [18], such as attributes, namespaces, and even the tag name, are not considered in the preparse. Neither does the preparse need to verify any well-formedness constraints, since these can be verified later during the full parse, for example by libxml2. The preparsing thus treats the XML document as simply a sequence of unnamed start- and end-tag pairs, which is called the skeleton of the XML document. This simplicity does not limit the XML which the parser can be used for, since all other XML features are handled in the subsequent full parse.

"It is noted that the present invention is not limited to parsing of XML documents, and indeed, no such limit is intended. Rather, XML parsing represents a class of problems in which an arbitrary portion of an object has an unknown or ambiguous state, and appropriate processing of that document requires knowledge of the state. However, the speed of execution of the process may be improved if parallel speculative execution techniques are employed, in which some or all of the possible states are evaluated absent express knowledge of the correct state. Thereafter, as the actual state becomes known, the correct speculatively-executed transition sequence is promoted, and the incorrect speculatively-executed transition sequences are demoted or killed. Indeed, this technique is also applicable to cases in which the actual state is not defined until a later time. Therefore, rather than stalling the evaluation until the state is defined, the process may proceed in parallel presuming that one of the presumptive states is the correct one, so that when the actual state is known, the result will then be available with low latency.

"It should be clear that by speculatively executing possibly incorrect transition paths (that are later abandoned), the system is inefficient with respect to computational 'work' per executed instruction. However, in many cases, computer systems are designed with an excess of computational capacity for the average condition, and therefore fully loading the processor(s) during this process exploits an otherwise expiring resource in a multi-purpose system. Indeed, even in a dedicated system, computational processing power is relatively low cost, and in many instances, the processing latency imposes a high cost, so the parallel speculative execution is deemed efficient.

"The skeleton is stored in memory as an array of items containing information about each XML element. There is one item for each element, and the item contains the start and end position of the element. The order of items in the skeleton corresponds to a depth-first ordering of the represented elements.

"In order to identify the textual range occupied by an element, the preparser maintains a stack of incomplete items. When a start-tag is seen, the preparser creates a skeleton item with the starting position of the element and appends it to the skeleton array. Meanwhile, it also pushes a reference to the item on its stack. Since start-tags and end-tags of an XML document must strictly match in a paired fashion, when the end-tag is encountered, it must belong to the top item on the stack. It thus fills in the ending position in the item, and pops the item reference off the stack. Therefore, when a well-balanced XML fragment is completely preparsed, the stack must be empty.
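The skeleton-building pass described above can be sketched as follows, under the simplifying assumption (mine, for illustration) that the fragment contains only plain start- and end-tags, with no comments, CDATA sections, or self-closing tags:

```python
def build_skeleton(fragment):
    """Return (start, end) positions, one item per element, in the
    depth-first order in which start-tags appear."""
    skeleton, stack = [], []
    i = 0
    while True:
        i = fragment.find("<", i)
        if i == -1:
            break
        close = fragment.find(">", i)
        if fragment[i + 1] == "/":
            # End-tag: it must belong to the top item on the stack.
            item = stack.pop()
            item[1] = close          # fill in the ending position
        else:
            # Start-tag: append an incomplete item and push a reference.
            item = [i, None]
            skeleton.append(item)    # skeleton order is depth-first
            stack.append(item)
        i = close + 1
    # A well-balanced fragment leaves the stack empty when fully preparsed.
    assert not stack, "fragment is not well-balanced"
    return [tuple(item) for item in skeleton]
```

For example, `build_skeleton("<a><b></b></a>")` yields one item per element, each recording where that element's markup begins and ends.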

"It is therefore an object of the invention to provide a method of preparsing a data object having a data structure comprising a hierarchy, comprising dividing the data object into a plurality of chunks; evaluating a chunk based on a plurality of potentially valid initial states; generating, from a plurality of potentially valid initial states of the chunk, a skeleton representing a hierarchical arrangement of the data in the chunk and an end state of the chunk; selecting the skeleton for an actual initial state of the chunk; and using the hierarchical arrangement of the data in the chunk and end state of the chunk to interpret a subsequent chunk of the data object.

"It is a further object to provide a method for processing an input file or stream, comprising receiving a definition of an original finite state machine; transforming the original finite state machine to a plurality of corresponding finite state machines, each corresponding finite state machine having an output equivalent to the original finite state machine subject to a different respective starting state; receiving a data object subject to a plurality of possible starting states; processing the data object with the plurality of corresponding finite state machines; and selecting an output of the one corresponding finite state machine corresponding to a proper starting state for the data object.

"It is a still further object to provide a computer readable medium, storing therein instructions for controlling a programmable processor to perform the steps of receiving a data object having a parsable hierarchical structure and one of a finite number of possible starting states; processing the data object with a plurality of finite state machines, each corresponding to a different respective one of the finite number of possible starting states; and selecting an output of one finite state machine corresponding to a proper starting state for the data object.

"Another object provides an apparatus adapted to process an input file, comprising an input adapted to receive at least a portion of a data object having a hierarchical structure parsable by a finite state machine, and having one of a finite number of possible starting states as an actual starting state; a memory storing a definition of the finite state machine, transformed for each of the possible starting states; at least one processor, having a plurality of processing resources available concurrently, adapted for: processing the at least a portion of the data object with the transformed finite state machines to analyze the hierarchical structure; determining a valid one of the possible starting states; and an output adapted to present at least the hierarchical structure associated with the analyzing by the transformed finite state machine associated with the valid one of the possible starting states.

"A set of remnants from state transitions of an earlier data object may be used to resolve a later data object, and the structure of a later data object may be interpreted based on remnants from an earlier data object. The remnants are, for example, unmatched results from state transitions. The processing of remnants may be separate from the processing with the finite state machine.

"In some cases, at least one of the plurality of corresponding finite state machines results in a dead state from which there is no valid transition, and therefore all subsequent processing predicated on the respective state may be terminated. In other cases, the starting state must be resolved in order to determine the correct output state of a processed data object or portion thereof.

"The data object is, for example, part of a series of data objects, each data object having an end state corresponding to a begin state of a next data object, and having a set of remnants from unmatched state transitions, wherein a hierarchical structure of a set of data objects is determined based at least on the state analysis with the finite state machines and the remnants.

"A system for executing the method is, for example, a multicore processor, having a plurality of processing cores each adapted to process portions of a data object in parallel, memory for storing executable code, memory for storing the portions of the data objects, analysis results, and remnants, and an output, which, for example, may be used as an input for a definitive parsing of the data object in accordance with a determined hierarchical structure or skeleton of the data object. The apparatus may further be adapted to segment the data object into chunks for processing. Likewise, the apparatus may further define a set of transformed finite state machines, each representing a possible starting state for a data object, from a generic finite state machine."

For additional information on this patent, see: Chiu, Kenneth. Parallel XML Parsing Using meta-DFAs. U.S. Patent Number 8782514, filed December 11, 2009, and published online on July 15, 2014. Patent URL: http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=8782514.PN.&OS=PN/8782514RS=PN/8782514

Keywords for this news article include: The Research Foundation for The State University of New York.

Our reports deliver fact-based news of research and discoveries from around the world. Copyright 2014, NewsRx LLC





Source: Journal of Engineering

