Module jonix.xml

Class XmlChunkerIterator

java.lang.Object
com.tectonica.xmlchunk.XmlChunkerIterator
All Implemented Interfaces:
Iterator<Element>

public class XmlChunkerIterator extends Object implements Iterator<Element>
An iterator for XML data extraction, intended for XML source that has the following properties:

  • May be arbitrarily large (too large to be held in memory in its entirety)
  • Has a repetitive structure, where sub-XML records of interest are all located at some constant depth/level

The XML source is broken into 'chunks', each representing one XML sub-tree rooted at the target depth (each chunk is assumed to be small enough to fit in memory). Each chunk is returned by this iterator's next() method as an in-memory DOM Element.

For example, given the following XML:

 <?xml version="1.0" encoding="UTF-8"?>
 <Level1>
     <Level2a>
         ..
         <Level3a>
             ..
             <Level4>
                 ..
             </Level4>
             ..
         </Level3a>

         <Level3b>
             ..
         </Level3b>
         ..
     </Level2a>

     <Level2b>
     ..
     </Level2b>
 </Level1>
 

Requesting a target depth of 2 (the root element, <Level1>, being at depth 1) would yield two chunks: <Level2a>..</Level2a> (including its entire sub-tree) and <Level2b>..</Level2b>.
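The depth-based chunking described above can be illustrated with a minimal sketch built only on the JDK's standard StAX and DOM APIs. It is not this class's actual implementation, and it does not use this class's constructors, which are not shown here. The class name DepthChunkSketch and the method forEachChunk are hypothetical names introduced for illustration. The sketch assumes the root element is at depth 1, and it ignores namespaces, comments and processing instructions for brevity.

```java
import java.io.Reader;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical illustration class; not part of the library documented here.
class DepthChunkSketch {

    /**
     * Streams the XML and hands each sub-tree rooted at targetDepth
     * (root element = depth 1) to the handler as a detached DOM Element.
     * Only one chunk is held in memory at a time.
     */
    static void forEachChunk(Reader xml, int targetDepth, Consumer<Element> handler)
            throws Exception {
        // The Document serves only as a factory for DOM nodes.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        int depth = 0;
        Node current = null; // innermost open node of the chunk being built; null outside a chunk
        try {
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        depth++;
                        if (depth >= targetDepth) {
                            // Simplification: local names only, namespaces are dropped.
                            Element e = doc.createElement(reader.getLocalName());
                            for (int i = 0; i < reader.getAttributeCount(); i++) {
                                e.setAttribute(reader.getAttributeLocalName(i),
                                               reader.getAttributeValue(i));
                            }
                            if (current != null) {
                                current.appendChild(e);
                            }
                            current = e;
                        }
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        if (current != null) {
                            current.appendChild(doc.createTextNode(reader.getText()));
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        if (depth == targetDepth) {
                            // The chunk is complete: emit it and forget it.
                            handler.accept((Element) current);
                            current = null;
                        } else if (depth > targetDepth) {
                            current = current.getParentNode();
                        }
                        depth--;
                        break;
                    default:
                        break; // comments, processing instructions etc. are skipped
                }
            }
        } finally {
            reader.close();
        }
    }
}
```

Calling forEachChunk on the sample document above with a target depth of 2 would invoke the handler twice, first with the <Level2a> element and then with the <Level2b> element. A callback is used here instead of an Iterator only to keep the sketch short.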

Author:
Zach Melamed