Module jonix.xml

Class XmlChunker

java.lang.Object
com.tectonica.xmlchunk.XmlChunker

public class XmlChunker extends Object
An XML data extraction class, intended for XML sources with the following properties:

  • May be arbitrarily large (too large to be held in memory in its entirety)
  • Has a repetitive structure, where sub-XML records of interest are all located at some constant depth/level
  • Sub-XML records are small enough to be read and parsed in memory

The XML source is broken into 'chunks', each representing one XML sub-tree positioned at the target depth. Each chunk is passed to the caller as an in-memory DOM Element.

For example, given the following XML:

 <?xml version="1.0" encoding="UTF-8"?>
 <Level1>
     <Level2a>
         ..
         <Level3a>
             ..
             <Level4>
                 ..
             </Level4>
             ..
         </Level3a>

         <Level3b>
             ..
         </Level3b>
         ..
     </Level2a>

     <Level2b>
     ..
     </Level2b>
 </Level1>
 

Requesting a target depth of 2 would yield two chunks, <Level2a>..</Level2a> (including its entire sub-tree), and <Level2b>..</Level2b>.
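The depth-based chunking described above can be illustrated with a minimal, self-contained sketch built only on the JDK's StAX and DOM APIs. It is not the XmlChunker implementation: the class name DepthChunkSketch, its chunk method, and the use of a Consumer<Element> in place of XmlChunker.Listener are all assumptions made for illustration. The sketch also ignores namespaces, comments, and processing instructions.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical illustration class; not part of jonix.
final class DepthChunkSketch {

    // Streams the source and hands each sub-tree rooted at targetDepth
    // (the root element is depth 1) to the sink as a DOM Element.
    static void chunk(InputStream is, String encoding, int targetDepth, Consumer<Element> sink) {
        try {
            XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(is, encoding);
            Document doc = null;   // owner document of the chunk being built
            Node current = null;   // insertion point; null while outside any chunk
            int depth = 0;
            while (reader.hasNext()) {
                XMLEvent ev = reader.nextEvent();
                if (ev.isStartElement()) {
                    depth++;
                    if (current == null && depth == targetDepth) {
                        // A new chunk starts: give it a fresh, small DOM document.
                        doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
                        current = doc;
                    }
                    if (current != null) {
                        StartElement se = ev.asStartElement();
                        Element el = doc.createElement(se.getName().getLocalPart());
                        Iterator<?> attrs = se.getAttributes();
                        while (attrs.hasNext()) {
                            Attribute a = (Attribute) attrs.next();
                            el.setAttribute(a.getName().getLocalPart(), a.getValue());
                        }
                        current.appendChild(el);
                        current = el;
                    }
                } else if (ev.isCharacters() && current != null) {
                    current.appendChild(doc.createTextNode(ev.asCharacters().getData()));
                } else if (ev.isEndElement()) {
                    if (current != null) {
                        if (depth == targetDepth) {
                            // The chunk is complete: deliver it and drop it from memory.
                            sink.accept((Element) current);
                            current = null;
                            doc = null;
                        } else {
                            current = current.getParentNode();
                        }
                    }
                    depth--;
                }
            }
        } catch (XMLStreamException | ParserConfigurationException e) {
            throw new IllegalStateException("Failed to chunk XML source", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Level1><Level2a><Level3a><Level4>x</Level4></Level3a><Level3b/></Level2a>"
                + "<Level2b>y</Level2b></Level1>";
        InputStream is = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        // With a target depth of 2, the sink sees Level2a and then Level2b.
        chunk(is, "UTF-8", 2, e -> System.out.println(e.getTagName()));
    }
}
```

Only one chunk's DOM tree is alive at any time, which is why the approach suits sources too large to load whole, provided each record at the target depth is small.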

Author:
Zach Melamed
  • Constructor Details

    • XmlChunker

      public XmlChunker()
  • Method Details

    • parse

      public static void parse(InputStream is, String encoding, int targetDepth, XmlChunker.Listener listener)
      Extracts 'chunks' of an XML source and passes them to a user-provided XmlChunker.Listener.
      Parameters:
      is - the InputStream of the XML source
      encoding - the text encoding of the XML source (use "UTF-8" if not sure)
      targetDepth - the depth at which the chunks are positioned in the XML source (the root element is at depth 1)
      listener - an implementation of XmlChunker.Listener that receives the chunks