java.lang.Object
com.tectonica.xmlchunk.XmlChunker
An XML data extraction class, intended for XML sources that have the following properties:
- May be arbitrarily large (too large to be held in memory in their entirety)
- Have a repetitive structure, where the sub-XML records of interest are all located at the same constant depth/level
- Contain sub-XML records that are each small enough to be read and parsed in memory
The XML source is broken into 'chunks', each representing one XML sub-tree positioned at the target depth. Each
chunk is passed to the caller as an in-memory DOM Element.
For example, given the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<Level1>
    <Level2a>
        ..
        <Level3a>
            ..
            <Level4>
                ..
            </Level4>
            ..
        </Level3a>
        <Level3b>
            ..
        </Level3b>
        ..
    </Level2a>
    <Level2b>
        ..
    </Level2b>
</Level1>
Requesting a target depth of 2 would yield two chunks, <Level2a>..</Level2a> (including its entire sub-tree),
and <Level2b>..</Level2b>.
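The depth-based splitting described above can be illustrated with a minimal sketch built only on the JDK's StAX and DOM APIs. This is not XmlChunker's actual implementation, which this page does not show; the class name DepthChunkerSketch, the method chunk and the use of a Consumer<Element> in place of XmlChunker.Listener are all assumptions made for illustration. The sketch counts the root element as depth 1, consistent with the example above, and ignores namespaces, comments and processing instructions for brevity.

```java
import java.io.InputStream;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical illustration only; not part of com.tectonica.xmlchunk.
final class DepthChunkerSketch {

    /** Streams the source and hands every element found at targetDepth to onChunk. */
    static void chunk(InputStream is, String encoding, int targetDepth,
                      Consumer<Element> onChunk) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(is, encoding);
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        int depth = 0; // the root element is at depth 1
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    if (depth == targetDepth) {
                        // Only this one sub-tree is materialized in memory.
                        onChunk.accept(readSubtree(reader, builder.newDocument()));
                        depth--; // readSubtree already consumed the matching end tag
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    depth--;
                }
            }
        } finally {
            reader.close();
        }
    }

    /** Expects the reader on START_ELEMENT; returns with it on the matching END_ELEMENT. */
    private static Element readSubtree(XMLStreamReader reader, Document doc) throws Exception {
        Element element = doc.createElement(reader.getLocalName());
        for (int i = 0; i < reader.getAttributeCount(); i++) {
            element.setAttribute(reader.getAttributeLocalName(i), reader.getAttributeValue(i));
        }
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                element.appendChild(readSubtree(reader, doc));
            } else if (event == XMLStreamConstants.CHARACTERS || event == XMLStreamConstants.CDATA) {
                element.appendChild(doc.createTextNode(reader.getText()));
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                return element;
            }
        }
        throw new IllegalStateException("Document ended inside an open element");
    }
}
```

Because elements above the target depth are only counted and never stored, memory use in this sketch is bounded by the size of the largest single chunk rather than by the size of the whole source.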
Author:
Zach Melamed
Nested Class Summary
Nested Classes
Modifier and Type: static interface
Class: XmlChunker.Listener
Description: An interface that the user of XmlChunker must implement in order to get the 'chunks' extracted from the XML source
Constructor Summary
Constructors
Constructor: XmlChunker()
Method Summary
Modifier and Type: static void
Method: parse(InputStream is, String encoding, int targetDepth, XmlChunker.Listener listener)
Description: Extracts 'chunks' of an XML source into a user-provided XmlChunker.Listener
Constructor Details
XmlChunker
public XmlChunker()
Method Details
parse
public static void parse(InputStream is, String encoding, int targetDepth, XmlChunker.Listener listener)
Extracts 'chunks' of an XML source into a user-provided XmlChunker.Listener
Parameters:
is - the InputStream of the XML source
encoding - the text encoding of the XML source (use "UTF-8" if not sure)
targetDepth - the level at which the chunks are positioned in the XML source (in the class-level example, the root element is at depth 1)
listener - an implementation of XmlChunker.Listener for taking the chunks
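Since each chunk arrives as a standard org.w3c.dom.Element, a listener implementation can process it with the ordinary JDK DOM and transformation APIs. The sketch below shows two typical operations: serializing a chunk back to XML text and reading the text of a named descendant. The class ChunkHandlingSketch and its helper methods are hypothetical names; the callback method of XmlChunker.Listener is not documented on this page, so the sketch deliberately stops short of implementing that interface.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helpers that a listener implementation might call for each chunk.
final class ChunkHandlingSketch {

    /** Serializes a chunk (and its whole sub-tree) to XML text without an XML declaration. */
    static String toXmlString(Element chunk) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(chunk), new StreamResult(out));
        return out.toString();
    }

    /** Returns the text of the first descendant with the given tag name, or null if none exists. */
    static String firstDescendantText(Element chunk, String tagName) {
        NodeList matches = chunk.getElementsByTagName(tagName);
        return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
    }
}
```

Inside a listener, the received Element would simply be passed to helpers such as these; nothing about the chunk needs to be retained once the callback returns, which keeps memory use bounded.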