java.lang.Object
com.tectonica.xmlchunk.XmlChunker
An XML data extraction class, intended for XML sources that have the following properties:
- May be arbitrarily large (too large to be held in memory in their entirety)
- Has a repetitive structure, where sub-XML records of interest are all located at some constant depth/level
- Sub-XML records are small enough to be read and parsed in memory
The XML source is broken into 'chunks', each representing one XML sub-tree positioned at the target depth. Each
chunk is passed to the caller as an in-memory DOM Element.
For example, given the following XML:

    <?xml version="1.0" encoding="UTF-8"?>
    <Level1>
        <Level2a>
            ..
            <Level3a>
                ..
                <Level4> .. </Level4>
                ..
            </Level3a>
            <Level3b> .. </Level3b>
            ..
        </Level2a>
        <Level2b> .. </Level2b>
    </Level1>

Requesting a target depth of 2 would yield two chunks: <Level2a>..</Level2a> (including its entire sub-tree) and <Level2b>..</Level2b>.
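
The depth rule described above can be illustrated with a small, self-contained sketch. It is not the XmlChunker implementation: it assumes a StAX pull parser (`XMLStreamReader`) from the JDK, counts the root element as depth 1 (as the example implies), and uses `java.util.function.Consumer<Element>` as a stand-in for the listener, since the listener's method is not shown on this page. The class name `DepthChunkSketch` and the method `chunk` are hypothetical, and namespaces are ignored for brevity.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical illustration only; not part of com.tectonica.xmlchunk.
class DepthChunkSketch {

    // Streams the XML and hands every element found at targetDepth (root = 1)
    // to the handler as a detached DOM Element containing its whole sub-tree.
    static void chunk(InputStream is, String encoding, int targetDepth,
                      Consumer<Element> handler) throws Exception {
        if (targetDepth < 1) {
            throw new IllegalArgumentException("targetDepth must be >= 1");
        }
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE); // no DTD processing
        XMLStreamReader reader = factory.createXMLStreamReader(is, encoding);
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

        int depth = 0;       // depth of the innermost open element
        Node current = null; // DOM node being filled; null while outside a chunk
        try {
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT: {
                        depth++;
                        if (depth >= targetDepth) {
                            Element e = doc.createElement(reader.getLocalName());
                            for (int i = 0; i < reader.getAttributeCount(); i++) {
                                e.setAttribute(reader.getAttributeLocalName(i),
                                               reader.getAttributeValue(i));
                            }
                            if (current != null) {
                                current.appendChild(e); // nested inside the chunk
                            }
                            current = e;
                        }
                        break;
                    }
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA: {
                        if (current != null) {
                            current.appendChild(doc.createTextNode(reader.getText()));
                        }
                        break;
                    }
                    case XMLStreamConstants.END_ELEMENT: {
                        if (depth == targetDepth) {
                            handler.accept((Element) current); // chunk is complete
                            current = null;
                        } else if (depth > targetDepth) {
                            current = current.getParentNode(); // step back up inside the chunk
                        }
                        depth--;
                        break;
                    }
                    default:
                        break; // comments, processing instructions, etc. are skipped
                }
            }
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<Level1><Level2a><Level3a><Level4>x</Level4></Level3a>"
                + "<Level3b>y</Level3b></Level2a><Level2b>z</Level2b></Level1>";
        InputStream is = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        // Prints "Level2a" and then "Level2b", one per line
        chunk(is, "UTF-8", 2, c -> System.out.println(c.getTagName()));
    }
}
```

Only the sub-tree currently being assembled is referenced by the sketch, which is why each record must be small enough to fit in memory even when the whole source is not.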
Author: Zach Melamed
Nested Class Summary
Nested Classes:
- static interface XmlChunker.Listener - An interface that the user of XmlChunker must implement in order to receive the 'chunks' extracted from the XML source
Constructor Summary
Constructors:
- XmlChunker()
Method Summary
- static void parse(InputStream is, String encoding, int targetDepth, XmlChunker.Listener listener) - Extracts 'chunks' of an XML source into a user-provided XmlChunker.Listener
Constructor Details
XmlChunker
public XmlChunker()
Method Details
parse
public static void parse(InputStream is, String encoding, int targetDepth, XmlChunker.Listener listener)
Extracts 'chunks' of an XML source into a user-provided XmlChunker.Listener.
Parameters:
- is - the InputStream of the XML source
- encoding - the text encoding of the XML source (use "UTF-8" if not sure)
- targetDepth - the level at which the chunks are positioned in the XML source
- listener - an implementation of XmlChunker.Listener that receives the chunks
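
Since each chunk arrives as a DOM Element, the receiving code can use the standard org.w3c.dom API to read it. The sketch below shows one way a chunk consumer might extract a value from each record. It makes several assumptions: `Consumer<Element>` stands in for XmlChunker.Listener (whose method is not shown on this page), the element names `record` and `name` are invented for illustration, and `ChunkConsumerSketch`, `NameCollector` and `parseFragment` are hypothetical helpers.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration only; not part of com.tectonica.xmlchunk.
class ChunkConsumerSketch {

    // Collects the trimmed text of the first <name> descendant of every chunk it receives.
    static class NameCollector implements Consumer<Element> {
        final List<String> names = new ArrayList<>();

        @Override
        public void accept(Element chunk) {
            NodeList hits = chunk.getElementsByTagName("name");
            if (hits.getLength() > 0) {
                names.add(hits.item(0).getTextContent().trim());
            }
        }
    }

    // Builds a small in-memory Element, simulating a chunk delivered by a chunker.
    static Element parseFragment(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        NameCollector collector = new NameCollector();
        collector.accept(parseFragment("<record id=\"1\"><name> Ada </name></record>"));
        collector.accept(parseFragment("<record id=\"2\"/>")); // no <name>, so it is skipped
        System.out.println(collector.names); // prints [Ada]
    }
}
```

A real listener would do the same work inside whatever callback method XmlChunker.Listener declares.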