Module jonix

Class JonixRecords

java.lang.Object
com.tectonica.jonix.JonixRecords
All Implemented Interfaces:
Iterable<JonixRecord>

public class JonixRecords extends Object implements Iterable<JonixRecord>
This class provides the mechanism to scan one or more ONIX sources, and process the ONIX records they contain (typically an ONIX Header followed by one or more ONIX Product records).

The normal preparation steps of this class are as follows:

  1. Add one or more ONIX sources
  2. Set the expected encoding of the sources (default is UTF-8)
  3. Optionally, set event handlers to be fired during processing
  4. Optionally, set key-value pairs, which will be accessible conveniently during processing

Example:

 JonixRecords records = Jonix
     .source(new File("/path/to/folder-with-onix-files"), "*.xml", false)
     .source(new File("/path/to/file-with-short-style-onix-2.xml"))
     .source(new File("/path/to/file-with-reference-style-onix-3.onx"))
     .onSourceStart(src -> { // take a look at:
         // src.onixVersion()
         // src.header()
         // src.sourceName()
     })
     .onSourceEnd(src -> { // take a look at:
         // src.productsProcessedCount()
     })
     .failOnInvalidFile(false);
 

Once the JonixRecords object is prepared, its records can be processed in several ways:

Iteration

First and foremost, JonixRecords is an Iterable of JonixRecord, so it can be traversed with a simple for-each loop. The following loop iterates over the ONIX Products in all sources and handles each one according to whether it is an Onix2 or an Onix3 Product.
 for (JonixRecord record : records) {
     if (record.product instanceof com.tectonica.jonix.onix3.Product) {
         com.tectonica.jonix.onix3.Product product3 = (com.tectonica.jonix.onix3.Product) record.product;
         // TODO: process the Onix3 <Product>
     } else if (record.product instanceof com.tectonica.jonix.onix2.Product) {
         com.tectonica.jonix.onix2.Product product2 = (com.tectonica.jonix.onix2.Product) record.product;
         // TODO: process the Onix2 <Product>
     } else {
         throw new IllegalArgumentException();
     }
 }
 
To continue this example of low-level handling (staying very close to the structure of the XML data), the following is a more elaborate version of the code above, which pulls out the ISBN-13 and the first contributor of every ONIX Product:
 for (JonixRecord record : records) {
     String isbn13;
     String personName = null;
     List<ContributorRoles> roles = null;
     if (record.product instanceof com.tectonica.jonix.onix2.Product) {
         com.tectonica.jonix.onix2.Product product2 = (com.tectonica.jonix.onix2.Product) record.product;
         isbn13 = product2.productIdentifiers()
             .find(ProductIdentifierTypes.ISBN_13)
             .map(pid -> pid.idValue().value)
             .orElse(null);
         List<com.tectonica.jonix.onix2.Contributor> contributors = product2.contributors();
         if (!contributors.isEmpty()) {
             com.tectonica.jonix.onix2.Contributor firstContributor = contributors.get(0);
             roles = firstContributor.contributorRoles().values();
             personName = firstContributor.personName().value;
         }
     } else if (record.product instanceof com.tectonica.jonix.onix3.Product) {
         com.tectonica.jonix.onix3.Product product3 = (com.tectonica.jonix.onix3.Product) record.product;
         isbn13 = product3.productIdentifiers()
             .find(ProductIdentifierTypes.ISBN_13)
             .map(pid -> pid.idValue().value)
             .orElse(null);
         List<com.tectonica.jonix.onix3.Contributor> contributors = product3.descriptiveDetail().contributors();
         if (!contributors.isEmpty()) {
             com.tectonica.jonix.onix3.Contributor firstContributor = contributors.get(0);
             roles = firstContributor.contributorRoles().values();
             personName = firstContributor.personName().value;
         }
     } else {
         throw new IllegalArgumentException();
     }
     System.out
         .println(String.format("Found ISBN %s, first person is %s, their roles: %s", isbn13, personName, roles));
 }
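The find(...).map(...).orElse(null) chain above looks up the first identifier of a given type and unwraps its value, yielding null when no such identifier exists. The plain-Java sketch below reproduces that Optional-based lookup outside Jonix; IdType, ProductId and IdLookup are hypothetical stand-ins, not Jonix classes.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for an ONIX identifier-type code list (not a Jonix class)
enum IdType { ISBN_10, ISBN_13, GTIN_13 }

// Hypothetical stand-in for a product identifier: a type plus its value
final class ProductId {
    final IdType type;
    final String value;

    ProductId(IdType type, String value) {
        this.type = type;
        this.value = value;
    }
}

final class IdLookup {
    // Returns the first identifier of the requested type, if any
    static Optional<ProductId> find(List<ProductId> ids, IdType type) {
        return ids.stream().filter(id -> id.type == type).findFirst();
    }

    // Mirrors find(...).map(pid -> ...).orElse(null): the value, or null when absent
    static String findIdValue(List<ProductId> ids, IdType type) {
        return find(ids, type).map(pid -> pid.value).orElse(null);
    }
}
```

Returning Optional from find keeps the "not found" case explicit, and the caller decides whether to fall back to null, a default, or an exception.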
 

Streaming

It is sometimes useful to invoke stream() and use the resulting Stream with the Java 8 Stream API for greater readability. The following example retrieves the Onix3 Products from their sources and stores them in an in-memory List:
 import com.tectonica.jonix.onix3.Product;
 ...
 List<Product> products3 = records.stream()
     .filter(rec -> rec.product instanceof Product)
     .map(rec -> (Product) rec.product)
     .collect(Collectors.toList());
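Since JonixRecords is already an Iterable, a stream over it can in principle be derived from its spliterator. The sketch below shows that general JDK technique (StreamSupport), together with a type-filtering variant that uses Class::isInstance and Class::cast instead of the two lambdas above. It is not Jonix's implementation; Rec, Onix2Product, Onix3Product and RecStreams are hypothetical placeholders for JonixRecord and the two Product classes.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical placeholders for the ONIX 2 and ONIX 3 Product classes
final class Onix2Product { }
final class Onix3Product { }

// Hypothetical placeholder for JonixRecord: carries an untyped product
final class Rec {
    final Object product;

    Rec(Object product) {
        this.product = product;
    }
}

final class RecStreams {
    // Turns any Iterable into a sequential Stream via its spliterator
    static <T> Stream<T> streamOf(Iterable<T> iterable) {
        return StreamSupport.stream(iterable.spliterator(), false);
    }

    // Keeps only products of the requested class, already cast to that class
    static <P> List<P> productsOfType(Iterable<Rec> records, Class<P> type) {
        return streamOf(records)
                .map(rec -> rec.product)
                .filter(type::isInstance)
                .map(type::cast)
                .collect(Collectors.toList());
    }
}
```

Passing the Class object once removes the duplicated type name between the filter and the cast.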
 

Streaming as Unified Record

One of Jonix's best facilities is the Unification framework, which simplifies the handling of varied sources (Onix2 mixed with Onix3 files) and eliminates some of the intricacies of XML handling. The method streamUnified() returns a Stream, but not of low-level JonixRecord objects. Instead, it streams out BaseRecords, which contain a typed and unified representation of the most essential data within a typical ONIX source. The following example shows how simple it is to extract data from ONIX sources without the inherent complications of ONIX diversity:
 Set<PriceTypes> requestedPrices = JonixUtil.setOf(PriceTypes.RRP_including_tax, PriceTypes.RRP_excluding_tax);
 records.streamUnified()
     .map(rec -> rec.product)
     .forEach(product -> {
         String recordReference = product.info.recordReference;
         String isbn13 = product.info.findProductId(ProductIdentifierTypes.ISBN_13);
         String title = product.titles.findTitleText(TitleTypes.Distinctive_title_book);
         List<String> authors = product.contributors.getDisplayNames(ContributorRoles.By_author);
         List<BasePrice> prices = product.supplyDetails.findPrices(requestedPrices);
         List<String> priceLabels = prices.stream()
             .map(bp -> bp.priceAmountAsStr + " " + bp.currencyCode).collect(Collectors.toList());
         System.out.println(String.format("Found product ref. %s, ISBN='%s', Title='%s', authors=%s, prices=%s",
             recordReference, isbn13, title, authors, priceLabels));
     });
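This page shows findPrices(requestedPrices) only through its use above; the natural reading is that it keeps the prices whose type belongs to the requested set. A plain-Java sketch of such a set-membership filter follows, with EnumSet standing in for JonixUtil.setOf. PriceType, Price and PriceFilter are hypothetical stand-ins (not Jonix's PriceTypes or BasePrice), and Jonix's actual matching rules may differ.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-in for an ONIX price-type code list
enum PriceType { RRP_EXCLUDING_TAX, RRP_INCLUDING_TAX, SUPPLIERS_NET }

// Hypothetical stand-in for a unified price entry
final class Price {
    final PriceType type;
    final String amount;
    final String currency;

    Price(PriceType type, String amount, String currency) {
        this.type = type;
        this.amount = amount;
        this.currency = currency;
    }
}

final class PriceFilter {
    // Keeps only prices whose type is in the requested set, preserving source order
    static List<Price> findPrices(List<Price> prices, Set<PriceType> requested) {
        return prices.stream()
                .filter(p -> requested.contains(p.type))
                .collect(Collectors.toList());
    }

    // Builds labels in the same "amount currency" form as the example above
    static List<String> labels(List<Price> prices) {
        return prices.stream()
                .map(p -> p.amount + " " + p.currency)
                .collect(Collectors.toList());
    }

    // The requested set: recommended retail price with and without tax
    static Set<PriceType> rrpTypes() {
        return EnumSet.of(PriceType.RRP_INCLUDING_TAX, PriceType.RRP_EXCLUDING_TAX);
    }
}
```

EnumSet is a compact, fast choice for sets of enum constants, which makes it a reasonable plain-JDK equivalent for a set of code-list values.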
 
  • Field Details

    • globalProductCount

      protected final AtomicInteger globalProductCount
    • failOnInvalidFile

      protected boolean failOnInvalidFile
  • Method Details

    • source

      public JonixRecords source(List<File> files)
    • source

      public JonixRecords source(File file)
    • source

      public JonixRecords source(File folder, String glob, boolean recursive) throws IOException
      Throws:
      IOException
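This page does not spell out the glob syntax accepted by source(File folder, String glob, boolean recursive). Assuming it follows the JDK's "glob:" pattern semantics, the file selection can be sketched with PathMatcher, matching on file names so that a pattern like *.xml applies at every depth when recursive is true. SourceLister.listSources is a hypothetical helper, not part of Jonix.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class SourceLister {
    // Hypothetical helper: lists regular files under folder whose file name matches glob.
    // recursive=false looks only at direct children; true descends into all subfolders.
    static List<Path> listSources(Path folder, String glob, boolean recursive) throws IOException {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        int maxDepth = recursive ? Integer.MAX_VALUE : 1;
        try (Stream<Path> paths = Files.walk(folder, maxDepth)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> matcher.matches(p.getFileName()))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

Matching against getFileName() rather than the full path keeps a simple pattern such as *.xml independent of where the folder lives.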
    • failOnInvalidFile

      public JonixRecords failOnInvalidFile(boolean fail)
      This method sets the streaming policy for when invalid sources are encountered (e.g. file not found). The default behavior is to stop streaming when such an error occurs.
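The two policies can be pictured as "abort on the first bad source" versus "skip it and continue". The sketch below is a hypothetical illustration of that choice over a list of files, where an unreadable file stands in for an invalid source; it is not Jonix's code, and SourcePolicy.readAll and InvalidSourceException are invented names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical exception signalling that processing was aborted on a bad source
final class InvalidSourceException extends RuntimeException {
    InvalidSourceException(Path source, Throwable cause) {
        super("Invalid source: " + source, cause);
    }
}

final class SourcePolicy {
    // Hypothetical helper: reads every source in order. With failOnInvalid=true the
    // first unreadable source aborts the run; with false it is skipped and reading continues.
    static List<String> readAll(List<Path> sources, boolean failOnInvalid) {
        List<String> contents = new ArrayList<>();
        for (Path source : sources) {
            try {
                contents.add(new String(Files.readAllBytes(source), StandardCharsets.UTF_8));
            } catch (IOException e) {
                if (failOnInvalid) {
                    throw new InvalidSourceException(source, e);
                }
                // lenient policy: ignore this source and move on to the next one
            }
        }
        return contents;
    }
}
```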
    • store

      public <T> JonixRecords store(String key, T value)
      Stores an object for later use during the streaming process. The stored object can be retrieved with retrieve(String).
    • retrieve

      public <T> T retrieve(String key)
      Returns:
      an object stored with store(String, Object) during the streaming, or null if the key doesn't exist
    • retrieve

      public <T> T retrieve(String key, T defaultValue)
      Returns:
      an object stored with store(String, Object) during the streaming, or defaultValue if the key doesn't exist
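store and retrieve behave like a typed key-value map shared across the run, which is handy for counters or lookup tables used inside event listeners. A minimal sketch of that contract, assuming a plain HashMap and an unchecked cast on the way out (so a wrong type surfaces as a ClassCastException at the call site), could look like this; Attributes is a hypothetical class, not Jonix's implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key-value holder mirroring the store/retrieve contract described above
final class Attributes {
    private final Map<String, Object> values = new HashMap<>();

    // Stores a value under key; returns this to allow fluent chaining
    <T> Attributes store(String key, T value) {
        values.put(key, value);
        return this;
    }

    // Returns the stored object, or null if the key doesn't exist
    @SuppressWarnings("unchecked")
    <T> T retrieve(String key) {
        return (T) values.get(key);
    }

    // Returns the stored object, or defaultValue if the key doesn't exist
    @SuppressWarnings("unchecked")
    <T> T retrieve(String key, T defaultValue) {
        return (T) values.getOrDefault(key, defaultValue);
    }
}
```

The generic return type lets callers write Integer count = attrs.retrieve("counter") without an explicit cast, at the cost of compile-time type safety.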
    • getConfiguration

      public Map<String,Object> getConfiguration()
    • encoding

      public JonixRecords encoding(String encoding)
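encoding(String) sets the character encoding expected in the sources, UTF-8 by default. Assuming the argument is a standard Java charset name, the sketch below shows why the setting matters: the same bytes decode to different text under different charsets. SourceDecoding.decode is a hypothetical helper.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class SourceDecoding {
    // Hypothetical helper: decodes raw source bytes with the named charset,
    // falling back to UTF-8 (the documented default) when no name is given
    static String decode(byte[] raw, String encoding) {
        Charset charset = (encoding == null) ? StandardCharsets.UTF_8 : Charset.forName(encoding);
        return new String(raw, charset);
    }
}
```

For example, "Café" encoded as UTF-8 is five bytes; decoded as UTF-8 it reads back as four characters, but decoded as ISO-8859-1 it becomes five characters ("CafÃ©").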
    • onSourceStart

      public JonixRecords onSourceStart(JonixRecords.OnSourceEvent onSourceStart)
      Registers a listener for the SourceStart event, which occurs when a new source is about to be processed, but only after the ONIX version and the (optional) ONIX Header have been parsed. Both are available in the JonixSource passed to the JonixRecords.OnSourceEvent.

      NOTE: this method can be called more than once to register several event-listeners

    • onSourceEnd

      public JonixRecords onSourceEnd(JonixRecords.OnSourceEvent onSourceEnd)
      Registers a listener for the SourceEnd event, which occurs after all records in the currently open source have been processed. In addition to all the information available to event-listeners registered with onSourceStart(OnSourceEvent), the JonixSource passed when this event fires also includes JonixSource.productCount(), with the final count of ONIX Products processed from the source.

      NOTE: this method can be called more than once to register several event-listeners
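Because both registration methods can be called repeatedly, each event presumably keeps a list of listeners rather than a single slot. The sketch below illustrates that multi-listener pattern with listeners fired in registration order; Source, SourceListener and SourceEvents are hypothetical names, and the firing order used by Jonix itself is not documented on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for JonixSource: a name and a running product counter
final class Source {
    final String name;
    int productsProcessed;

    Source(String name) {
        this.name = name;
    }
}

// Hypothetical listener type, similar in shape to JonixRecords.OnSourceEvent
@FunctionalInterface
interface SourceListener {
    void onSource(Source source);
}

final class SourceEvents {
    private final List<SourceListener> onStart = new ArrayList<>();
    private final List<SourceListener> onEnd = new ArrayList<>();

    // Each call appends a listener instead of replacing the previous one
    SourceEvents onSourceStart(SourceListener listener) {
        onStart.add(listener);
        return this;
    }

    SourceEvents onSourceEnd(SourceListener listener) {
        onEnd.add(listener);
        return this;
    }

    // Simulates one source: start listeners, then the products, then end listeners
    void process(Source source, int productCount) {
        onStart.forEach(l -> l.onSource(source));
        for (int i = 0; i < productCount; i++) {
            source.productsProcessed++;
        }
        onEnd.forEach(l -> l.onSource(source));
    }
}
```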

    • stream

      public Stream<JonixRecord> stream()
      Returns:
      a Stream of records, each containing a new Product object and a reference to the source from which it was taken
    • streamUnified

      public Stream<BaseRecord> streamUnified()
      Returns:
      a Stream of records, each containing a new BaseProduct object and a reference to the source from which it was taken
    • streamUnified

      public Stream<BaseRecord> streamUnified(BaseFactory2 baseFactory2, BaseFactory3 baseFactory3)
      Returns:
      a Stream of records, each containing a new BaseProduct object (which was created using the given factories) and a reference to the source from which it was taken
    • streamUnified

      public <P extends UnifiedProduct, H extends UnifiedHeader, R extends UnifiedRecord<P>> Stream<R> streamUnified(CustomUnifier<P,H,R> unifier)
      Returns:
      a Stream of records, each containing a new custom Product object (which was created using the given CustomUnifier) and a reference to the source from which it was taken
    • scanHeaders

      public JonixRecords scanHeaders()
      This will "peek" into the Headers of the indicated ONIX sources without processing the Products. The onSourceStart() events are fired as a result, allowing the header information to be handled.
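One way to peek at a header without touching the products is a streaming (StAX) parse that stops at the first product element. The sketch below illustrates that idea with the JDK's javax.xml.stream API on an in-memory string; it is an assumption about the technique, not a description of how scanHeaders() is implemented. It accepts both reference tags (Header, Product) and short tags (header, product), and HeaderPeek.peek is a hypothetical helper.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class HeaderPeek {
    // Hypothetical helper: returns the names of elements nested inside the header
    // and stops reading as soon as the first product element is reached
    static List<String> peek(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> headerElements = new ArrayList<>();
        boolean inHeader = false;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("Product") || name.equals("product")) {
                        break; // products are never parsed
                    }
                    if (inHeader) {
                        headerElements.add(name);
                    } else if (name.equals("Header") || name.equals("header")) {
                        inHeader = true;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("Header") || name.equals("header")) {
                        inHeader = false;
                    }
                }
            }
        } finally {
            reader.close();
        }
        return headerElements;
    }
}
```

Because a pull parser only advances on demand, breaking out of the loop leaves the rest of the document unread, which is what keeps a header scan cheap on large files.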
    • iterator

      public Iterator<JonixRecord> iterator()
      Specified by:
      iterator in interface Iterable<JonixRecord>