Overview
Data flow transformation that extracts information from an input column containing XML documents, using XPath expressions. More than one expression can be provided, and the number of output column names must match the number of expressions.
If the transformation runs in merging mode (MergeResults is true), the output is synchronous with the input: all the matches from each XPath expression are joined with the given separator and put in the corresponding column.
If the MergeResults parameter is false, the output is asynchronous and each match from each XPath expression occupies a separate row in the corresponding column. The results from each expression are stacked together and sent to the output, which means that for every XML document from the input (i.e. for each input row), the number of output rows equals the largest number of matches produced by any of the XPath expressions.
Here is an example XML document, which corresponds to one row from the input:
If we have set up two XPath expressions, corresponding to two columns:
- Column Title is filled from the expression /bookstore/book/title
- Column Author is filled from the expression /bookstore/book/author
The output rows sent for this input row will be:
Title | Author |
---|---|
Everyday Italian | Giada De Laurentiis |
Harry Potter | J K. Rowling |
XQuery Kick Start | James McGovern |
NULL | Per Bothner |
NULL | Kurt Cagle |
NULL | James Linn |
NULL | Vaidyanathan Nagarajan |
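The stacking behaviour shown in the table can be sketched in a few lines of JavaScript. This is only an illustration, not the code of the actual script: the helper name stackMatches is hypothetical, and the match lists are copied from the table above instead of being produced by a real XPath evaluation.

```javascript
// Hypothetical helper, not part of the COZYROC script: stacks the match
// lists of several XPath expressions into output rows.
function stackMatches(matchesPerColumn) {
  // The row count is the largest number of matches across all expressions.
  const rowCount = Math.max(0, ...matchesPerColumn.map((m) => m.length));
  const rows = [];
  for (let i = 0; i < rowCount; i++) {
    // A column that has run out of matches contributes null (NULL in the table).
    rows.push(matchesPerColumn.map((m) => (i < m.length ? m[i] : null)));
  }
  return rows;
}

// Matches copied from the example table (assumed XPath results).
const titles = ["Everyday Italian", "Harry Potter", "XQuery Kick Start"];
const authors = [
  "Giada De Laurentiis", "J K. Rowling", "James McGovern", "Per Bothner",
  "Kurt Cagle", "James Linn", "Vaidyanathan Nagarajan",
];

const rows = stackMatches([titles, authors]);
console.log(rows.length); // 7
console.log(rows[3]);     // [ null, 'Per Bothner' ]
```

Three titles and seven authors give seven rows, with the Title column padded with null from the fourth row on.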
If we run with the same setup, but with MergeResults set to true and ResultSeparator set to a comma (,), the output would be:
Title | Author |
---|---|
Everyday Italian,Harry Potter,XQuery Kick Start | Giada De Laurentiis,J K. Rowling,James McGovern,Per Bothner,Kurt Cagle,James Linn,Vaidyanathan Nagarajan |
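Merging mode can be sketched the same way. The helper name mergeMatches is hypothetical and the match lists are again copied from the example, so this illustrates the described behaviour, not the script's own implementation. How the real component represents an expression with no matches is not stated here; this sketch simply yields an empty string in that case.

```javascript
// Hypothetical helper, not part of the COZYROC script: joins all matches of
// each XPath expression into one value, giving one output row per input row.
function mergeMatches(matchesPerColumn, separator) {
  return matchesPerColumn.map((matches) => matches.join(separator));
}

// Matches copied from the example table (assumed XPath results).
const titleMatches = ["Everyday Italian", "Harry Potter", "XQuery Kick Start"];
const authorMatches = [
  "Giada De Laurentiis", "J K. Rowling", "James McGovern", "Per Bothner",
  "Kurt Cagle", "James Linn", "Vaidyanathan Nagarajan",
];

// The second argument plays the role of ResultSeparator.
const merged = mergeMatches([titleMatches, authorMatches], ",");
console.log(merged[0]); // Everyday Italian,Harry Potter,XQuery Kick Start
```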
Setup
The script has the following parameters:
- DocumentColumn - the input column containing XML documents to process.
- XPathExpressions - the list of XPath expressions for extraction, specified one per line.
- XPathNamespaces - the namespaces of the elements referred to in the XPath expressions. The format is [namespace prefix]=[namespace]. Multiple namespaces are separated by newlines.
- ErrorRowDisposition - what happens when there is an error in processing, usually in document parsing. The possible values are IgnoreFailure, FailComponent and RedirectRow. If RedirectRow is selected, an error output is added, which is synchronous with the input and has an ErrorDescription column added.
- ResultColumns - the comma-separated list of output column names, one for each XPath expression provided.
- MergeResults - whether to merge multiple results for an XPath expression into a single value, or to run in asynchronous mode with one match per row.
- ResultSeparator - when merging mode is on, the separator used to join multiple matches.
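To make the text formats of these parameters concrete, here is a minimal sketch of how such values could be parsed and checked. The function names parseNamespaces and pairColumns are hypothetical, the namespace values are only sample input, and the actual script may handle whitespace and errors differently.

```javascript
// Hypothetical helpers, not taken from XPath_Transformation.js.

// Parses XPathNamespaces text: one "prefix=namespace" entry per line.
function parseNamespaces(text) {
  const namespaces = {};
  for (const line of text.split(/\r?\n/)) {
    const entry = line.trim();
    if (entry === "") continue; // skip blank lines
    const eq = entry.indexOf("=");
    if (eq <= 0) throw new Error(`Invalid namespace entry: ${entry}`);
    // Split on the first "=" only, because a namespace URI may contain "=".
    namespaces[entry.slice(0, eq).trim()] = entry.slice(eq + 1).trim();
  }
  return namespaces;
}

// Pairs each XPath expression (one per line) with its output column
// (comma separated) and rejects a count mismatch.
function pairColumns(expressionsText, columnsText) {
  const expressions = expressionsText.split(/\r?\n/).map((s) => s.trim()).filter(Boolean);
  const columns = columnsText.split(",").map((s) => s.trim()).filter(Boolean);
  if (expressions.length !== columns.length) {
    throw new Error(`${expressions.length} expressions but ${columns.length} result columns`);
  }
  return expressions.map((expression, i) => ({ column: columns[i], expression }));
}

// Sample input only; any prefix=namespace pairs would do.
const ns = parseNamespaces("atom=http://www.w3.org/2005/Atom\nxs=http://www.w3.org/2001/XMLSchema");
console.log(ns.atom); // http://www.w3.org/2005/Atom

const pairs = pairColumns("/bookstore/book/title\n/bookstore/book/author", "Title, Author");
console.log(pairs[1]); // { column: 'Author', expression: '/bookstore/book/author' }
```

Checking the counts up front mirrors the requirement that each XPath expression has exactly one result column.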
Configuration
To use this script, you need to load it in the COZYROC JavaScript Component. If you are using COZYROC SSIS+ 2.0 or later, after selecting the corresponding script type and opening the component editor, you can select the script from a dropdown list of pre-built scripts. For COZYROC SSIS+ 1.9, you can download the JavaScript file and browse to it via the "Import JavaScript code" button.
XPath_Transformation.js