COZYROC Avro components, part of the COZYROC SSIS+ suite, are third-party plug-ins for Microsoft SSIS that make it easy to parse and generate Apache Avro files. The toolkit is easy to use and follows the same guidelines and principles used by the standard out-of-the-box SSIS components.
The Apache Avro integration package consists of an Avro Source and an Avro Destination component, which enable reading and generating Avro files.
Avro file schema
- Both the Source and the Destination components can deduce the data schema from the provided sample file. However, if for some reason a sample file is not available at the time the package is designed, the schema can be entered into the Destination editor in JSON format.
- Each element of the schema is represented by two mandatory attributes, `name` and `type`, and one optional attribute, `fields`. The `type` property can represent a primitive data type (possible types and their corresponding SSIS types are given in the table below) or a complex type such as `record`, meaning a nested object, or `array`, meaning a collection (which can also consist of `record` type objects). Only elements of a complex type such as a `record` or `array` have a `fields` attribute, in which the description of nested objects is stored.
- For example, an object with `name`, `age`, `salary` and `email` properties and a list of objects of type `record` containing the properties `street`, `city` and `country` (itself a `record` with `name` and `code` properties) can be represented by a single schema string.
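A minimal sketch of such a schema string, assuming standard Apache Avro schema JSON (where a `record` lists its members under `fields` and an `array` declares its element type under `items`). The record names `Person`, `Address` and `Country`, the field name `addresses`, and all field types are illustrative assumptions; the schema the component editor produces may be laid out differently.

```python
import json

# Illustrative only: names and types below are assumptions, not taken
# from the component's own sample.
country = {
    "type": "record",
    "name": "Country",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "code", "type": "string"},
    ],
}

address = {
    "type": "record",
    "name": "Address",
    "fields": [
        {"name": "street", "type": "string"},
        {"name": "city", "type": "string"},
        {"name": "country", "type": country},  # nested record
    ],
}

person = {
    "type": "record",
    "name": "Person",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"},
        {"name": "salary", "type": "double"},
        {"name": "email", "type": "string"},
        # A collection of nested Address records.
        {"name": "addresses", "type": {"type": "array", "items": address}},
    ],
}

# Serialize to the JSON text that would be pasted into a schema editor.
schema_json = json.dumps(person, indent=2)
print(schema_json)
```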
Avro Data Type | SSIS Data Type |
---|---|
null | DT_WSTR |
boolean | DT_BOOL |
string | DT_WSTR |
int | DT_I4 |
long | DT_I8 |
float | DT_R4 |
double | DT_R8 |
bytes | DT_BYTES |
fixed | DT_BYTES |
enum | DT_WSTR |
union["null", "{avro_data_type}"] | Detects the Avro data type from the primitive types above |
union["{avro_data_type1}", "{avro_data_type2}"...] | Detects all Avro data types and for each of them creates an output column that corresponds to the specified data type. The format of the column name is ColumnName_AvroDataType |
map | DT_NTEXT |
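The mapping in the table can be expressed as a small lookup, shown here as a hedged sketch rather than the component's actual logic. The helper names `type_name` and `output_columns` are hypothetical, and skipping the `null` branch when a union lists several other types is an assumption, since the table does not say how that case is handled.

```python
# Type mapping copied from the table above.
AVRO_TO_SSIS = {
    "null": "DT_WSTR",
    "boolean": "DT_BOOL",
    "string": "DT_WSTR",
    "int": "DT_I4",
    "long": "DT_I8",
    "float": "DT_R4",
    "double": "DT_R8",
    "bytes": "DT_BYTES",
    "fixed": "DT_BYTES",
    "enum": "DT_WSTR",
    "map": "DT_NTEXT",
}


def type_name(avro_type):
    """Return the Avro type name for a plain string or a {"type": ...} dict."""
    return avro_type["type"] if isinstance(avro_type, dict) else avro_type


def output_columns(column_name, avro_type):
    """Return (column name, SSIS type) pairs for one schema field.

    Only the types listed in the table are handled; record and array
    types are outside the scope of this sketch and raise KeyError.
    """
    if isinstance(avro_type, list):  # a union is written as a JSON array
        # Assumption: the "null" branch never gets a column of its own.
        branches = [type_name(t) for t in avro_type if type_name(t) != "null"]
        if len(branches) == 1:
            # union["null", T] maps to a single column of type T
            return [(column_name, AVRO_TO_SSIS[branches[0]])]
        # One column per branch, named ColumnName_AvroDataType
        return [(f"{column_name}_{b}", AVRO_TO_SSIS[b]) for b in branches]
    return [(column_name, AVRO_TO_SSIS[type_name(avro_type)])]
```

For instance, `output_columns("value", ["int", "string"])` yields the two columns `value_int` and `value_string`.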
Overview
The Avro Source component is an SSIS Data Flow component for retrieving data from an Apache Avro file. It supports multiple outputs via the composite records pattern.
- Supports reading Apache Avro files.
- Supports the following Avro sources: File and Variable.
- Component metadata is automatically retrieved from the provided Avro file.
- The Apache Avro file schema is automatically retrieved from the provided file and populated in the component editor.
- Supports composite outputs. In addition to the root Avro Source Output, which contains the top-level fields, a corresponding composite output is created for each nested array.
- Supports an error output for redirecting problematic records (when processing of a field value fails).
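To illustrate what retrieving the schema from a provided file involves, the sketch below reads the header of an Avro object container file, as laid out in the Apache Avro specification, and returns the embedded `avro.schema` metadata entry. It uses only the Python standard library and is not the component's implementation; the function names `read_long`, `read_bytes` and `read_schema` are hypothetical.

```python
import io
import json


def read_long(stream):
    """Decode one Avro long (zigzag-encoded variable-length integer)."""
    shift = 0
    result = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("unexpected end of Avro header")
        value = byte[0]
        result |= (value & 0x7F) << shift
        if not value & 0x80:  # high bit clear marks the last byte
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)  # undo zigzag encoding


def read_bytes(stream):
    """Read a length-prefixed byte sequence (Avro bytes or string)."""
    return stream.read(read_long(stream))


def read_schema(stream):
    """Return the parsed schema stored in an Avro container file header."""
    if stream.read(4) != b"Obj\x01":
        raise ValueError("not an Avro object container file")
    metadata = {}
    while True:  # the metadata map is written as a series of blocks
        count = read_long(stream)
        if count == 0:  # a zero count terminates the map
            break
        if count < 0:  # negative count: a block byte size follows
            count = -count
            read_long(stream)
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            metadata[key] = read_bytes(stream)
    return json.loads(metadata["avro.schema"])


# Tiny hand-built header: one metadata entry whose schema is just "string",
# followed by the map terminator and a 16-byte sync marker.
header = (
    b"Obj\x01"
    + b"\x02"                      # block with 1 entry
    + b"\x16" + b"avro.schema"     # key length 11, then the key
    + b"\x10" + b'"string"'        # value length 8, then the schema JSON
    + b"\x00"                      # end of metadata map
    + bytes(16)                    # sync marker (zeros for the demo)
)
print(read_schema(io.BytesIO(header)))
```

With a real file, the same function could be called on a handle opened in binary mode, for example `open(path, "rb")`.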
Quick Start
In this section we will show you how to set up an Avro Source component.
- When you click the Columns tab, the component prepares the outputs and external columns by analyzing the schema in the Schema text editor. Note that the Avro Source can have multiple outputs (see the article about composite records), and the columns of each output are shown on this tab. The data in these outputs can be processed by downstream transformation and destination components (e.g. multiple OLE DB Destinations can store the data in a SQL Server database).
- Click OK to close the component editor.
Congratulations! You have successfully configured the Avro Source component.
Parameters
Configuration
Use the parameters below to configure the component.
Indicates the source of Avro data. The following options are available:
Value | Description |
---|---|
File | Select an existing File Connection Manager or create a new one. |
Variable | The Avro data is available in a variable. Select a variable or create a new one. |

JSON string representing the schema of the Avro file.
Knowledge Base
What's New
- New: Introduced component.
Overview
The Avro Destination component is an SSIS Data Flow component for generating Apache Avro files.
- The component metadata is either automatically retrieved from a sample Avro file or can be manually specified in JSON format.
- The generated Avro file can contain nested arrays of objects (following the composite records pattern), where the fields for the arrays are fed via separate inputs.
- The generated Avro content can be written to a file or stored in a variable.
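The idea of feeding array fields through separate inputs can be sketched as follows: rows from a parent input and rows from a child input are combined into nested records before being written. How the component actually correlates rows across inputs is not described here, so the key columns `id` and `person_id` and the helper `nest_rows` are hypothetical.

```python
from collections import defaultdict


def nest_rows(parent_rows, child_rows, key, child_key, array_field):
    """Attach child rows to their parent row as a nested array.

    key / child_key are hypothetical linking columns; the real component
    may correlate its inputs differently.
    """
    children = defaultdict(list)
    for row in child_rows:
        row = dict(row)  # copy so the caller's rows are not modified
        children[row.pop(child_key)].append(row)
    return [
        {**parent, array_field: children.get(parent[key], [])}
        for parent in parent_rows
    ]


# Two inputs: one for the top-level fields, one for the nested array.
people = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
addresses = [
    {"person_id": 1, "city": "Oslo"},
    {"person_id": 1, "city": "Rome"},
]

records = nest_rows(people, addresses, "id", "person_id", "addresses")
for record in records:
    print(record)
```

A parent row with no matching child rows receives an empty array, which keeps every generated record consistent with the same schema.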
Quick Start
In this section we will show you how to set up an Avro Destination component.
- When you click the Mapping tab, the component prepares the inputs and external columns by analyzing the schema in the Schema text editor. Note that the Avro Destination can have multiple inputs (see the article about composite records), and the columns of each input are shown on this tab. The data for these inputs can be prepared by upstream transformation and source components (e.g. a Query Transformation can be used to retrieve the necessary data from a SQL Server database).
- Click OK to close the component editor.
Congratulations! You have successfully configured the Avro Destination component.
Parameters
Configuration
Use the parameters below to configure the component.
Indicates the destination of Avro data. The following options are available:
Value | Description |
---|---|
File | The Avro data will be stored in a file. Select an existing File Connection Manager or create a new one. |
Variable | The Avro data will be stored in a variable. Select a variable or create a new one. |

JSON string representing the schema of the Avro file.
Knowledge Base
What's New
- New: Introduced component.
Knowledge Base
COZYROC SSIS+ Components Suite is free for testing in your development environment.
A licensed version can be deployed on-premises, on Azure-SSIS IR and on COZYROC Cloud.