The SQL Snippet transformer is a module which allows you to use a SQL
SELECT statement to transform data flowing through a workflow.
The data sent as input to this module is accessible as a SQL table named FLOWFILE.
The result of your query becomes the output of the module.
Why is this useful?
You can use SQL to transform data into the right shape for a model. For instance, you may need to change field names, remove unneeded fields, or aggregate and group your data.
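As a rough illustration of this kind of reshaping, the sketch below renames a field, drops unused fields, and aggregates per customer. It uses Python's built-in sqlite3 module purely as a local stand-in for trying the query out; it is not how NStack executes snippets, and SQLite's dialect may differ from the one NStack accepts. The sample rows and the output names order_count and total_spend are made up for the example.

```python
import sqlite3

# Local stand-in for the module's input table (assumption: named FLOWFILE,
# as in the example query in this document). The rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE FLOWFILE '
    '("customer_id" INTEGER, "country" TEXT, "price" REAL, "profit" REAL)'
)
conn.executemany(
    "INSERT INTO FLOWFILE VALUES (?, ?, ?, ?)",
    [(1, "UK", 10.0, 2.0), (1, "UK", 30.0, 5.0), (2, "FR", 20.0, 4.0)],
)

# Rename customer_id to id, keep only what is needed, aggregate per customer.
QUERY = """
SELECT "customer_id" AS id,
       COUNT(*)      AS order_count,
       SUM("price")  AS total_spend
FROM FLOWFILE
GROUP BY "customer_id"
ORDER BY id
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

Only the text held in QUERY is what you would adapt for the SQL Snippet module; the rest exists just to run it locally.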
You may want to post-process the outputs of the model. For instance, if a model scores customers on their propensity to purchase again (on a scale of 0 to 1), you may want to bucket users into risk cohorts.
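One way to sketch this bucketing is with a CASE expression. Everything specific here is an assumption made for illustration: the score column name, the 0.3 and 0.7 thresholds, and the cohort labels 'low', 'medium', and 'high'. As before, Python's sqlite3 module is only a local stand-in for running the query, not NStack's engine.

```python
import sqlite3

# Local stand-in for model output arriving as the FLOWFILE table.
# Column names and scores are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE FLOWFILE ("customer_id" INTEGER, "score" REAL)')
conn.executemany(
    "INSERT INTO FLOWFILE VALUES (?, ?)",
    [(1, 0.1), (2, 0.5), (3, 0.9)],
)

# Map each 0-to-1 score onto a cohort label (thresholds are illustrative).
QUERY = """
SELECT "customer_id",
       "score",
       CASE
           WHEN "score" < 0.3 THEN 'low'
           WHEN "score" < 0.7 THEN 'medium'
           ELSE 'high'
       END AS cohort
FROM FLOWFILE
ORDER BY "customer_id"
"""

cohorts = conn.execute(QUERY).fetchall()
print(cohorts)
```

The WHEN branches are evaluated in order, so each score lands in the first range it matches.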
Using SQL on CSVs and Excel
Data often lives outside of a data warehouse, which can make it difficult to transform, process, and analyse. It is often useful to run SQL directly on sources such as CSV and Excel files for reporting and analytics. This is hard to do in traditional desktop tooling because of data size, processing time, or complexity.
Imagine you had a CSV of order data with the following columns:
date, customer_id, price, profit, country, premium_user, and customer_age.
But you only need date, customer_id, and price, and you want customer_id to be renamed to id.
You could add a SQL Snippet module after your CSV Source, containing the following query:
SELECT "customer_id" AS id, "price", "date" FROM FLOWFILE
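To see what this query returns, the sketch below loads a small CSV with the seven columns listed above into an in-memory table called FLOWFILE and runs the same SELECT. The two data rows are invented, and Python's csv and sqlite3 modules stand in for the CSV Source and SQL Snippet modules; this is a local approximation, not NStack's implementation.

```python
import csv
import io
import sqlite3

# Invented sample data using the column names from the example above.
CSV_TEXT = """\
date,customer_id,price,profit,country,premium_user,customer_age
2019-01-05,101,9.99,2.50,UK,true,34
2019-01-06,102,24.00,6.00,FR,false,41
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE FLOWFILE ("date" TEXT, "customer_id" INTEGER, "price" REAL, '
    '"profit" REAL, "country" TEXT, "premium_user" TEXT, "customer_age" INTEGER)'
)

reader = csv.reader(io.StringIO(CSV_TEXT))
next(reader)  # skip the header row
# CSV values arrive as strings; SQLite converts them to the declared
# column types where the text looks numeric.
conn.executemany("INSERT INTO FLOWFILE VALUES (?, ?, ?, ?, ?, ?, ?)", reader)

# The query from the example: keep three columns, rename customer_id to id.
QUERY = 'SELECT "customer_id" AS id, "price", "date" FROM FLOWFILE'

cursor = conn.execute(QUERY)
columns = [d[0] for d in cursor.description]
result = cursor.fetchall()
print(columns)
print(result)
```

The output keeps only the three selected columns, with customer_id exposed under the name id.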
Does NStack provide any useful snippets?
Yes! NStack has an open-source library of user-contributed snippets for common data processing tasks. You can find them here.
What kind of SQL can I write?
NStack's SQL Snippet accepts SQL-92 syntax. You can find a full reference in the Apache Calcite documentation.