These SQL snippets are contributed by the NStack team and community. They can be run in NStack using the SQL Transformation module.
These snippets are based on an example e-commerce dataset which you can find here.
Bucket by value
SELECT *, CASE WHEN price <= 0 THEN CAST('invalid' as VARCHAR(10)) WHEN price > 0 and price <= 25 THEN CAST('low' as VARCHAR(10)) WHEN price > 25 and price <= 100 THEN CAST('medium' as VARCHAR(10)) ELSE CAST('high' as VARCHAR(10)) END AS bucket FROM FLOWFILE
Treat any value greater than 100 the same (for instance, 100 and 1000 are treated the same).
SELECT CASE WHEN price < 100 THEN price ELSE 100 END AS bucket FROM FLOWFILE
Average, maxima, minima, and variance
Generate useful statistics for a column.
SELECT category_1, avg(price) AS avg_price, max(price) AS max_price, min(price) AS min_price, ((sum(price*price)/(count(*)-1))-(avg(price)*avg(price))) AS var_price FROM FLOWFILE WHERE price IS NOT NULL GROUP BY category_1 ORDER BY avg_price
Replace null with the average
null values with the average in price column
SELECT CASE WHEN price is not null THEN price ELSE (SELECT avg(price) as avgg FROM FLOWFILE) END AS price_filled FROM FLOWFILE
Replace null with the mode
null values with the mode (i.e, the value occurring most often)
SELECT CASE WHEN category_2 is not null THEN category_2 ELSE (SELECT category_2 FROM FLOWFILE GROUP BY category_2 ORDER BY count(*) DESC LIMIT 1) END AS category_2_filled FROM FLOWFILE