curl https://clickhouse.com/ | sh
./clickhouse local

ClickHouse local version 24.1.1.1017 (official build).

c02g90ftml85.local :) select tx_id, item_price from file('./data/sales.csv') where tx_id = 'YMEVOKU194' settings format_csv_delimiter='|';

SELECT
    tx_id,
    item_price
FROM file('./data/sales.csv')
WHERE tx_id = 'YMEVOKU194'
SETTINGS format_csv_delimiter = '|'

Query id: 214b4d6f-bfde-4fff-98da-d9d9ace6d9f6

┌─tx_id──────┬─item_price─┐
│ YMEVOKU194 │      28.85 │
│ YMEVOKU194 │       4.12 │
└────────────┴────────────┘

2 rows in set. Elapsed: 0.032 sec.
./clickhouse local -q "SELECT * FROM file('data/sales.csv') settings format_csv_delimiter='|'"
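Since sales.csv has a header row, the CSVWithNames format (used again further down) lets you refer to columns by name instead of the inferred c1, c2, ... names; a small sketch:
./clickhouse local -q "SELECT tx_id, item_price FROM file('data/sales.csv', CSVWithNames) LIMIT 5" --format_csv_delimiter='|'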
./clickhouse server
And in another window, connect to it:
./clickhouse client
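Once connected you get the same interactive prompt as clickhouse local. A quick one-off query is an easy way to confirm the server answers:
./clickhouse client -q "SELECT version()"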
Read the CSV file and write it out as Parquet:
./clickhouse local -q "SELECT * FROM file('data/sales.csv', CSVWithNames) INTO OUTFILE 'data/sales.parquet' FORMAT Parquet" \
--format_csv_delimiter='|' \
--output_format_parquet_compression_method=gzip
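To verify the conversion, clickhouse local can also describe the schema it infers from the new Parquet file (an optional check):
./clickhouse local -q "DESCRIBE file('data/sales.parquet')"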
And then read that Parquet file and output JSON lines:
./clickhouse local -q "select * from file('data/sales.parquet') format JSONEachRow"
That prints a JSON object for each row in the table.
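If you want the JSON lines in a file rather than on stdout, the same INTO OUTFILE pattern used for Parquet above should work here too, for example:
./clickhouse local -q "SELECT * FROM file('data/sales.parquet') INTO OUTFILE 'data/sales.jsonl' FORMAT JSONEachRow"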
Assuming you have a ClickHouse server running, data can only be read from the user_files folder, so the first step is to copy the CSV file into it:
cp ./data/sales.csv ./user_files/
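At this point the server should already be able to see the file through the file() table function, assuming user_files is the directory the server's user_files_path setting points to. A quick row count makes a decent sanity check:
./clickhouse client -q "SELECT count() FROM file('sales.csv', CSVWithNames)" --format_csv_delimiter='|'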
Create a script named import_data.sql that will create the table and import the data into it:
DROP TABLE IF EXISTS compras;
CREATE TABLE IF NOT EXISTS compras (
    tx_id FixedString(10) NOT NULL,
    tx_time DateTime NOT NULL,
    buyer String NOT NULL,
    currency_code FixedString(3) NOT NULL,
    payment_type String NOT NULL,
    credit_card_number String NOT NULL,
    country FixedString(2) NOT NULL,
    department String NOT NULL,
    product String NOT NULL,
    item_price Float64 NOT NULL,
    coupon_code FixedString(4) NULL,
    was_returned String NULL
)
ENGINE = MergeTree
ORDER BY tx_time;
SET format_csv_delimiter = '|';
SET date_time_input_format = 'best_effort';
INSERT INTO compras SELECT * FROM file('sales.csv', CSVWithNames);
And then you can run the file to do the import:
./clickhouse client --queries-file import_data.sql
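Alternatively, the script can be piped into the client; depending on the version you may need the --multiquery flag to allow several statements from stdin:
./clickhouse client --multiquery < import_data.sql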
Afterwards, the data can be queried from the compras table:
./clickhouse client -q "select currency_code, sum(item_price) from compras group by currency_code"
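A slight variation of the same aggregation, with rounding and ordering added, can make the output easier to read:
./clickhouse client -q "SELECT currency_code, round(sum(item_price), 2) AS total FROM compras GROUP BY currency_code ORDER BY total DESC"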