Welcome to our restaurant! This site was designed as an experiment to help people understand how to use simple tools to process data locally.
The file sales.csv contains a list of purchase receipts with all the articles they contain. Each purchase is identified by the field tx_id, and each article bought has a price item_price and a two-letter code identifying the country.
In this recipe, you are required to produce a list with the receipts, adding the full country name at the end of each row. The translation between country codes and labels can be found in countries.csv.
Result should look like:
SBXWHCG156,1.99,MV,Maldives
BEIGKQD194,93.87,MV,Maldives
BYXCGLP161,4.44,MW,Malawi
Solve the exercise by sending the antepenultimate (third-from-last) row, sorted by item_price descending.
curl -sX POST https://kitchen.luisbelloch.es/api/:team/e1 -d 'FPQINMY120,0.04,SD,Sudan'
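One possible pipeline, sketched here against hypothetical miniatures of the two files (their layout is inferred from the sample output above — swap in the real sales.csv and countries.csv once downloaded):

```shell
# Miniature stand-ins for sales.csv and countries.csv (layout is an assumption)
cat > sales_sample.csv <<'EOF'
SBXWHCG156,1.99,MV
BEIGKQD194,93.87,MV
BYXCGLP161,4.44,MW
EOF
cat > countries_sample.csv <<'EOF'
MV,Maldives
MW,Malawi
EOF

# First pass (NR==FNR) stores code->name from countries; second pass appends
# the name to each receipt. Then sort by item_price (field 2) descending and
# pick the antepenultimate row.
awk -F, 'NR==FNR {name[$1]=$2; next} {print $0","name[$3]}' \
    countries_sample.csv sales_sample.csv \
  | sort -t, -k2,2 -nr \
  | tail -n 3 | head -n 1
```

join(1) would also work, but it requires both inputs sorted on the join key; the two-file awk idiom avoids that.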
Using awk, read the file sales.csv and produce a list of the top-ten countries with most sales. Sorting is not a good fit for awk, so you may want to pipe its output through the sort command.
Solve the exercise by sending the second row.
curl -sX POST https://kitchen.luisbelloch.es/api/:team/e2a -d 'KM|595.81'
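A sketch of the division of labour, again over a hypothetical miniature of sales.csv: awk does the per-country aggregation, sort(1) does the ordering.

```shell
# Miniature sales_sample.csv standing in for sales.csv (layout is an assumption)
cat > sales_sample.csv <<'EOF'
SBXWHCG156,1.99,MV
BEIGKQD194,93.87,MV
BYXCGLP161,4.44,MW
EOF

# Sum item_price per country in awk, then delegate ordering to sort(1)
awk -F, '{total[$3] += $2} END {for (c in total) printf "%s|%.2f\n", c, total[c]}' \
    sales_sample.csv \
  | sort -t'|' -k2,2 -nr \
  | head -n 10
```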
How to prepare a Panda for cooking
Using Pandas, produce a list of the top-ten countries with most sales.
One easy way to get started is to use a Jupyter environment in Docker:
docker run -v $(pwd):/home/jovyan/ -p 8888:8888 -p 4040:4040 jupyter/scipy-notebook
Then navigate to the provided URL. ctrl+enter executes the current cell; use the a or b keys to add cells above or below. Alternatively you may use the VS Code Jupyter extensions, but it's a bit painful to install.
Solve the exercise by sending the second row.
curl -sX POST https://kitchen.luisbelloch.es/api/:team/e2b -d 'KM|595.81'
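A minimal Pandas sketch, runnable from the shell. It assumes sales.csv has no header row (adjust names= if it does), and the inline sample data is a hypothetical stand-in — point read_csv at the real file instead:

```shell
python3 - <<'EOF'
import io
import pandas as pd

# Hypothetical miniature of sales.csv; replace io.StringIO(sample) with
# "sales.csv" to run on the real file.
sample = """SBXWHCG156,1.99,MV
BEIGKQD194,93.87,MV
BYXCGLP161,4.44,MW"""

sales = pd.read_csv(io.StringIO(sample), names=["tx_id", "item_price", "country"])
# Sum item_price per country and keep the ten largest totals
top = sales.groupby("country")["item_price"].sum().nlargest(10)
for code, total in top.items():
    print(f"{code}|{total:.2f}")
EOF
```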
The file pancake_orders.10M.csv.gz contains 10 million records, with the following columns:
┌──────────────┬────────┬────────────┬────────────────┬─────────────┐
│ ts           │ price  │ item_count │ source_country │ coupon_code │
│ timestamp    │ double │ int64      │ varchar        │ varchar     │
├──────────────┼────────┼────────────┼────────────────┼─────────────┤
│ 16:36:19.794 │ 3.85   │ 1          │ BR             │ 501826      │
│ 16:49:30.072 │ 6.36   │ 4          │ PY             │ 2bd108      │
│ 16:51:26.371 │ 6.36   │ 1          │ MF             │ cfaed6      │
└──────────────┴────────┴────────────┴────────────────┴─────────────┘
Data file can be found here: pancake_orders.10M.csv.gz. Do not decompress it, DuckDB is able to read compressed files on the fly.
You are required to get the top-ten countries with most sales, mixing the data with the countries.jsonl file. You can only use DuckDB, no external tooling.
Result should look like this:
country,total
Guinea-Bissau,297833.1000000053
"Cocos (Keeling) Islands",298836.7600000056
Namibia,299101.0800000052
...
Solve the exercise by sending the 4th row (Finland).
curl -sX POST https://kitchen.luisbelloch.es/api/:team/e3 -d 'Namibia,299101.0800000052'
Optional: Try to query the CSV directly and also load the data into a some.duckdb file. Does the latter make a difference in performance?
Optional: Save the results as Parquet and repeat the performance experiment.
Repeat the DuckDB exercise, but this time using ClickHouse: use clickhouse local to query the orders and countries files in place, or load them into a server and query through clickhouse client.
Solve the exercise by sending the 5th row (Peru), like in the previous exercise.
curl -sX POST https://kitchen.luisbelloch.es/api/:team/e4 -d 'Namibia,299101.0800000052'
Using jq only, read the file sales.csv and produce a list of the top-ten countries with most sales. Because reasons.
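One way to coerce jq into it, sketched over a hypothetical miniature of sales.csv: jq has no CSV reader, so slurp the raw lines (-R -s) and split them by hand.

```shell
# Miniature sales_sample.csv standing in for sales.csv (layout is an assumption)
cat > sales_sample.csv <<'EOF'
SBXWHCG156,1.99,MV
BEIGKQD194,93.87,MV
BYXCGLP161,4.44,MW
EOF

# Split lines into fields, group by country code (field 3), sum prices
# (rounded to cents to avoid float noise), sort descending, keep ten.
jq -R -s -r '
  split("\n")
  | map(select(length > 0) | split(","))
  | group_by(.[2])
  | map({code: .[0][2], total: ((map(.[1] | tonumber) | add) * 100 | round / 100)})
  | sort_by(-.total)
  | .[:10][]
  | "\(.code)|\(.total)"
' sales_sample.csv
```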
Food Allergy Warning: Please be advised that our food may have come in contact or contain unix nuts, terminal traces, bash fish or console peanuts.
🌾 Gluten-Free