theoludwig/billion_row_challenge
My Solution for the 1 Billion Row Challenge, implemented in the Rust Programming Language.
About
1️⃣🐝🏎️ The One Billion Row Challenge (1BRC) is a fun exploration of how far modern programming languages (initially only Java) can be pushed to calculate the min, max, and average of 1 billion measurements as fast as possible.
The repository contains my solution for the 1BRC challenge, implemented in the Rust programming language.
Links
- https://github.com/gunnarmorling/1brc
- https://www.morling.dev/blog/one-billion-row-challenge/
- https://1brc.dev/
Getting Started
Prerequisites
Installation
# Clone the repository
git clone git@github.com:theoludwig/billion_row_challenge.git
# Go to the project root
cd billion_row_challenge
# Rust related commands
cargo run
cargo build --release
cargo test
cargo clippy --verbose -- -D warnings
cargo fmt -- --check
Usage
# Build (optimized)
cargo build --release
# Usage: ./target/release/billion_row_challenge <input_file_path>
# Example with fixture data
./target/release/billion_row_challenge ./tests/fixtures/10/input.txt
# Example with the 1 billion row data (not included in the repository, needs to be generated)
./target/release/billion_row_challenge ./1brc/measurements.txt
Generate the 1 Billion Row Data (~12GB)
# Clone the 1brc repository
git clone git@github.com:gunnarmorling/1brc.git
# Go to the project root
cd 1brc
# Build the project using Apache Maven
./mvnw clean verify
# Create the `measurements.txt` file with 1B rows
./create_measurements.sh 1000000000
Challenge Instructions
The text file contains temperature values for a range of weather stations. Each row is one measurement in the format `<string: station name>;<double: measurement>`, with the measurement value having exactly one fractional digit. The following shows ten rows as an example:
Hamburg;12.0
Bulawayo;8.9
Palembang;38.8
St. John's;15.2
Cracow;12.6
Bridgetown;26.9
Istanbul;6.2
Roseau;34.4
Conakry;31.2
Istanbul;23.0
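The row format above can be parsed without going through floating point. The following is a minimal sketch, not necessarily how this repository does it: `parse_row` is a hypothetical helper that relies on the "exactly one fractional digit" guarantee to return the measurement as an integer number of tenths of a degree.

```rust
/// Parses one row of the form `<station>;<measurement>`.
///
/// Returns the station name and the measurement in tenths of a degree
/// (for example "12.6" becomes 126), or `None` if the row is malformed.
/// `parse_row` is a hypothetical helper used for illustration only.
fn parse_row(row: &str) -> Option<(&str, i32)> {
    // Station names never contain ';', so the first ';' is the separator.
    let (station, value) = row.split_once(';')?;
    if station.is_empty() {
        return None;
    }

    // Handle the optional minus sign separately so that "-0.5" keeps its sign.
    let (is_negative, unsigned) = match value.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, value),
    };

    // Expect one or two integer digits and exactly one fractional digit.
    let (whole, fraction) = unsigned.split_once('.')?;
    let all_digits = whole.bytes().chain(fraction.bytes()).all(|b| b.is_ascii_digit());
    if whole.is_empty() || whole.len() > 2 || fraction.len() != 1 || !all_digits {
        return None;
    }

    let tenths = whole.parse::<i32>().ok()? * 10 + fraction.parse::<i32>().ok()?;
    Some((station, if is_negative { -tenths } else { tenths }))
}
```

For example, `parse_row("St. John's;15.2")` yields `Some(("St. John's", 152))`, while a row without a `;` or with two fractional digits yields `None`.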
The task is to write a program which reads the file, calculates the min, mean, and max temperature value per weather station, and emits the results on stdout like this (i.e. sorted alphabetically by station name, and the result values per station in the format `<min>/<mean>/<max>`, rounded to one fractional digit):
{Abha=-23.0/18.0/59.2, Abidjan=-16.2/26.0/67.3, Abéché=-10.0/29.4/69.0, Accra=-10.1/26.4/66.4, Addis Ababa=-23.7/16.0/67.0, Adelaide=-27.8/17.3/58.5, ...}
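The aggregation and output format described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the optimized solution in this repository: `Stats`, `format_tenths`, and `summarize` are hypothetical names, rows are assumed to be already parsed into `(station, tenths of a degree)` pairs, and the mean is rounded to the nearest tenth with ties going toward positive infinity, which is one reading of the rounding rule and should be checked against the fixtures. A `BTreeMap` keyed by `&str` iterates in byte order, which is consistent with the sample output (`Abidjan` before `Abéché`).

```rust
use std::collections::BTreeMap;

/// Running statistics for one station, all in tenths of a degree.
struct Stats {
    min: i64,
    max: i64,
    sum: i64,
    count: i64,
}

/// Formats a value given in tenths of a degree, e.g. -5 becomes "-0.5".
fn format_tenths(tenths: i64) -> String {
    let sign = if tenths < 0 { "-" } else { "" };
    let abs = tenths.abs();
    format!("{}{}.{}", sign, abs / 10, abs % 10)
}

/// Aggregates `(station, tenths)` rows and renders the result line.
fn summarize<'a>(rows: impl IntoIterator<Item = (&'a str, i64)>) -> String {
    // A BTreeMap keeps its keys ordered, so iteration is already sorted.
    let mut stations: BTreeMap<&str, Stats> = BTreeMap::new();
    for (name, tenths) in rows {
        stations
            .entry(name)
            .and_modify(|s| {
                s.min = s.min.min(tenths);
                s.max = s.max.max(tenths);
                s.sum += tenths;
                s.count += 1;
            })
            .or_insert(Stats { min: tenths, max: tenths, sum: tenths, count: 1 });
    }

    let entries: Vec<String> = stations
        .iter()
        .map(|(name, s)| {
            // Assumption: nearest tenth, ties toward positive infinity.
            // floor((2 * sum + count) / (2 * count)) == floor(sum / count + 1/2)
            let mean = (2 * s.sum + s.count).div_euclid(2 * s.count);
            format!(
                "{}={}/{}/{}",
                name,
                format_tenths(s.min),
                format_tenths(mean),
                format_tenths(s.max)
            )
        })
        .collect();

    format!("{{{}}}", entries.join(", "))
}
```

Keeping everything in integer tenths avoids accumulating floating-point error in the sum; a faster solution would likely use a hash map during aggregation and sort only once at the end.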
Limits
- Input value ranges are as follows:
  - Station name: non null UTF-8 string of min length 1 character and max length 100 bytes, containing neither `;` nor `\n` characters (i.e. this could be 100 one-byte characters, or 50 two-byte characters, etc.).
  - Temperature value: non null double between -99.9 (inclusive) and 99.9 (inclusive), always with one fractional digit.
- There is a maximum of 10,000 unique station names.
- Line endings in the file are `\n` characters on all platforms.
- The rounding of output values must be done using the semantics of IEEE 754 rounding-direction "roundTowardPositive".
Examples
See the `tests/fixtures` folder for examples of input/output.
Input
Halifax;12.9
Zagreb;12.2
Cabo San Lucas;14.9
Adelaide;15.0
Ségou;25.7
Pittsburgh;9.7
Karachi;15.4
Xi'an;24.2
Dodoma;22.2
Tauranga;38.2
Output
{Adelaide=15.0/15.0/15.0, Cabo San Lucas=14.9/14.9/14.9, Dodoma=22.2/22.2/22.2, Halifax=12.9/12.9/12.9, Karachi=15.4/15.4/15.4, Pittsburgh=9.7/9.7/9.7, Ségou=25.7/25.7/25.7, Tauranga=38.2/38.2/38.2, Xi'an=24.2/24.2/24.2, Zagreb=12.2/12.2/12.2}