theoludwig/billion_row_challenge

My Solution for the 1 Billion Row Challenge, implemented in the Rust Programming Language.

CI Rust Conventional Commits Licence MIT

## About 1️⃣🐝🏎️ The One Billion Row Challenge (1BRC) is a fun exploration of how far modern programming languages (initally only Java) can be pushed to **calculate** the **min, max, and average of 1 billion measurements** as fast as possible. The repository contains **my solution** for the [1BRC](https://1brc.dev/) challenge, implemented in the [Rust programming language](https://www.rust-lang.org/). ![1BRC](./1brc.png) ### Links - - - ## Getting Started ### Prerequisites - [Rust](https://www.rust-lang.org/) >= v1.79.0 - [Java](https://openjdk.org/) v21 (used to generate the 1 billion row data) ### Installation ```sh # Clone the repository git clone git@github.com:theoludwig/billion_row_challenge.git # Go to the project root cd billion_row_challenge # Rust related commands cargo run cargo build --release cargo test cargo clippy --verbose -- -D warnings cargo fmt -- --check ``` ### Usage ```sh # Build (optimized) cargo build --release # Usage: ./target/release/billion_row_challenge # Example with fixture data ./target/release/billion_row_challenge ./tests/fixtures/10/input.txt # Example with the 1 billion row data (not included in the repository, needs to be generated) ./target/release/billion_row_challenge ./1brc/measurements.txt ``` ### Generate the 1 Billion Row Data (~12GB) ```sh # Clone the 1brc repository git clone git@github.com:gunnarmorling/1brc.git # Go to the project root cd 1brc # Build the project using Apache Maven ./mvnw clean verify # Create the `measurements.txt` file with 1B rows ./create_measurements.sh 1000000000 ``` ## Challenge Instructions The text file contains temperature values for a range of weather stations. Each row is one measurement in the format `;`, with the measurement value having exactly one fractional digit. The following shows ten rows as an example: ```txt Hamburg;12.0 Bulawayo;8.9 Palembang;38.8 St. John's;15.2 Cracow;12.6 Bridgetown;26.9 Istanbul;6.2 Roseau;34.4 Conakry;31.2 Istanbul;23.0 ``` The task is to write a program which reads the file, calculates the **min**, **mean**, and **max** temperature value **per weather station**, and emits the results on stdout like this (i.e. sorted alphabetically by station name, and the result values per station in the format `//`, rounded to one fractional digit): ```txt {Abha=-23.0/18.0/59.2, Abidjan=-16.2/26.0/67.3, Abéché=-10.0/29.4/69.0, Accra=-10.1/26.4/66.4, Addis Ababa=-23.7/16.0/67.0, Adelaide=-27.8/17.3/58.5, ...} ``` ### Limits - Input value ranges are as follows: - **Station name:** non null UTF-8 string of min length 1 character and max length 100 bytes, containing neither `;` nor `\n` characters. (i.e. this could be 100 one-byte characters, or 50 two-byte characters, etc.). - **Temperature value:** non null double between -99.9 (inclusive) and 99.9 (inclusive), always with one fractional digit. - There is a maximum of $10,000$ unique station names. - Line endings in the file are `\n` characters on all platforms. - The rounding of output values must be done using the semantics of IEEE 754 rounding-direction "roundTowardPositive". ### Examples See the [`tests/fixtures`](./tests/fixtures) folder for examples of input/output. #### Input ```txt Halifax;12.9 Zagreb;12.2 Cabo San Lucas;14.9 Adelaide;15.0 Ségou;25.7 Pittsburgh;9.7 Karachi;15.4 Xi'an;24.2 Dodoma;22.2 Tauranga;38.2 ``` #### Output ```txt {Adelaide=15.0/15.0/15.0, Cabo San Lucas=14.9/14.9/14.9, Dodoma=22.2/22.2/22.2, Halifax=12.9/12.9/12.9, Karachi=15.4/15.4/15.4, Pittsburgh=9.7/9.7/9.7, Ségou=25.7/25.7/25.7, Tauranga=38.2/38.2/38.2, Xi'an=24.2/24.2/24.2, Zagreb=12.2/12.2/12.2} ``` ## License [MIT](./LICENSE)