mirror of https://github.com/theoludwig/billion_row_challenge.git synced 2024-10-29 22:18:25 +01:00

theoludwig/billion_row_challenge

My Solution for the 1 Billion Row Challenge, implemented in the Rust Programming Language.

About

1️⃣🐝🏎️ The One Billion Row Challenge (1BRC) is a fun exploration of how far modern programming languages (initially only Java) can be pushed to calculate the min, max, and average of 1 billion measurements as fast as possible.

This repository contains my solution to the 1BRC, implemented in the Rust programming language.

Getting Started

Prerequisites

  • Rust >= v1.79.0
  • Java v21 (used to generate the 1 billion row data)

Installation

# Clone the repository
git clone git@github.com:theoludwig/billion_row_challenge.git

# Go to the project root
cd billion_row_challenge

# Rust related commands
cargo run
cargo build --release
cargo test
cargo clippy --verbose -- -D warnings
cargo fmt -- --check

Usage

# Build (optimized)
cargo build --release

# Usage: ./target/release/billion_row_challenge <input_file_path>

# Example with fixture data
./target/release/billion_row_challenge ./tests/fixtures/10/input.txt

# Example with the 1 billion row data (not included in the repository, needs to be generated)
./target/release/billion_row_challenge ./1brc/measurements.txt

Generate the 1 Billion Row Data (~12GB)

# Clone the 1brc repository
git clone git@github.com:gunnarmorling/1brc.git

# Go to the project root
cd 1brc

# Build the project using Apache Maven
./mvnw clean verify

# Create the `measurements.txt` file with 1B rows
./create_measurements.sh 1000000000

Challenge Instructions

The text file contains temperature values for a range of weather stations. Each row is one measurement in the format <string: station name>;<double: measurement>, with the measurement value having exactly one fractional digit. The following shows ten rows as an example:

Hamburg;12.0
Bulawayo;8.9
Palembang;38.8
St. John's;15.2
Cracow;12.6
Bridgetown;26.9
Istanbul;6.2
Roseau;34.4
Conakry;31.2
Istanbul;23.0
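
As an illustration of the row format, here is a minimal sketch of parsing a single row. It is not taken from this repository's source: `parse_row` is a hypothetical helper, and storing the temperature as an integer number of tenths is an assumption made here to avoid floating-point parsing.

```rust
/// Splits one `<station>;<measurement>` row into the station name and the
/// temperature in tenths of a degree (e.g. "12.0" -> 120, "-8.9" -> -89).
/// `parse_row` is a hypothetical name used for illustration only.
fn parse_row(row: &str) -> Option<(&str, i32)> {
    // Station names never contain ';', so the only ';' is the separator.
    let (station, value) = row.rsplit_once(';')?;
    if station.is_empty() {
        return None;
    }

    // Peel off an optional leading minus sign.
    let (negative, digits) = match value.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, value),
    };

    // The challenge guarantees exactly one fractional digit.
    let (int_part, frac_part) = digits.split_once('.')?;
    let is_digits = |s: &str| !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit());
    if !is_digits(int_part) || frac_part.len() != 1 || !is_digits(frac_part) {
        return None;
    }

    let int: i32 = int_part.parse().ok()?;
    let frac: i32 = frac_part.parse().ok()?;
    let tenths = int * 10 + frac;
    Some((station, if negative { -tenths } else { tenths }))
}
```

With this sketch, `parse_row("St. John's;15.2")` yields the station `St. John's` with the value `152`, and a row without a `;` yields `None`.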

The task is to write a program which reads the file, calculates the min, mean, and max temperature value per weather station, and emits the results on stdout like this (i.e. sorted alphabetically by station name, and the result values per station in the format <min>/<mean>/<max>, rounded to one fractional digit):

{Abha=-23.0/18.0/59.2, Abidjan=-16.2/26.0/67.3, Abéché=-10.0/29.4/69.0, Accra=-10.1/26.4/66.4, Addis Ababa=-23.7/16.0/67.0, Adelaide=-27.8/17.3/58.5, ...}
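
The task can be sketched as a group-by over station names followed by a sorted rendering. The code below is a simplified illustration, not this repository's implementation: `Stats`, `aggregate`, and `render` are hypothetical names, temperatures are assumed to be stored in tenths of a degree, and `BTreeMap` byte ordering is assumed to be an acceptable alphabetical order. The mean is printed with plain `{:.1}` formatting, which is a simplification that does not implement the rounding rule listed under Limits and can print `-0.0` for small negative means.

```rust
use std::collections::BTreeMap;

/// Running statistics for one station, in tenths of a degree.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Stats {
    min: i32,
    max: i32,
    sum: i64,
    count: u64,
}

/// Groups `(station, tenths)` pairs by station name.
/// `BTreeMap` keeps its keys sorted, so no separate sort step is needed.
fn aggregate<'a>(rows: impl IntoIterator<Item = (&'a str, i32)>) -> BTreeMap<&'a str, Stats> {
    let mut stations: BTreeMap<&'a str, Stats> = BTreeMap::new();
    for (station, tenths) in rows {
        let stats = stations.entry(station).or_insert(Stats {
            min: tenths,
            max: tenths,
            sum: 0,
            count: 0,
        });
        stats.min = stats.min.min(tenths);
        stats.max = stats.max.max(tenths);
        stats.sum += i64::from(tenths);
        stats.count += 1;
    }
    stations
}

/// Renders the map as `{name=min/mean/max, ...}`.
fn render(stations: &BTreeMap<&str, Stats>) -> String {
    let entries: Vec<String> = stations
        .iter()
        .map(|(name, s)| {
            let min = f64::from(s.min) / 10.0;
            let max = f64::from(s.max) / 10.0;
            // Simplification: `{:.1}` is not the challenge's rounding rule.
            let mean = s.sum as f64 / s.count as f64 / 10.0;
            format!("{name}={min:.1}/{mean:.1}/{max:.1}")
        })
        .collect();
    format!("{{{}}}", entries.join(", "))
}
```

For the rows `Istanbul;6.2`, `Hamburg;12.0`, and `Istanbul;23.0`, this renders `{Hamburg=12.0/12.0/12.0, Istanbul=6.2/14.6/23.0}`.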

Limits

  • Input value ranges are as follows:
    • Station name: non-null UTF-8 string of min length 1 character and max length 100 bytes, containing neither ; nor \n characters (i.e. this could be 100 one-byte characters, or 50 two-byte characters, etc.).
    • Temperature value: non-null double between -99.9 (inclusive) and 99.9 (inclusive), always with one fractional digit.
  • There is a maximum of 10,000 unique station names.
  • Line endings in the file are \n characters on all platforms.
  • The rounding of output values must be done using the semantics of IEEE 754 rounding-direction "roundTowardPositive".
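
The rounding rule can be read in more than one way. The sketch below assumes it means "round to the nearest tenth, with ties going toward positive infinity" (the behaviour of Java's `Math.round`); a strict ceiling reading would give different results. It is not taken from this repository's source: `mean_tenths` and `format_tenths` are hypothetical helpers, and working on integer tenths is an assumption made here to sidestep floating-point error and `-0.0` in the output.

```rust
/// Mean of `count` measurements whose sum is `sum_tenths` (in tenths of a
/// degree), rounded to the nearest tenth with ties going toward +infinity.
fn mean_tenths(sum_tenths: i64, count: i64) -> i64 {
    assert!(count > 0, "a station needs at least one measurement");
    // floor(sum / count + 1/2) == floor((2 * sum + count) / (2 * count)).
    // `div_euclid` floors when the divisor is positive, also for negatives.
    (2 * sum_tenths + count).div_euclid(2 * count)
}

/// Formats a value in tenths with one fractional digit, e.g. -5 -> "-0.5".
fn format_tenths(tenths: i64) -> String {
    let sign = if tenths < 0 { "-" } else { "" };
    let abs = tenths.unsigned_abs();
    format!("{sign}{}.{}", abs / 10, abs % 10)
}
```

Under this assumption, a mean of exactly 0.25 (sum of 5 tenths over 2 measurements) rounds to `0.3`, while a mean of exactly -0.25 rounds to `-0.2`, because ties move toward positive infinity in both cases.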

Examples

See the tests/fixtures folder for examples of input/output.

Input

Halifax;12.9
Zagreb;12.2
Cabo San Lucas;14.9
Adelaide;15.0
Ségou;25.7
Pittsburgh;9.7
Karachi;15.4
Xi'an;24.2
Dodoma;22.2
Tauranga;38.2

Output

{Adelaide=15.0/15.0/15.0, Cabo San Lucas=14.9/14.9/14.9, Dodoma=22.2/22.2/22.2, Halifax=12.9/12.9/12.9, Karachi=15.4/15.4/15.4, Pittsburgh=9.7/9.7/9.7, Ségou=25.7/25.7/25.7, Tauranga=38.2/38.2/38.2, Xi'an=24.2/24.2/24.2, Zagreb=12.2/12.2/12.2}

License

MIT