Tools

zearch: Our tool for regular expression searching on grammar-compressed text. Available at GitHub

grep: Version 3.1. Matching lines are counted with the -c option

ripgrep: Version 0.10.0. Matching lines are counted with the -c option

hyperscan: Version 5.0.0-3. Counting matching lines using a modified version of simplegrep, available at GitHub

lz4|hyperscan: Decompress the file with lz4 and count matching lines with hyperscan

zstd|hyperscan: Decompress the file with zstd and count matching lines with hyperscan

lz4|grep: Decompress the file with lz4 and count matching lines with grep

zstd|grep: Decompress the file with zstd and count matching lines with grep

lz4|ripgrep: Decompress the file with lz4 and count matching lines with ripgrep

zstd|ripgrep: Decompress the file with zstd and count matching lines with ripgrep

repair: Grammar-based compressor implementing the Recursive Pairing algorithm. Can be downloaded from this link

lz4: Version 1.8.3. Used with maximum compression level enabled (-9)

zstd: Version 1.3.6. Used with maximum compression level enabled (--ultra -22)

gzip: Version 1.9. Used with maximum compression level enabled (-9)
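
The decompress-then-search baselines listed above are two-stage shell pipelines. The sketch below only builds the command strings; it does not run them. It assumes the usual -d -c (decompress to standard output) flags of lz4 and zstd and the rg binary name for ripgrep. The -E flag for grep is also an assumption (GNU grep needs it for patterns that use {n} or |), and the helper name baseline_command and the file name are hypothetical. The exact invocations used in the benchmark are not stated here, and hyperscan is left out because it is driven through a modified simplegrep whose command line is not given.

```python
import shlex

# Decompress to standard output (assumed flags: -d decompress, -c write to stdout).
DECOMPRESSORS = {
    "lz4": ["lz4", "-d", "-c"],
    "zstd": ["zstd", "-d", "-c"],
}

# Count matching lines with -c. The -E flag for grep is an assumption,
# added so that patterns such as "[a-z]{4}" are read as extended regexes.
SEARCHERS = {
    "grep": ["grep", "-E", "-c"],
    "ripgrep": ["rg", "-c"],
}


def baseline_command(decompressor, searcher, regex, path):
    """Return a shell pipeline string such as 'lz4 -d -c FILE | grep -E -c REGEX'."""
    left = DECOMPRESSORS[decompressor] + [path]
    right = SEARCHERS[searcher] + [regex]
    # shlex.join quotes each argument so the regex survives the shell intact.
    return " | ".join(shlex.join(part) for part in (left, right))


# Hypothetical file name, used only for illustration.
print(baseline_command("lz4", "grep", "I .* you", "subtitles.lz4"))
print(baseline_command("zstd", "ripgrep", "[a-z]{4}", "subtitles.zst"))
```

Building the string separately from running it keeps the six tool combinations in one table, so a timing harness can iterate over them uniformly.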

Overview

The running time reported for each regular expression is a confidence interval computed over 30 runs, measured after one warm-up run. When the confidence intervals of two experiments do not overlap, there is enough statistical evidence to claim that one tool outperforms the other on that experiment. An execution that takes more than 10 times as long as zearch is counted as a timeout.
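
The comparison rule described above can be sketched in a few lines of Python. The confidence level and the way the interval was computed are not stated on this page, so the 95% level and the normal approximation used below are assumptions, as are all function names. The timings are made up for illustration and are not benchmark results.

```python
from statistics import NormalDist, mean, stdev


def confidence_interval(times, level=0.95):
    """Confidence interval for the mean running time.

    Assumption: 95% level with a normal approximation; the page does not
    say which level or method was actually used.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(times) / len(times) ** 0.5
    center = mean(times)
    return (center - half_width, center + half_width)


def intervals_overlap(a, b):
    """True when the two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def is_timeout(tool_seconds, zearch_seconds, factor=10):
    """An execution counts as a timeout when it exceeds factor times zearch's time."""
    return tool_seconds > factor * zearch_seconds


# Made-up timings in seconds (the real experiments use 30 runs per regex).
zearch_runs = [0.50, 0.52, 0.49, 0.51, 0.50]
other_runs = [0.80, 0.83, 0.79, 0.81, 0.82]

ci_zearch = confidence_interval(zearch_runs)
ci_other = confidence_interval(other_runs)

# Non-overlapping intervals are the evidence that one tool outperforms the other.
print("zearch:", ci_zearch)
print("other: ", ci_other)
print("overlap:", intervals_overlap(ci_zearch, ci_other))
```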

Subtitles

Regular Expressions

r1: "."
r2: "wosel"
r3: "but where are you"
r4: "have"
r5: "I love you"
r6: "a"
r7: "\."
r8: "I .* you"
r9: "[a-z]{4}"
r10: "[0-9]{9}"
r11: " (19|20)[0-9]{2} "
r12: " [a-z]{2} "
r13: " [0-9]5[0-9]0[0-9]4[0-9]5[0-9] "

Graphs

Gutenberg

Regular Expressions

r1: "."
r2: "wosel"
r3: "but where are you"
r4: "have"
r5: "I love you"
r6: "a"
r7: "\."
r8: "I .* you"
r9: "[a-z]{4}"
r10: "[0-9]{9}"
r11: " (19|20)[0-9]{2} "
r12: " [a-z]{2} "
r13: " [0-9]5[0-9]0[0-9]4[0-9]5[0-9] "

Graphs

CSV

Regular Expressions

r1: "."
r2: "wosel"
r3: "1993"
r4: "20[0-9]{2}"
r5: ".*5"
r6: "[a-z]{5}"
r7: " [0-9]{9} "

Graphs

Logs

Regular Expressions

r1: "."
r2: "wosel"
r3: "port"
r4: "20[0-9]{2}"
r5: "([0-9]{3}\.){3}[0-9]"
r6: "[0-9]{4}"
r7: "([a-z]+\.)+[a-z]+ - -"
r8: ""GET .*" ([13-9]|2[1-9]|2-[1-9])"
r9: "(([0-9])|([0-2][0-9])|([3][0-1]))/(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)/[0-9]{4}"
r10: "(([0-9])|([0-2][0-9])|([3][0-1]))-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)-[0-9]{4}"
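
To make concrete what "counting matching lines" means for some of the log expressions above, here is a minimal sketch using Python's re module as a stand-in for the benchmarked tools. The three log lines are made up in a common access-log style and do not come from the benchmark data. A line is counted once however many matches it contains, mirroring the -c option.

```python
import re

# Patterns r5, r7, r9 and r10 from the list above, copied verbatim.
MONTHS = "(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
PATTERNS = {
    "r5": r"([0-9]{3}\.){3}[0-9]",
    "r7": r"([a-z]+\.)+[a-z]+ - -",
    "r9": r"(([0-9])|([0-2][0-9])|([3][0-1]))/" + MONTHS + r"/[0-9]{4}",
    "r10": r"(([0-9])|([0-2][0-9])|([3][0-1]))-" + MONTHS + r"-[0-9]{4}",
}

# Made-up log lines, for illustration only.
LOG_LINES = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    'host.example.com - - [03/Jan/2001:08:00:00 +0000] "GET /missing HTTP/1.1" 404 512',
    '192.168.100.7 - - [31-Dec-1999:23:59:59 +0000] "POST /form HTTP/1.0" 302 0',
]


def count_matching_lines(pattern, lines):
    """Number of lines containing at least one match (the -c semantics)."""
    regex = re.compile(pattern)
    return sum(1 for line in lines if regex.search(line))


for name, pattern in PATTERNS.items():
    print(name, count_matching_lines(pattern, LOG_LINES))
```

Note that r5 requires three digits in each of the first three address fields, so an address such as 127.0.0.1 does not match it.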

Graphs

Qwerty

Regular Expressions

r1: "."
r2: "qwerty"
r3: "qwerti"
r4: "wosel"
r5: "[a-z]{5}",