I need to scan very large JSONL files efficiently and am considering a parallel grep-style approach over line-delimited text.

Would love to hear how you would design it.

  • bizdelnick@lemmy.ml
    6 days ago

    Sorry, I missed that L, and I had never heard of JSONL before (although I have worked with JSON logs that are effectively JSONL). So, well, you can use grep, but it can be inefficient, depending on the regex engine and how good you are with regexes. It is also easy to make a mistake if you are not very proficient with regexes. So I'd prefer a JSON parser (jq or another one, maybe something lower level if performance matters) over grep anyway.
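
To make the parser-over-grep suggestion concrete, here is a minimal Python sketch that parses each line and compares one top-level field. The function name `scan_jsonl`, the field names `level` and `msg`, and the sample records are all made up for illustration; this is one possible shape, not a tuned implementation.

```python
import json
from typing import Iterable, Iterator


def scan_jsonl(lines: Iterable[str], field: str, value: object) -> Iterator[dict]:
    """Yield records whose top-level `field` equals `value` (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of aborting the scan
        if isinstance(record, dict) and record.get(field) == value:
            yield record


# Made-up sample data: the second record mentions "error" only inside a message.
sample = [
    '{"level": "error", "msg": "disk full"}',
    '{"level": "info", "msg": "level: error was cleared"}',
    'not json',
    '{"level": "error", "msg": "timeout"}',
]
matches = list(scan_jsonl(sample, "level", "error"))
```

Because `scan_jsonl` accepts any iterable of lines, an open file object can be passed directly and is read lazily. For a parallel scan, one option would be to split the file into byte ranges aligned on newlines and run the same function in several worker processes; that part is left out here.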

      • bizdelnick@lemmy.ml
        3 days ago

        It will not, as long as your parser is not overcomplicated and does not populate huge data structures. You only need to find tokens and compare them with the field names and values you are looking for. Regexes are slower and cannot handle escaped characters correctly.
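
The point about escaped characters can be shown with a small Python sketch. The two sample lines and both regex patterns are invented for illustration, and the patterns are deliberately naive; a more careful regex could handle escaped quotes, but decoding `\u` escapes is still work a JSON parser does for free.

```python
import json
import re

# Made-up sample lines; raw strings keep the backslashes exactly as they
# would appear in the file.
raw_lines = [
    r'{"user": "caf\u00e9", "msg": "plain"}',   # é written as a \u escape
    r'{"user": "café", "msg": "say \"hi\""}',   # é literal, quotes escaped
]

# Naive pattern: matches only the literal spelling of the value.
naive_user = re.compile(r'"user":\s*"café"')
regex_hits = [bool(naive_user.search(line)) for line in raw_lines]

# The parser decodes escapes first, so both spellings compare equal.
parser_hits = [json.loads(line)["user"] == "café" for line in raw_lines]

# Naive extraction: stops at the first quote, even though it is escaped.
naive_msg = re.compile(r'"msg":\s*"([^"]*)"')
regex_msg = naive_msg.search(raw_lines[1]).group(1)
parsed_msg = json.loads(raw_lines[1])["msg"]

print(regex_hits, parser_hits)
print(repr(regex_msg), repr(parsed_msg))
```

The regex misses the record where the value is written with a `\u` escape, and it truncates the message at the escaped quote, while the parser returns the decoded value in both cases.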