I usually do this kind of processing with Linux pipes: head, tail, cut, sort, uniq, and inline Perl. It is somewhat similar to using monads, except that you have to handle the formatting to and from text. I have also written a few tools of my own, including one for counting and one for generating text histograms. I often chain 5 or 10 of these commands together. My basic data type is similar to CSV, but uses "|" instead of a comma as the separator because it tends to appear in text less often. On the upside, since the data is not stored in a binary format, it stays very accessible.
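For anyone who wants to try the idea without the shell, here is a rough Python sketch of a counting step feeding a text histogram over "|"-separated lines. It is not the tooling described above: the function names `count_field` and `text_histogram` and the sample log lines are made up for illustration.

```python
from collections import Counter


def count_field(lines, index, sep="|"):
    """Count how often each value appears in one column of delimited lines."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if index < len(fields):  # skip short lines instead of failing
            counts[fields[index]] += 1
    return counts


def text_histogram(counts, width=20):
    """Return one row per value; the largest count gets a bar of `width` characters."""
    if not counts:
        return []
    largest = max(counts.values())
    label_width = max(len(key) for key in counts)
    rows = []
    for key, n in counts.most_common():
        bar = "#" * max(1, round(n * width / largest))
        rows.append(f"{key.ljust(label_width)} {n:>4} {bar}")
    return rows


# Hypothetical sample data: date|method|path
sample = [
    "2024-01-01|GET|/index.html",
    "2024-01-01|POST|/login",
    "2024-01-02|GET|/about.html",
    "2024-01-02|GET|/index.html",
]

counts = count_field(sample, 1)
rows = text_histogram(counts)
for row in rows:
    print(row)
```

This does roughly the job of a `cut | sort | uniq -c` chain plus a bar renderer, with the text parsing kept in one place.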
It's really too bad that the ASCII codes 29, 30, and 31 (Group, Record, and Unit separators) never took off, as this is exactly what they were designed for.
When implemented, they'd let you include commas, line feeds/carriage returns, etc. within your data records.
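A small Python sketch of what that could look like, assuming fields are joined with the Unit Separator (ASCII 31) and records with the Record Separator (ASCII 30). The `encode` and `decode` helpers are hypothetical names, and the scheme only works as long as the data itself never contains those two control characters.

```python
UNIT_SEP = "\x1f"    # ASCII 31: placed between fields within a record
RECORD_SEP = "\x1e"  # ASCII 30: placed between records


def encode(records):
    """Join a list of records (each a list of strings) into one string."""
    return RECORD_SEP.join(UNIT_SEP.join(fields) for fields in records)


def decode(text):
    """Split a string produced by encode() back into a list of records."""
    if not text:
        return []
    return [record.split(UNIT_SEP) for record in text.split(RECORD_SEP)]


# Fields containing commas, newlines, pipes, and semicolons need no quoting.
records = [
    ["Smith, John", "line one\nline two", "a|b;c"],
    ["Doe, Jane", "plain", ""],
]

encoded = encode(records)
assert decode(encoded) == records
```

No quoting or escaping rules are needed here, which is the main attraction over CSV; the downside is that most editors and spreadsheets do not display or type these characters conveniently.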
And there would also be less ambiguity as to what separator to use. I understand the popularity of CSV, but it's really not so nice to share data with. German customers want semicolons as a separator, while the US ones claim they are right 'because after all it is called comma-separated and else I cannot import it in Excel' (sic). Etc.