The slow part of using awk is waiting for the disk to spin over the magnetic head.
And most laptops have 4 CPU cores these days, and a multiprocess operating system, so you don’t have to wait for random access on a spinning plate to find every bit in order, you can simply have multiple awk commands running in parallel.
Awk is most certainly a better user interface than whatever custom BrandQL you have to use in a textarea in a browser served from localhost:randomport
I haven't been using spinning disks for perf critical tasks for a looong time... but if I recall correctly, using multiple processes to access the data is usually counter-productive since the disk has to keep repositioning its read heads to serve the different processes reading from different positions.
Ideally if the data is laid out optimally on the spinning disk, a single process reading the data would result in a mostly-sequential read with much less time wasted on read head repositioning seeks.
In the odd case where the HDD throughput is greater than a single-threaded CPU processing for whatever reason (eg. you're using a slow language and complicated processing logic?), you can use one optimized process to just read the raw data, and distribute the CPU processing to some other worker pool.
> The slow part of using awk is waiting for the disk to spin over the magnetic head.
If we're talking about 6 TB of data:
- You can upgrade to 8 TB of storage on a 16-inch MacBook Pro for $2,200, and the lowest spec has 12 CPU cores. With up to 400 GB/s of memory bandwidth, it's truly a case of "your big data problem easily fits on my laptop".
- Contemporary motherboards have 4 to 5 M.2 slots, so you could today build a 12 TB RAID 5 setup of 4 TB Samsung 990 PRO NVMe drives for ~ 4 x $326 = $1,304. Probably in a year or two there will be 8 TB NVMe's readily available.
There are relatively cheap adapter boards which let you stick 4 M.2 drives in a single PCIe x16 slot; you can usually configure a x16 slot to be bifurcated (quadfurcated) as 4 x (x4).
To pick a motherboard at quasi-random:
Tyan HX S8050. Two M.2 on the motherboard.
20 M.2 drives in quadfurcated adapter cards in the 5 PCIe x16 slots
And you can connect another 6 NVMe x4 devices to the MCIO ports.
You might also be able to hook up another 2 to the SFF-8643 connectors.
This gives you a grand total of 28-30 x4 NVME devices on one not particularly exotic motherboard, using most of the 128 regular PCIe lanes available from the CPU socket.
And most laptops have 4 CPU cores these days, and a multiprocess operating system, so you don’t have to wait for random access on a spinning plate to find every bit in order, you can simply have multiple awk commands running in parallel.
Awk is most certainly a better user interface than whatever custom BrandQL you have to use in a textarea in a browser served from localhost:randomport