
At the scale you are talking about (10GB+ files), it's far more efficient to put primitive filtering in the application generating the lines in the first place. You pay two penalties for using grep: having another process touch the data, and having to generate the superfluous lines in the first place.
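To make that concrete, here is a minimal sketch of filtering at the source, assuming you control the program that writes the lines. The names `emit_lines` and `keep` are made up for illustration; they don't come from any particular tool.

```python
import io
import re

def emit_lines(records, out, keep=None):
    """Write each record to `out`, skipping ones that don't match `keep`.

    `keep` is an optional compiled regex (hypothetical parameter). Lines
    that fail the filter are never written, so no downstream process
    has to read and discard them.
    """
    written = 0
    for rec in records:
        if keep is not None and not keep.search(rec):
            continue  # dropped before it ever hits the output
        out.write(rec + "\n")
        written += 1
    return written

# Toy data standing in for whatever the application would generate.
records = ["INFO start", "ERROR disk full", "DEBUG tick", "ERROR timeout"]
buf = io.StringIO()
count = emit_lines(records, buf, keep=re.compile(r"^ERROR"))
```

The unwanted lines are never serialized, and no second process has to read them back in just to throw them away.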



This doesn't work if you're processing logs. You might need those other lines in other places.


In this case, alas, I'm processing third-party data I didn't generate, so one way or another I have to scan through it at least once.
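If the scan is unavoidable, one option is to make that single pass do as much work as possible. A rough sketch, assuming line-oriented data and that everything I want can be expressed as a regex; `scan_once` and the pattern names are hypothetical:

```python
import re

def scan_once(lines, patterns):
    """Check every pattern against every line in a single pass.

    `lines` is any iterable of bytes lines, e.g. a file opened with
    open(path, "rb"), so the data is streamed rather than loaded whole.
    `patterns` maps a label to a bytes regex (labels are illustrative).
    """
    compiled = {name: re.compile(p) for name, p in patterns.items()}
    hits = {name: [] for name in compiled}
    for line in lines:
        for name, rx in compiled.items():
            if rx.search(line):
                hits[name].append(line)
    return hits

# Toy input standing in for the third-party file.
sample = [b"GET /a 200\n", b"POST /b 500\n", b"GET /c 404\n"]
hits = scan_once(sample, {"errors": rb" 5\d\d$", "gets": rb"^GET "})
```

Collecting matches in lists is only sensible when the matches are few; for a 10GB+ input you would more likely write each match straight out to a per-pattern file.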



