
Not the op, but I'd guess it's the full text search index.



For 100k entries you can grep them instantaneously; there's no need to maintain an index.
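As a rough illustration of that claim, here is a minimal sketch that scans 100k entries with no index at all. The history contents are synthetic and made up for the example, and `scan` is a hypothetical helper, so the timing it prints is only indicative for your own machine and data.

```python
import re
import time

# Hypothetical synthetic history: 100k made-up command lines.
history = [
    f"git commit -m 'change {i}' && make test-{i % 50}"
    for i in range(100_000)
]

def scan(entries, pattern):
    """Return every entry matching the regex, using a plain linear scan."""
    rx = re.compile(pattern)
    return [entry for entry in entries if rx.search(entry)]

start = time.perf_counter()
hits = scan(history, r"test-7\b")
elapsed_ms = (time.perf_counter() - start) * 1000

# Report how long the unindexed scan took on this machine.
print(f"{len(hits)} hits in {elapsed_ms:.1f} ms")
```

Whether that counts as "instantaneous" depends on the hardware and on how long real history entries are, but it gives a quick way to check before building anything more elaborate.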


Grep only works if you want an exact string match. If you want to find words out of order or support features like stemming, FTS is necessary.


Maybe I have some sort of disease, but while reading "find words out of order or support features like stemming" the regexes for that immediately flashed before my eyes, so I think "necessary" is a little strong there.
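For anyone wondering what such regexes might look like, here is one possible sketch (not necessarily what the commenter had in mind). It uses one lookahead per word so order doesn't matter, and a trailing `\w*` as a crude stand-in for stemming. The helper name `all_words_any_order` and the sample lines are invented for illustration.

```python
import re

def all_words_any_order(words):
    """Build a regex matching lines that contain every word, in any order.

    Each word becomes its own lookahead. The trailing \\w* lets "install"
    also match "installs" or "installing": a crude suffix match, not
    real stemming (it will not relate "ran" to "run", for example).
    """
    lookaheads = "".join(rf"(?=.*\b{re.escape(w)}\w*)" for w in words)
    return re.compile("^" + lookaheads, re.IGNORECASE)

lines = [
    "pip install requests",
    "installing requests via pip",
    "pip uninstall requests",
    "apt install curl",
]

rx = all_words_any_order(["install", "pip"])
matches = [line for line in lines if rx.search(line)]
# matches == ["pip install requests", "installing requests via pip"]
print(matches)
```

The `\b` before each word is what keeps "uninstall" from matching "install". Irregular word forms are where this approach stops being trivial.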


FTS is not the same as regex.


I don't think I said it was. I was addressing the specific use cases mentioned. If there's another use case you think is important in searching command line history, feel free to describe it.


> feel free to describe it

Didn't they already? E.g., stemming.


Most stemming use cases are trivially solved with a regex. That's the point he was making. The difference between a beginner and expert with regexes is quite a lot.


Ahhh, interesting point.

"We could learn advanced regexes... or we could just use FTS5".

Hard call. :)


Maybe! Full-text search is great for text. Command lines have some things in common with text, but they definitely aren't normal text. E.g., punctuation is much more significant. Stemming may not be appropriate. Case matters. Word boundaries are different, and many of the significant lumps aren't really words.
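To make that concrete, here is a small sketch comparing a prose-style tokenizer with shell-style splitting on the same command. `prose_tokens` is a hypothetical stand-in for how a text-oriented search might break up input (lowercase, alphanumeric runs only). It is not the tokenizer of any particular FTS engine, and the command line is made up.

```python
import re
import shlex

cmd = "grep -rI 'TODO: fix' ./src | sort -u > out.txt"

def prose_tokens(text):
    """Hypothetical prose-style tokenizer: lowercase, keep alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())

def shell_tokens(text):
    """Split the way a POSIX shell would, keeping quoting and punctuation."""
    return shlex.split(text)

print(prose_tokens(cmd))
# ['grep', 'ri', 'todo', 'fix', 'src', 'sort', 'u', 'out', 'txt']

print(shell_tokens(cmd))
# ['grep', '-rI', 'TODO: fix', './src', '|', 'sort', '-u', '>', 'out.txt']
```

The prose-style version loses the flag dashes, the case of `-rI`, the pipe and redirect, and the fact that `TODO: fix` was one quoted argument, which are exactly the things that tend to matter when searching command history.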


Well, I suppose what's trivial for me might be advanced for you :)


For regexes, definitely. ;)


With a small enough corpus, full text search does not require an index to be instantaneous, and 100k entries is easily small enough for that.

Additionally, everything you describe can be phrased as a regular expression.


Sometimes it's nice to not manually write a regexp to find all of the variants of every word or deal with arbitrary ordering of substrings. And if you're using SQLite and FTS5 is installed, why not just create a virtual full text search table with one command and use that? With a small enough corpus, it's a meaningless distinction to bikeshed about the implementation: the easiest solution to build is the best. 500 MB of disk space for a pet project that gives you convenience is a terrifically small amount of storage. I have videos that I recorded on my phone that take up more than double that.
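For reference, a minimal sketch of the FTS5 route using Python's built-in `sqlite3` module. The table name `history`, the column name `command`, and the sample rows are assumptions made for the example. Whether FTS5 is compiled into your SQLite build varies, so the sketch returns `None` when it isn't available.

```python
import sqlite3

def build_fts_index(commands):
    """Return an in-memory connection with an FTS5 table, or None without FTS5."""
    conn = sqlite3.connect(":memory:")
    try:
        # The one command: a virtual table, here with the porter stemmer enabled.
        conn.execute(
            "CREATE VIRTUAL TABLE history USING fts5(command, tokenize='porter')"
        )
    except sqlite3.OperationalError:
        # This SQLite build was compiled without FTS5.
        conn.close()
        return None
    conn.executemany(
        "INSERT INTO history(command) VALUES (?)", [(c,) for c in commands]
    )
    return conn

def search(conn, query):
    """Run an FTS5 MATCH query; whitespace-separated terms are implicitly ANDed."""
    rows = conn.execute(
        "SELECT command FROM history WHERE history MATCH ? ORDER BY rank",
        (query,),
    )
    return [row[0] for row in rows]

conn = build_fts_index([
    "pip install requests",
    "installing requests via pip",
    "apt install curl",
])
if conn is not None:
    # Word order and the install/installing variation are handled by FTS5.
    print(search(conn, "pip install"))
```

With the porter tokenizer, both "pip install requests" and "installing requests via pip" should come back for the query `pip install`. Note that the default tokenization treats punctuation as separators, so flags and paths get split up the way prose would.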



