pcre_exec()'s length and offset parameters are ints, so there's not much I can do about files over 2GB. I really don't want to split the file into chunks and deal with matches across boundaries. That's just asking for bugs. I guess I could make literal string searches work, at least on 64-bit platforms.
Honestly though, I don't think ag is the right tool for that job. For a single huge file, grep is going to be the same speed. Possibly faster, since grep's strstr() has been optimized for longer than I've been alive.
I gave some thought to the right tool for the job of searching DNA.
DNA files don't change very often, which makes building an index worthwhile. Apparently, sequencing isn't perfect and neither are cells, so you'd want fuzzy matching. But repeats in DNA are also common, so that means fuzzy regex matching. There is already a fuzzy regex library[1], but I have no idea how fast it is. If the application requires performance above everything, an n-gram index sounds like the right tool for the job.
After writing the paragraph above, I searched for "DNA n-gram search." The original n-gram paper from 2006 used DNA sequences in its test corpus.[2] I don't know much about DNA or the applications built around it, so I'm glad I managed to recommend a tool that was designed for the job.
I built ag for myself, both as a tool and as a way to improve my skills at profiling, benchmarking, and optimizing. Had I known how popular it would become, I would definitely have held myself to a higher standard, or any standard. Most importantly, I'd have written tests. These days, I'm busy with a startup, so progress on those fronts has been slow.
ag is incredible, especially paired with Ack.vim and a mapping. I use <leader>as to search for the current word under the cursor. The results are instantaneous. With ag and YouCompleteMe, I never fall back to cscope/ctags in C++ projects anymore.
One thing, though: it skips certain source files seemingly arbitrarily without the -t param, and I haven't figured out why... It doesn't seem related to any .gitignore entries I've been able to identify.
The Silver Searcher is pretty good, but it has a couple of big problems. It does not parse .gitignore correctly [0], so it frequently searches files that are not committed to your repo. This, combined with the decision to print 10,000-character-long lines, means a lot of search results are useless.
I noticed the issue you mentioned, but as the last comment notes, I believe this has already been fixed. My specific case, at least, was resolved by updating from master.
One thing I miss a little is that ack has the super convenient:
ack --java "foo"
while with ag you write:
ag -G"\.java$" "foo"
But yes, ack and ag feel pretty identical except for the speed. Most of the time the speed improvement is irrelevant to me, except sometimes now I'll use ag in my home folder, and it's still fairly snappy.
That was too much typing anyway. When you mostly work with one language, something like this is nice (in my case C/C++):
alias ack-cpp='ack-grep --type=cpp --type=cc'
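With ag you can get most of the way there with a tiny wrapper function; a rough sketch (the name agt is made up, and it only handles a single extension):

agt() {
  # hypothetical helper: `agt java foo` runs `ag -G '\.java$' foo`
  ext="$1"; shift
  ag -G "\.${ext}$" "$@"
}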
Hm, I've recently begun using zsh primarily and this trick doesn't work there: zsh lets you know what the alias is... bash will happily find `rack` in your `$PATH` and then run it.
(Presumably because in zsh, `which which` says it's a shell built-in, whereas in bash it finds `/usr/bin/which`, so bash doesn't seem to care about your aliases.)
I normally tell people to use ack because it's like grep but faster (owing to its sensible defaults) ... if I use this I'm worried I might go too fast and travel backwards in time or something.
In my benchmarking, mmap() was about 20% faster than read() on OS X, but the same speed on Ubuntu. Pretty much everything else in the list (pthreads, JIT regex compiler, Boyer-Moore-Horspool strstr(), etc) improves performance more than mmap().
Also, mmap() has the disadvantage that it can segfault your process if something else makes the underlying file smaller. In fact, there have been kernel bugs related to separate processes mmapping and truncating the same file.[1] I mostly use mmap() because my primary computer is a Mac.
Now I'm burning with curiosity. I have to know why! My plan:
- replicate the experiment, confirm --mmap shaves off a non-negligible amount of time (a rough sketch of this is after the list). It could be that his computer happened to be running something in the background that was hitting his hard drive, for example, which would skew the results.
- look at the code, figure out the exact difference between what --mmap is doing and what it does by default. Confirm that the problem isn't in grep itself (it's probably not, but it's important to check).
- dig into the kernel source to figure out the difference under the hood and why it might be faster.
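For step one, something like this is roughly what I have in mind (it assumes a grep build where --mmap is still honored -- newer GNU greps quietly ignore the flag -- and a Linux box for the cache drop):

# build a big test file, then time cold-cache runs with and without --mmap
yes 'some line that will not match' | head -c 1G > big.txt
sync; echo 3 | sudo tee /proc/sys/vm/drop_caches
time grep -c needle big.txt
sync; echo 3 | sudo tee /proc/sys/vm/drop_caches
time grep --mmap -c needle big.txt

Repeating the two timings without dropping caches gives the warm-cache comparison, which is probably the more interesting one for mmap() vs read().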
I wonder if it has to do with not having to copy data back and forth between kernel and userspace. My mildly uneducated thought is that you could do this with splice() or whatever, but mmap is an easy drop-in replacement.
edit: I've been reading your posts for a while and I like them, but I keep wondering, why do you have sillysaurus1-2-3?
That's what has me so curious, because it doesn't seem like copying between kernel/userspace should account for a 20% speed drop. Once data is in the L3 CPU cache, it should be inexpensive to move it around.
Regarding my ancestry, I'm sillysaurus3 because I've (rightfully) been in trouble twice with the mods for getting too personal on HN. I apologized and changed my behavior accordingly, and additionally created a new account both times to serve as a constant reminder to be objective and emotionless. There's rarely a reason to argue with a person rather than with an idea. Debating ideas, not people, has a bunch of nice benefits: it's easier to learn from your mistakes, it makes for better reading, etc. It's pretty important, because forgetting that principle leads to exchanges like https://news.ycombinator.com/item?id=7700145
Another nice benefit of creating a new account is that you lose your downvoting privilege for a time, which made me more thoughtful about whether a downvote is actually justified.
Possibly the OS is doing interesting things with file access and caching and opting out of that has benefits for this particular workload?
...
I just skimmed the BSD mailing list email on why grep is fast that was linked up-thread, and it seems that's somewhat the case. It sounds like, since they do advanced search techniques on what matches or can match, they use mmap to avoid requiring the kernel to copy every byte into memory when they know they only need to look at specific ranges of bytes in some instances. At least that was the case at some point in the past.
> Finally, when I was last the maintainer of GNU grep (15+ years ago...), GNU grep also tried very hard to set things up so that the _kernel_ could ALSO avoid handling every byte of the input, by using mmap() instead of read() for file input. At the time, using read() caused most Unix versions to do extra copying.
P.S. Nice attitude, it earned an upvote from me. Which is probably one reason why your third account has more karma than my first.
Right, I think the point of Boyer-Moore is that it lets you eliminate / skip large chunks of the text during the search.
So the assumption is that those pages never even get paged in, but I think that'd only be the case when the pattern size is at least as large as the page size (usually 4KB!), which is not the case in the example in the mailing list. So the mystery continues!
The last time I had to do fast, large sequential disk reads on Linux, it was surprisingly complex to get all the buffering/caching/locking to not do the wrong thing and slow me down a lot. I wouldn't be surprised if non-optimized mmap() is a whole lot faster than non-optimized use of high-level file I/O libraries.
If anything, that post is evidence of how tricky optimization is, and how easy it is to fool yourself about what matters. It's probably best to be skeptical about mmap() as a performance optimization over reading into a buffer unless evidence demonstrates otherwise. Most OS's do a pretty good job of caching at the filesystem level, and under the hood paging is essentially reading into a buffer anyway. mmap() might make the code simpler, but it's hard to imagine it makes it faster. If it does, I'd like to understand why.
So are we talking about constant-time optimization, then? I.e. it shaves off a few milliseconds regardless of how complex the search is, or how many files it's reading, or how large each file is. I'll happily concede that mmap() might do that. But a performance boost linear w.r.t. search complexity/number of files/filesize? Hard to believe, and I should go measure it myself to prove the point or learn why I'm mistaken.
Constant-time improvements are still improvements, especially if they're in an inner loop. Otherwise we would all be using Python and just writing great algorithms.
I use ag[2], which is pretty much the same as ack, but even faster. The other day I was using it to find all instances in all projects of a list of problematic method names[1], in case anyone wants to see a real world use case.
The only annoyance with ag is that it does not have ack's quick filters, e.g. ack --py versus ag -G '\.py$' (and ack's type flags can include multiple file extensions).
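The multi-extension case can still be expressed through -G's regex, just more verbosely, e.g.:

ag -G '\.(js|jsx)$' foo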
For Java programmers who use Silver Searcher or ack, this lets you search all jars in a directory tree for a given string. Requires GNU Parallel:
function ffjar() {
  jars=(./**/*.jar)
  print "Searching ${#jars[*]} jars for '${*}'..."
  # parallel's --tag prefixes each output line with the jar name,
  # so $1 is the jar and $5 is the entry inside it
  parallel --no-notice --tag unzip -l ::: "${jars[@]}" | ag "${*}" | awk '{print $1, ":", $5}'
}
Because it uses parallel it spreads the workload across CPUs. I use this frequently when I have to update/rewrite/create build scripts, and I know a class exists but not which jar file it lives in.
`xargs` also has a `-P` flag which will instruct it to spread work over multiple processes. Given that you already have `-n1`, adding `-P 0` will have it run as many invocations in parallel as it can.
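For example, a rough xargs-only version of the jar search above (GNU findutils assumed; note you lose parallel's --tag, so matches aren't labeled with the jar they came from):

# SomeClassName is just a placeholder pattern
find . -name '*.jar' -print0 | xargs -0 -n1 -P 0 unzip -l | ag SomeClassName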
Yeah, I think it's important not to throw grep away. Ack is really for when you don't know or can't be bothered to explicitly mention the (several) specific files to search.
For some reason, it took me one or two minutes of rereading to realize it was ack, not awk. I thought this website was going to be some ironic trash-talking about grep. Then I saw "written in Perl" and I got so confused my head almost exploded.
> I thought this website was going to be some ironic trash-talking about grep.
Andy Lester, the primary author of ack, is one of the nicest guys I know of. You wouldn't see any trash-talking on that site. He even changed the name of the site from "better than grep" to "beyond grep" [1].
In fact, he gives props to similar tools like ag and others [2].
In my case, my IDE is the command line. ack is one of its plugins. The built-in plugins are also OK (find, ls, mv, etc.). I can create my own plugins for my IDE, and there are even package managers to install new plugins (yum, apt-get).
I've often wondered about that. For me, the development environment is extremely minimalistic by some standards: Linux itself, including tools like ack + vim. I use various vim tricks, though not to the point of it being my de facto OS (as is possible to do!).
From what I can observe, I am generally faster than my co-workers. But it's possible that with a great IDE I could be faster yet. I don't feel any tug to leave, but that could just mean I'm ignorant of a truly better way.
It's better if you don't already have the IDE for that project open. Or if you're searching a project that doesn't come with project files for your IDE of choice. Or if you want to pipe the results. I work a lot with IDEs but still use ack-grep regularly.
I seem to find things faster than my coworkers. The ability to quickly filter out irrelevant files and do nested searches on the results is the strong point. Unix as an IDE and all.
These are just a few examples I do pretty frequently:
Nested Search:
ag functionName | ag moreSpecificContextLikeArgs
Find variable changed yesterday:
git log -p --since yesterday | ag varName
Find controllers changed yesterday:
git log --oneline --name-only --since yesterday | ag controllers
What files did I work on last week:
git log --name-only --oneline --author me --since 1.weeks
How many JS file changes did I make last month?
git log --since 1.months --author me --name-only | ag -i '\.js$' | wc -l
How many changes did I make to each JS file last month?
git log --since 1.months --author me --name-only | ag -i '\.js$' | awk '{arr[$1]++} END {for(i in arr) print arr[i]," - ",i}' | sort -r -n
Change a "classname" from MyClass to BetterName:
ag MyClass # verify it only finds what you think it will
ag MyClass | awk -F':' '{print $1}' | sort | uniq | while read -r line
do
  # GNU sed shown; BSD/macOS sed wants -i '' instead of -i
  sed -i 's/MyClass/BetterName/g' "$line"
done
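If your ag has -l (list matching files, which recent versions do), a shorter variant of the same idea is possible; a rough sketch assuming GNU sed and no spaces in the file paths:

ag -l MyClass | xargs sed -i 's/MyClass/BetterName/g'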
My ex-colleague introduced this to me and I thank him every time I use ack. It is really so much better than grep.
I have set up a bunch of aliases to search by file type and it makes me so productive.
I used Ack a lot when I was coding Perl (it's been a while). After I switched to Ruby, I used rak [1], which seemed easier to use most of the time, and nearly identical.
However, when you just want to find stuff fast, it's annoying to have to deal with Perl/CPAN or RVM/Rubygems, especially when the dependencies are not installed on your server/workstation.
That's why I've switched to silver searcher (ag) [2], as it can be installed with any OS package manager (brew, apt, yum).
ag is not available as a package on Debian stable. ack is, though, as ack-grep. So if you don't want to mess with CPAN, that's fine. The non-CPAN instructions are right on the website.
The problem with such tools is often their lack of ubiquity. I don't want to start using ack, forget a lot of my grep knowledge, only to ssh into a server and need grep.
The benefit of grep's ubiquity outweighs any small advantage ack has in usability.
ack is a tiny Perl script that you can simply wget and add to your path. I hear what you are saying, and I think it applies to a lot of utilities, but not ack. IMHO ack is so much better than grep that it is worth the hassle of having to install it every now and then.
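The single-file install amounts to something like this (the version number in the URL is only an example -- check beyondgrep.com for the current one):

# example URL/version; see beyondgrep.com for the real link
wget -O ~/bin/ack https://beyondgrep.com/ack-2.14-single-file
chmod 0755 ~/bin/ack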
There's a list of other tools for searching source code besides ack at http://beyondgrep.com/more-tools/, including other grepalikes and indexing tools like ctags and cscope.
I suggest that you need not limit yourself to only one tool for your code searching. Toolboxes FTW.
Grep is just as good, and with the recent order-of-magnitude speed improvement for non-C locales -- see https://lwn.net/Articles/586899/ -- which may not have made its way into distros yet, it's easily the best option.
I have a simple wrapper over egrep (see https://github.com/sitaramc/ew ) that adds those little extras (ignoring binary files, ignoring VCS directories...).
I'm sure it's improved since the days I tried it, but I tend to be permanently prejudiced against tools where the author can't/won't document the file selection logic and says "there's really no English that explains how it works" when someone asks.
Ack is great, but watch out if you have any source files with unusual file name extensions. Ack will only search file types it knows about. Also if you have your whole source tree in your editor or IDE, then you may as well search there instead.
Addressed in ack's FAQ [0], and in its own section of the manual [1].
The manual explains: "This is done with command line options that are best put into an .ackrc file - then you do not have to define your types over and over again." Then comprehensively describes options for both command line and .ackrc.
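For example, a couple of lines like these in an .ackrc do it (ack 2.x syntax; the type names and extensions here are just illustrative):

# define a brand-new type, and add extra extensions to an existing one
--type-set=proto:ext:proto
--type-add=cpp:ext:cxx,hxx

After that, `ack --proto foo` works like the built-in type flags.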
cscope and ctags are language-syntax-aware search tools for C-like programming languages. They let you search specifically for all instances of a function named 'foo', for example. ack is instead just a normal pattern matcher like grep, except that it has some cleverness by which it knows not to search certain file types and directories. It will return all lines that match a string rather than just variable names or functions.
cscope lets you search for arbitrary text strings and egrep patterns as well.
"The fuzzy parser supports C, but is flexible enough to be useful for C++ and Java, and for use as a generalized 'grep database' (use it to browse large text documents!)"[0]
They solve different problems. Ctags and cscope index a corpus of source code, usually tied into another tool, like Ctrl-] in vim. ack searches the files every time.
It depends what you mean... as others have mentioned[1], neither ack nor ag is particularly fast compared to grep; they just give you a lot of specialized context (searching the right files). As such, what would be to find what ack is to grep? A find that automatically filters out files that are not source code files?
[1] Things might have changed since the last time I personally tried this; at the time grep was significantly faster, especially for fixed-string searches -- but then again, I never tried to cobble together a command line that gave the same kind of output that ack/ag does (which could probably be hammered out with the help of awk). So don't take my comment to suggest that these tools aren't valuable, just maybe not for the reason some people (notably not the authors of said tools) claim.
> find that automatically filters out files that are not source code
Not just that, but an extensible set of file-type filters that are simple to invoke is what I had in mind. E.g., the tool would let you perform searches like
find++ --Python projects/archive/200?
or
find++ --video trailer
where in the latter case the hypothetical find++ would refer to my config to get a list of video file extensions and then print a list of all files in the current directory and its subdirectories with the word "trailer" in their name. For better effect it would ship with useful filters like "--video" by default.
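A minimal sketch of that hypothetical find++ (the name, option handling, and extension list are all made up, and it only covers the name-search case; GNU find assumed for -regextype):

findpp() {
  local exts
  case "$1" in
    --video)  exts='mp4|mkv|avi|mov' ;;   # a real tool would read this from config
    --Python) exts='py' ;;
    *) echo "unknown type: $1" >&2; return 1 ;;
  esac
  find . -regextype posix-extended -type f -iregex ".*${2}.*\.(${exts})"
}

So `findpp --video trailer` lists files with a video extension and "trailer" somewhere in the path.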
Right. It's not entirely straightforward to link up the MIME database (via e.g. file) and generate filters for use by find. Basing filters off of filenames isn't a very good idea -- and actually a little regressive in my opinion -- after all project/bin/foo (executable) might be a python or perl or whatever script -- not just a binary file.
But first getting all files via find, then testing with file, and finally matching against MIME type doesn't sound like something that's going to be as fast as possible...
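For reference, that slow path is roughly this (GNU find/file assumed; "trailer" is just the example term from above):

find . -type f -exec file --mime-type {} + |
  awk -F': *' '$2 ~ /^video\//{print $1}' |
  grep -i trailer

It works, but it runs file over everything and chokes on paths containing ": ", which is part of why it doesn't feel like the right long-term answer.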
I tried to see if maybe gvfs (gio - GNOME I/O) could help, but couldn't really find anything directly applicable (although there is a set of gvfs command-line tools, like gvfs-ls, gvfs-info, gvfs-mime).
> after all project/bin/foo (executable) might be a python or perl or whatever script -- not just a binary file.
That's one of the big features of ack that the find/grep combo can't replicate: checking the shebang of the file to detect its type. In ack's case, Perl and shell programs are detected both by extension and by shebang line.
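For illustration only (this isn't ack's actual code), the shebang half of that check amounts to something like:

find . -type f ! -name '*.*' | while read -r f; do
  head -n1 "$f" | grep -q '^#!.*perl' && echo "$f"
done

i.e. peek at the first line of extensionless files and see which interpreter they ask for.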
I'd prefer checking the magic numbers in general (or resource forks) -- and listing based on MIME types -- rather than just shebang/extension. I'm sure there are frameworks ready for doing this -- both GNOME and KDE (among others) have been working on this for a while. You need it to be able to display (correct) file icons, for example. And once one goes down that route, it might be beneficial to leverage one of the frameworks for file search (from the locate db to something based on xapian or what-not) -- rather than find-style traversal.
I suppose this might be too late, but it might be worth having a look at tracker[1] and tracker-search[2]. Alternatives include recoll and Beagle (now defunct?).
I have a fairly simple alias that does a find but excludes directories like .svn, .git, etc., and a separate one that excludes common binary extensions as well (.o .fas .fasl etc.).
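For anyone who wants something similar, roughly this works (GNU find syntax; the alias names and extension list are just examples):

# plain find, minus VCS directories
alias f='find . \( -name .git -o -name .svn \) -prune -o -type f -print'
# same, also skipping some common binary extensions
alias fb='find . \( -name .git -o -name .svn \) -prune -o -type f ! \( -name "*.o" -o -name "*.fas" -o -name "*.fasl" \) -print'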