Hacker News new | past | comments | ask | show | jobs | submit login

Please DO NOT mix string interpolation and command execution, especially when a command is processed through the shell. Whatever your language, use a list-based or array-based execution API that passes arguments straight through to execv(2), execvp(2), etc, bypassing the shell.



Was waiting for this comment :P

The API used handles string interpolation correctly: the string literal is parsed at compile time, and the interpolated arguments are never concatenated or escaped, and end up directly as an element of arv array passed to a child. See

https://github.com/tigerbeetle/tigerbeetle/blob/7053ecd2137a...


This approach creates an odd mini language, which is incomplete:

    comptime assert(std.mem.indexOfScalar(u8, cmd, '\'') == null); // Quoting isn't supported yet.
    comptime assert(std.mem.indexOfScalar(u8, cmd, '"') == null);
But you can do correct interpolation with simple shell variables, rather than generating shell code strings:

    $ today='foo; bar' sh -c 'argv git switch --create "release-$today" origin/main'
    ['git', 'switch', '--create', 'release-foo; bar', 'origin/main']
So that is a test that we can use a plain shell string, without any shell injection bug. (argv is my command to test quoting: python -c 'import sys; print(sys.argv)' "$@" )

Note that there's no escaping function needed, because we're not generating any shell code. We're generating an argv array for `/bin/sh` instead.

---

So by invoking with an env var, you can easily create a correct API that uses plain shell

    git switch --create "release-$today"
rather than

    git switch --create release-{today}  # what language is this?  It's not obvious
If you don't want to use the env var, you can also use

    git switch --create "release-$1"
And invoke with

    ['sh', '-c', shell_string, 'unused-arg0', today_string]
With this approach, you don't need

    1. any kind of shell escaping
    2. any analyzing of pseudo-shell strings, which can't contain quotes
Because you are not generating any shell code. The shell code is constant.


Why would they even change the language and the commands in the example? It confuses and undermines the point. Just say “use `git switch -c my-new-branch` for interactive usage and `git switch --create my-new-branch` in scripts”. It makes no sense to introduce other unexplained information.


Another approach is to have powerful enough language that allows you to guard against the shell injection. I wrote a syntax form allowing to do this:

    (sh "cat " file " >" output)
With file being bound to "foo'bar" and output to "x", it is automatically translated into

    cat 'foo'\''bar' >'x'
This gives you the flexibility to use shell (sometimes it just is the most concise way) while being safe against injection.

I believe for example in rust you should be able to do the same.


How do you know which shell you're escaping for? You could query the system, but now you end up implementing escaping for every shell out there.


Good question. I care only about POSIX compatible shells, so the escaping just follows the POSIX rules. In practice that means it works on any actually used system except windows, which is fine with me.


Miniature, in-line sh scripts are also fine as long as you use the provided parameter substitution.

If you’re averse to this:

  q(“select x where y = ‘“ + v + “‘“)
And instead do this:

  q(“select x where y = %s”, v)
Then you should be averse to this:

  x(“foo --option ‘“ + v + “‘“)
And instead do this:

  x(‘foo --option “$1”’, v)
This is particularly useful when it’s expedient to have one thing piping into another. Like it or not the sh DSL for pipes is excellent compared to doing things natively with execve() and pipe(), just as doing group by and count is far more concise in SQL than doing so natively.

Most SQL libraries give you something like q. Writing your own x is as simple as calling sh correctly. In Python, for example:

  def x(script, *args):
    run([“sh”, “-c”, script, “--“, *args])


Neither of those are equivalent to variable binding, which is what most SQL libraries provide, specifically because they don't actually solve the problem since they're still doing string substitution. Putting a double quotes in $1 in your "good" execute example will allow you break out of what's expected and then you're Bobby Tables.

Your python example at the bottom is correct, in that each separate element is more correct in that it allows each arg to be passed as an element, so there's no option to break out through quoting characters. SQL binds are like that in most ljbraries, even if they don't look like it. The parser knows a single item below there so if it passes it along as such. You cannot escape it in the same way.


I don’t really follow. My “good” example and the code at the bottom are the same.

sh is smarter than just doing string interpolation and ”$1” is passed on as a single argument, no matter what:

  > run(["sh", "-c", 'echo "$1"', "--", 'a"'])
  a”
Whereas if it were simple string interpolation, you’d see this:

  > run(["sh", "-c", 'echo "a""')
  --: 1: Syntax error: Unterminated quoted string
It’s the same special casing that gets "$@" right.


That requires you quote the param in the sintr to ensure that params are groups as expected. E.g.

    # cat pvars.sh
    #!/bin/bash
    echo "\$1=$1"
    echo "\$2=$2"
    echo "\$3=$3"
    # sh -c './pvars.sh "$1" $2 $3' -- 'a b' 'c d'
    $1=a b
    $2=c
    $3=d
The whole point of passing in an array and using something like exec (or system(), if provided as it handled the fork and wait for you) is that you avoid the overhead of the shell starting up at all and parsing the command line, and it lets you define each param exactly as needed since each param is its own array item. You don't need to worry about splitting on space or the shell splitting params on space, or quoting to group items. If you want the param to be:

    foo "bar baz" quux
as one singular parameter, you just make that the contents of that array item, since no parsing need be done at all.

If you have an array of params and you're jumping through hoops to make sure they're interpreted correctly by the shell you call execute a process, you're likely (depending on language and capabilities) wasting both cycles and over complicating the code when you can just call the program you actually want to execute directly and supply the params. Alternatively, if you have all the params as one long string and you want it to be pased as a shell would, then execute the shell and pass that as a param. e.g.

    # perl -E 'system("./pvars.sh","a b","c d");'
    $1=a b
    $2=c d
    $3=
    # perl -E 'system("./pvars.sh","a b","c","d");'
    $1=a b
    $2=c
    $3=d


Thanks for explaining. I feel like we’re talking past each other but it’s my mistake. I should have said it is only useful (not “particularly useful”) if one has compound statements like a pipe or multiple commands. Invoking sh just to run a single command is superfluous and you are right that reaching directly for execve() is better.


Ah, yes. If you want to take advantage of piping commands together through the shell as a subcommand of your program, then a way to make params behave more consistently regardless of content is useful.


    SyntaxError: invalid character '“' (U+201C)


For anything involving file paths, user input, etc. -- yes of course. It's not even a question because they would need to be escaped otherwise which nobody wants to do.

But for a simple example like this where it's inserting a date which has known properties, it seems fine, and is much more readable.


Tbf this input does not need escaping.

But at the very least the shell is unnecessary here.


Why not?


Any time you send commands and data down a single channel, user input that's intended to be data can be misinterpreted as a command. For example, if your program wants to:

    run("program --option '{user_input}' > file")
to save some input to a file, and the user's input is:

    '; bad_command #
then when run() sends that string to the shell, the shell will run:

    program --option '';
    bad_command #' > file
Most languages have something like a safe_exec() that separates the shape of the command from the values of the options, executing "program" with the options and the user_input in the arguments array as data. Skipping the shell step, which would just be building an exec call anyway, removes the opportunity for users to confuse it into doing something else.

The list-based API alternative they recommend might look like this:

    safe_exec(["program", "--option", user_input], stdout="file")
and it would always exec "program" with argv[1] == "--option" && argv[2] == user_input. If the user_input happens to be:

    '; bad_command #
...well, then, the user can enjoy the contents of their file.


Yes of course. But why would you expect me to run shell commands with random person's input? Also:

    safe_exec(["rm", user_input])
This isn't safe either! Despite clearly saying "safe_exec"!


Yeah, "safe_exec" is a useless name without context. But the context was you need to call a program from another program. Many people would call system() or whatever because usually it's obvious and easy, and the pitfalls are less so.

Shelling out is not the only option. People are just saying not to use that option. Better ones won't save you if you purposely do something stupid. They will save you if the user wants to trick you into doing something else.


A nearby comment mentioned escaping. I guess that might be a good reason to use execv?


SQL injection on steroids.


Only if you are getting input from untrusted users


imo it's best to just avoid it altogether. Requirements change, and what was once a trusted input can become untrusted input.


Generally you shouldn't be passing random data from the web to shell scripts. Maybe I haven't done the right type of work but having to deal with fidlg bits it's much more likely not passing it to be shell will cause issues (with stuff like executable paths)


Or if your trusted users are fallible and could be tricked into providing unsafe inputs.




Consider applying for YC's Summer 2025 batch! Applications are open till May 13

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: