Introduction

In the past two weeks I saw a few cases where programmers were telling their programs to exit(), but without fully understanding what their exit codes meant and how they were represented. In this post I’ll explain what I believe to be the right way of doing it.

The first time I started to think about this was when I saw a script similar to the one below. But what really motivated me was this reddit thread. If a practice I believe is bad starts to appear in best-practices guides, then it’s time to express my opinion, right?

So, let’s consider foobar.sh, which won’t work in bash >= 4.0:

#!/bin/bash
foo() {
    # Whatever. An error occurred and I'm returning -1
    return -1
}
bar() {
    foo
    if [ "$?" = "-1" ]; then
        # -1 is my "error code" and I'm returning it
        exit -1
    fi
}
bar # Calls the "bar" function

Diving in

So, now, suppose I called foobar.sh from another script and stuck to my convention of using -1 as the error code:

# ... whatever ...
foobar.sh
if [ "$?" = "-1" ]; then
    # Oops, I'll never get here
fi
# ... whatever ...

As mentioned in the snippet above, it’ll never get to the error-handling code. But why? Well, because on POSIX systems the exit status a parent sees can only be in the range 0-255, so “-1” is not a value $? can ever hold. But how do we know bash is calling exit()? By looking at its source code: bash’s “exit” builtin, as well as bash’s reader_loop(), eventually calls exit_shell(), which calls sh_exit(), which, in turn, calls exit(). That’s how.

So now we can be pretty sure that “exit -1” in bash will behave like calling “exit(-1)” in C, like the code below.

#include <stdlib.h>

int main(int argc, const char *argv[])
{
    exit(-1);
    return 0;
}

So, what happens when you call exit(-1), anyway? If you compile the above program and run it, you’ll get “255” as exit status, not -1, QED.

$ gcc foo.c -o foo
$ ./foo
$ echo $?
255
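
The same truncation can be observed from the shell side. The sketch below is only an illustration: run_failing_child is a hypothetical stand-in for foobar.sh, and it assumes a bash that accepts “exit -1”. It captures the status the parent actually sees and checks it against non-zero rather than against -1.

```shell
#!/bin/bash
# Hypothetical stand-in for foobar.sh: a child shell that calls "exit -1".
run_failing_child() {
    bash -c 'exit -1'
}

status=0
run_failing_child || status=$?   # capture the status the parent actually sees
echo "status seen by the parent: $status"

# "$status" can never be the string "-1", so test for non-zero instead.
if [ "$status" -ne 0 ]; then
    echo "child failed"
fi
```

Comparing against non-zero keeps working no matter which failure value the child picked.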

Oh, and by the way, in Linux and other POSIX systems, a successful program execution gives 0 as exit status and any non-zero status is considered “non-success”. But if you’re writing a program in C(++), it is better to use the standard macros EXIT_SUCCESS and EXIT_FAILURE.

We need to go deeper

But since we are here, two questions remain:

  1. How do we know that bash’s $? value comes from a child process’s exit status?
  2. What exactly happens when exit() is called?

Bash’s $?

This one is interesting because of the decoding steps performed by the shell, but the finish is not really exciting. We know that the bash interpreter is running a REPL.

So, when we type something like echo $?, the reader_loop() will eventually try to execute_command(). This function will call execute_command_internal(), which actually does the hard work, like decoding the command that has just been entered and calling the appropriate “handler”, e.g. if a while loop was entered, call the function that actually executes the while.

The decoding process will go on and eventually a simple command will be found, so execute_simple_command(), the “meaty part of all executions”, will be called. Assuming execute_simple_command() is decoding $?, it will call expand_words(), which will call expand_word_list_internal(), the one that does all the substitutions, like brace expansion, tilde expansion, etc.

Hmmm, actually I lied in my last statement… shell_expand_word_list() is the function that would expand our variable, but since $? is an internal one, it will defer the expansion to expand_word_internal(), which will call param_expand() to actually expand it.

Phew! Why did I say the last part wouldn’t be exciting? Take a look at the code below.

static WORD_DESC *
param_expand (string, sindex, quoted, expanded_something,
              contains_dollar_at, quoted_dollar_at_p, had_quoted_null_p,
              pflags)
  /* ... */
  /* $? -- return value of the last synchronous command. */
  case '?':
    temp = itos (last_command_exit_value);
    break;
  /* ... */
  if (ret == 0)
  {
    ret = alloc_word_desc ();
    ret->flags = tflag;
    ret->word = temp;
  }
  return ret;
}

Do you see the “case ‘?’”? That’s bash fetching the value of “$?”. param_expand() only reads the previously stored exit value, last_command_exit_value, which, in the case of the exit builtin, is set in exit_or_logout().

static int exit_or_logout (WORD_LIST *list)
{
  /* ... */
  last_command_exit_value = exit_value;

  /* Exit the program. */
  jump_to_top_level (EXITPROG);
  /*NOTREACHED*/
}

Just to be sure, let’s see what really happens with the help of a gdb session:

➜  ~/tmp/bash git:(master) ✗
± gdb ./bash
GNU gdb (GDB) 7.3.1
Copyright (C) 2011 Free Software Foundation, Inc.
Reading symbols from /home/trovao/tmp/bash/bash...done.
(gdb) break subst.c:6854
Breakpoint 1 at 0x45ab0f: file subst.c, line 6854.
(gdb) r
Starting program: /home/trovao/tmp/bash/bash
[user@host bash]$ echo $?

Breakpoint 1, param_expand (string=0x710608 "$?", sindex=0x7fffffffdc0c, quoted=0,
    expanded_something=0x7fffffffdcd4, contains_dollar_at=0x7fffffffdc00,
    quoted_dollar_at_p=0x7fffffffdc08, had_quoted_null_p=0x7fffffffdc04, pflags=0)
    at subst.c:6854
6854   temp = itos (last_command_exit_value);
(gdb) bt
#0  param_expand (string=0x710608 "$?", sindex=0x7fffffffdc0c, quoted=0,
    expanded_something=0x7fffffffdcd4, contains_dollar_at=0x7fffffffdc00,
    quoted_dollar_at_p=0x7fffffffdc08, had_quoted_null_p=0x7fffffffdc04, pflags=0)
    at subst.c:6854
#1  0x000000000045bde0 in expand_word_internal (word=0x7466a8, quoted=0, isexp=0,
    contains_dollar_at=0x7fffffffdcd0, expanded_something=0x7fffffffdcd4)
    at subst.c:7461
#2  0x000000000045dd56 in shell_expand_word_list (tlist=0x746188, eflags=31)
    at subst.c:8541
#3  0x000000000045e010 in expand_word_list_internal (list=0x746668, eflags=31)
    at subst.c:8658
#4  0x000000000045d6ae in expand_words (list=0x746668) at subst.c:8287
#5  0x0000000000439689 in execute_simple_command (simple_command=0x748ec8, pipe_in=-1,
    pipe_out=-1, async=0, fds_to_close=0x748e68) at execute_cmd.c:3552
#6  0x000000000043460b in execute_command_internal (command=0x748e88, asynchronous=0,
    pipe_in=-1, pipe_out=-1, fds_to_close=0x748e68) at execute_cmd.c:720
#7  0x0000000000433dfc in execute_command (command=0x748e88) at execute_cmd.c:369
#8  0x00000000004219f8 in reader_loop () at eval.c:152
#9  0x000000000041f5f8 in main (argc=1, argv=0x7fffffffe078, env=0x7fffffffe088)
    at shell.c:741
(gdb) print last_command_exit_value
$1 = 0
(gdb) c
Continuing.
0
[user@host bash]$ exit
[Inferior 1 (process 22677) exited normally]

So “exit” and “$?” are really related, which is quite reasonable, if you think about it.
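
A small sketch of what “return value of the last synchronous command” means in practice. The function name last_status_demo is made up for illustration. The point is that every command overwrites $?, so it has to be saved right after the command you care about.

```shell
#!/bin/bash
# Hypothetical demo: $? always reflects the most recently finished command.
last_status_demo() {
    local first second
    false || first=$?     # $? here is the status of `false`
    true && second=$?     # $? has been overwritten by `true`
    echo "$first $second"
}

last_status_demo   # prints "1 0"
```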

What about exit()?

So, what about the exit() function found in the standard C library? Its man page tells us that “the exit() function causes normal process termination and the value of status & 0377 is returned to the parent”. “status & 0377”? What does that mean? Well, 0377 is 377 in octal, which is 255 in decimal, or 0xFF in hexadecimal, so the mask keeps only the lowest 8 bits of status.

#include <stdlib.h>
void exit(int status);

So, if exit() takes a signed argument, but returns a 0xFF-masked one to its parent, and “-1” is represented as all ones in two’s complement, this only means that the value the parent is going to see (and, thus, $? is going to store) is (supposing an int is stored using 4 bytes):

0xffffffff & 0xff = 0xff

Recall that 0xff is 255, and that’s pretty much it. The OS won’t see that -1, so there’s no point in using it.
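
The masking can be reproduced with shell arithmetic. This is just a sketch of the arithmetic, not of what the OS does internally, and masked_status is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: keep only the low 8 bits, like "status & 0377".
masked_status() {
    echo $(( $1 & 0xFF ))
}

masked_status -1     # prints 255
masked_status 256    # prints 0
masked_status 257    # prints 1
```

By the same arithmetic, a status of 256 is masked down to 0, which a parent would read as success.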

Conclusion

Even if you are aware that all you’ll get is 255 when you call exit(-1), please don’t do that. Your code might eventually reach someone who will misunderstand the meaning of the -1 argument, and that will only cause pain.
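
For comparison, here is a minimal sketch of foobar.sh using a status that fits in the valid range. The name E_FOO_FAILED and the value 1 are arbitrary choices for illustration, and bar returns instead of exiting so the snippet can be sourced and tried out.

```shell
#!/bin/bash
# Hypothetical error code: any value from 1 to 255 would do.
E_FOO_FAILED=1

foo() {
    # Whatever. An error occurred, so report it with a valid status.
    return "$E_FOO_FAILED"
}

bar() {
    # Test the command itself instead of comparing $? with a string.
    if ! foo; then
        return "$E_FOO_FAILED"
    fi
}

status=0
bar || status=$?
echo "bar returned $status"   # prints "bar returned 1"
```

In a standalone script, the last step would be to pass that value on with exit "$status".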
