Command line too long

When a colleague asked me how to get the full command line of a process, I simply replied:

# ps -fewww   # the more w's you add, the wider the output

When he told me that was not enough, I was confident I had the right answer:

# cat /proc/$PID/cmdline

I didn’t know that /proc/$PID/cmdline is limited, and that the limit is lower than the maximum argv length. Specifically, it’s the output of

# getconf PAGE_SIZE

4096

Try it yourself with this script:

# cat > /tmp/fulla.sh << 'EOF'
#!/bin/bash
sleep 10
EOF

# chmod +x /tmp/fulla.sh; /tmp/fulla.sh {0..2000} &

# cat /proc/${!}/cmdline | wc -c

4096
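The same measurement can be sketched in C. This is only a minimal illustration: cmdline_bytes() and cmdline_maybe_truncated() are hypothetical helper names, and the check assumes a Linux /proc and a kernel that, like the one above, caps the file at PAGE_SIZE.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: count the bytes exposed by /proc/<pid>/cmdline.
 * Returns -1 if the file cannot be opened. */
long cmdline_bytes(pid_t pid)
{
    char path[64];
    char buf[4096];
    long total = 0;
    size_t n;

    snprintf(path, sizeof path, "/proc/%ld/cmdline", (long)pid);
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    /* The arguments are NUL-separated, so read raw bytes, not lines. */
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        total += (long)n;
    fclose(fp);
    return total;
}

/* Hypothetical helper: returns 1 when the byte count reaches the page
 * size, which on an affected kernel suggests the command line was cut.
 * Returns 0 otherwise, including when the file cannot be read. */
int cmdline_maybe_truncated(pid_t pid)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    return cmdline_bytes(pid) >= page;
}
```

Calling cmdline_maybe_truncated() with the PID of the fulla.sh process should report 1 on such a kernel, while a process with a short command line reports 0.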

PAGE_SIZE is the kernel-defined page size; to change it you have to rebuild your kernel… and you may run into some drawbacks 😉

To learn more about the maximum argv length, which depends on POSIX and on the Linux kernel version, see:

# man 2 execve

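A commonly quoted limit is 32*PAGE_SIZE = 128k: that was the total size on kernels before 2.6.23, and it is still the cap for a single argument. On newer kernels the total is a quarter of the stack size limit, so parameters like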

# ulimit -s   # stack size

# getconf ARG_MAX

may influence the allowed size.
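As a rough sketch, the same values can be read from a C program. print_arg_limits() is a hypothetical name, and the "quarter of the stack limit" and "32 pages per argument" figures are the Linux behaviour documented in execve(2) for kernels from 2.6.23 on, not something POSIX guarantees.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: print the values that bound the size of
 * argv + environ on Linux. Returns 0 on success, -1 on error. */
int print_arg_limits(void)
{
    long page = sysconf(_SC_PAGESIZE);
    long arg_max = sysconf(_SC_ARG_MAX);   /* same as `getconf ARG_MAX` */
    struct rlimit stack;                   /* same as `ulimit -s`, in bytes */

    assert(page > 0);
    if (getrlimit(RLIMIT_STACK, &stack) != 0)
        return -1;

    printf("PAGE_SIZE        : %ld\n", page);
    printf("ARG_MAX          : %ld\n", arg_max);
    printf("32 * PAGE_SIZE   : %ld (cap for one argument on Linux)\n",
           32 * page);
    if (stack.rlim_cur == RLIM_INFINITY)
        printf("stack soft limit : unlimited\n");
    else
        printf("stack soft limit : %llu (a quarter is %llu)\n",
               (unsigned long long)stack.rlim_cur,
               (unsigned long long)stack.rlim_cur / 4);
    return 0;
}
```

Running it before and after changing ulimit -s in the same shell is a quick way to see which of the numbers move together.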

If you’re interested, you can look at the exec() source code in the kernel (fs/exec.c) 😉

Reentrant not safe (aka don’t syslog() on fork())

Today I experienced a deadlock using localtime_r() after forking a multithreaded program. I was quite surprised because, well, by using localtime_r() I thought I was somehow safe. But I was not 😉

In short, I use a logging function that calls localtime_r(): my_log("message");

The same happens even if you use syslog(), because on most systems syslog() uses localtime() [yes, the non-reentrant one too].

After the child hung, I found that it was in the futex_wait state using:
# ps -elf
Looking for a mutex, I discovered that, as you (don’t) know, localtime_r() calls tzset(). It’s a libc function that holds a mutex (call it _tz_mutex) while it updates the current timezone.

The “bad” code was basically doing the following:

int main(void) {
    /* ... spawn many threads using my_log() ... */
    if (fork() == 0) {
        my_log("I am the child");
        execv("/bin/bash", ...);
    }
}

This is bad because the following can happen:

  1. one of the parent’s threads runs my_log() and locks _tz_mutex (which is global, not thread-local);
  2. before _tz_mutex is released, the main thread forks;
  3. fork() copies the mutex in its locked state, because it’s a global one, but does not copy the thread that locked it, so nobody in the child can ever unlock it;
  4. the child runs my_log(), tries to lock _tz_mutex and hangs.
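The four steps can be reproduced without touching libc internals. The sketch below is an illustration under stated assumptions, not the code that actually hung: demo_mutex stands in for _tz_mutex, worker() and child_sees_locked_mutex() are hypothetical names, and the child uses pthread_mutex_trylock() instead of a blocking lock so that the demo cannot hang. Calling trylock in the child is itself outside what POSIX promises after fork(); it works as shown on Linux with glibc.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static sem_t locked;    /* posted once the worker holds the mutex */
static sem_t release;   /* posted when the worker may unlock it */

static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&demo_mutex);      /* step 1: a thread takes the lock */
    sem_post(&locked);
    sem_wait(&release);                   /* keep it locked across the fork */
    pthread_mutex_unlock(&demo_mutex);
    return NULL;
}

/* Returns 1 if the child found the mutex locked, 0 if it did not,
 * -1 on error. */
int child_sees_locked_mutex(void)
{
    pthread_t tid;
    int status = 0;
    int result = -1;

    if (sem_init(&locked, 0, 0) != 0 || sem_init(&release, 0, 0) != 0)
        return -1;
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    sem_wait(&locked);

    pid_t pid = fork();                   /* step 2: fork while it is held */
    if (pid == 0) {
        /* Steps 3 and 4: the worker thread does not exist here, but the
         * mutex it locked does. A plain lock would block forever. */
        _exit(pthread_mutex_trylock(&demo_mutex) == EBUSY ? 1 : 0);
    }
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = WEXITSTATUS(status);

    sem_post(&release);                   /* let the worker finish */
    pthread_join(tid, NULL);
    sem_destroy(&locked);
    sem_destroy(&release);
    return result;
}
```

Built with -pthread and called from main(), child_sees_locked_mutex() is expected to return 1 on Linux with glibc: the child inherits a lock that no thread in it will ever release.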

This behavior is described in the rationale of pthread_atfork():

When fork() is called, only the calling thread is duplicated in the child process. Synchronization variables remain in the same state in the child as they were in the parent at the time fork() was called. Thus, for example, mutex locks may be held by threads that no longer exist in the child process, and any associated states may be inconsistent.

Moral:
1. don’t trust functions just because they end with “_r”;
2. run execv() ASAP after fork(), as stated in man pthread_atfork():

It is suggested that programs that use fork() call an exec function very soon afterwards in the child process, thus resetting all states. In the meantime, only a short list of async-signal-safe library routines are promised to be available.

3. between fork() and execv() use only async-signal-safe functions: write(), dup(), _exit(),… (printf() is not one of them). You can find the list in
# man 7 signal
(or man 7 signal-safety on recent systems).
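A minimal sketch of that advice, assuming the log line can be prepared in advance: the message is formatted in the parent, and the child only calls write(), execv() and _exit(). spawn_with_safe_log() is a hypothetical name, and waiting for the child is done here only to keep the example self-contained.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork and exec `path`, logging from the child with
 * write(2) only. Returns the child's exit status, or -1 on error. */
int spawn_with_safe_log(const char *path, char *const argv[])
{
    char msg[128];
    int status = 0;

    /* Format in the parent, before fork(): snprintf() is not on the
     * async-signal-safe list, so it must not run in the child. */
    int len = snprintf(msg, sizeof msg, "child of %ld: about to exec %s\n",
                       (long)getpid(), path);
    assert(len > 0);
    if ((size_t)len >= sizeof msg)
        len = (int)sizeof msg - 1;        /* message was truncated */

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: async-signal-safe calls only. */
        ssize_t written = write(STDERR_FILENO, msg, (size_t)len);
        (void)written;                    /* nothing safe to do on failure */
        execv(path, argv);
        _exit(127);                       /* reached only if execv() failed */
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For example, calling it with "/bin/sh" and the arguments {"sh", "-c", "exit 3", NULL} returns 3, and a path that does not exist returns 127.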