Mixing static and dynamic linking

Because of limitations in some profiling tools, it can be useful to link a few libraries statically into a program. The following command links only libcrypto statically and keeps dynamic linking for every other library.

# gcc test.c -o test -Wl,-Bstatic -lcrypto -Wl,-Bdynamic

If you forget the ending -Wl,-Bdynamic you’ll get the following error:

 /usr/bin/ld: cannot find -lgcc_s
collect2: ld returned 1 exit status

This happens because -Bstatic stays in effect for the libraries that gcc appends to the link line by itself, so the linker also looks for a static libgcc_s (libgcc_s.a), which is not there. To link libgcc statically you have to add the -static-libgcc flag:

# gcc test.c -o test -Wl,-Bstatic -lcrypto  -static-libgcc

Annotate C with GCC Function Attributes

A nice feature of GCC, although not part of standard C, is the ability to annotate code with attributes that selectively suppress or raise warnings.

Example 1. A function with a deliberately unused parameter can avoid a warning at compile time:

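#include <stdio.h>

int my_function(int a, int b, __attribute__((unused)) int c_only_for_debug) {
    /* always printed */
    printf("print %d\n", a + b);
#ifdef DEBUG
    /* the third parameter is only read in debug builds */
    printf("on debug print %d too\n", c_only_for_debug);
#endif
    return a + b;
}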
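As a further sketch, here are two attributes that raise warnings instead of silencing them: format(printf, ...) asks gcc to check the arguments against the format string, and warn_unused_result flags callers that ignore the return value. The helper name format_msg is made up for this example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: formats a message into buf, like snprintf().
 * format(printf, 3, 4): parameter 3 is the format string and the
 * variadic arguments start at parameter 4.
 * warn_unused_result: gcc warns when the caller drops the return value. */
__attribute__((format(printf, 3, 4), warn_unused_result))
int format_msg(char *buf, size_t len, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(buf != NULL && len > 0);
    va_start(ap, fmt);
    n = vsnprintf(buf, len, fmt, ap);
    va_end(ap);
    return n;   /* number of characters the full message needs */
}
```

With -Wall, a call such as format_msg(buf, sizeof buf, "%s", 42) should be reported as a format mismatch, and a call whose result is ignored should be reported too.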

More examples soon ;)

Reentrant not safe (aka don’t syslog() on fork())

Today I experienced a deadlock using localtime_r() after forking a multi-threaded program. I was quite surprised because, well, by using localtime_r() I thought I was safe in some way. But I was not ;)

In short, I use a logging function that calls localtime_r(): my_log("message");

The same thing can happen even if you use syslog(), because on most systems syslog() uses localtime() [yes, the non-reentrant one].

After the child hung, I found that it was in the futex_wait state using:
# ps -elf
Looking for a mutex, I discovered that (as you may not know) localtime_r() calls tzset(), a libc function that takes a mutex (call it _tz_mutex) while it manipulates the current timezone.

The “bad” code was basically doing the following:

main() {
    ... spawn many threads using my_log() ...
    if (fork() == 0) {
        my_log("I am the child");
        execv("/bin/bash", ...);
    }
}

This is bad, because the following can happen:

  1. one of the parent threads runs my_log(), locking _tz_mutex (which is global, not thread-local);
  2. before _tz_mutex is released, the main thread forks;
  3. the child gets a copy of the locked mutex, because it’s a global one, but not the thread that locked it, so nobody in the child will ever unlock it;
  4. the child runs my_log(), tries to lock _tz_mutex and hangs.

This behavior is described in the rationale of pthread_atfork():

When fork() is called, only the calling thread is duplicated in the child process. Synchronization variables remain in the same state in the child as they were in the parent at the time fork() was called. Thus, for example, mutex locks may be held by threads that no longer exist in the child process, and any associated states may be inconsistent.
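For mutexes that belong to your own code, pthread_atfork() lets you take the lock before fork() and release it on both sides afterwards, so the child never inherits it in a locked state. The sketch below assumes a hypothetical log_mutex; locks hidden inside libc, such as the one behind localtime_r(), are out of reach of this technique.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical lock protecting our own logging code. */
static pthread_mutex_t log_mutex = PTHREAD_MUTEX_INITIALIZER;

/* prepare handler: runs in the forking thread, before fork() */
static void before_fork(void) { pthread_mutex_lock(&log_mutex); }
/* parent and child handler: runs after fork(), on each side */
static void after_fork(void)  { pthread_mutex_unlock(&log_mutex); }

/* Register the handlers once, at startup. Returns 0 on success. */
int install_fork_handlers(void)
{
    return pthread_atfork(before_fork, after_fork, after_fork);
}

/* Fork and check that the child can take the lock without hanging.
 * Returns the child's exit status (0 when all went well), -1 on error. */
int fork_and_lock_in_child(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        pthread_mutex_lock(&log_mutex);    /* free: the child handler unlocked it */
        pthread_mutex_unlock(&log_mutex);
        _exit(0);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The handlers serialize fork() with every user of log_mutex, which costs a little, and they must be installed before the first fork().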

Moral:
1. don’t trust functions just because they end with “_r”: reentrant does not mean safe to call after fork();
2. run execv() ASAP after fork(), as stated in man pthread_atfork():

It is suggested that programs that use fork() call an exec function very soon afterwards in the child process, thus resetting all states. In the meantime, only a short list of async-signal-safe library routines are promised to be available.

3. between fork() and execv() use only async-signal-safe functions such as write() and dup(); printf() is not one of them. You can find the list in
# man 7 signal
(on recent systems, in man 7 signal-safety).
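Putting the moral into practice could look like the sketch below: the child writes its message with write(), which is async-signal-safe, and goes straight to execv(). The function name spawn and the message are illustrative, not the original my_log() code.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork, then exec `path` in the child using only async-signal-safe calls.
 * Returns the exit status of the executed program, or -1 on error. */
int spawn(const char *path, char *const argv[])
{
    static const char msg[] = "I am the child\n";
    int status;
    pid_t pid;

    assert(path != NULL && argv != NULL);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* no localtime_r(), no syslog(), no printf() here */
        if (write(STDERR_FILENO, msg, sizeof msg - 1) < 0) {
            /* nothing safe to do about a failed log line */
        }
        execv(path, argv);
        _exit(127);              /* only reached when execv() failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If the message needs a timestamp, one option is to format it in the parent before calling fork() and pass the finished buffer to write() in the child.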

Send that message, quickly! (SystemV)

Thanks to ffiore I discovered a nice way to pass data between processes and threads: Message Queues. MQs are a kernel feature that implements a queue with push and pop operations on whole messages. Linux has two types of MQs: POSIX and System V. This post is about System V.

Instead of reading and writing data on a plain pipe or socket, where you have to split the stream into messages yourself, you can simply use MQs with the following procedure, sketched in C-like pseudocode:

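/* pusher thread */
struct my_msg { long mtype; char mtext[64]; };
struct my_msg msg = { 1, "my message" };   /* mtype must be > 0 */
int mq = msgget(key, IPC_CREAT | 0600);    /* create or open the queue */
msgsnd(mq, &msg, sizeof msg.mtext, 0);

while the popper thread gets the messages

/* popper thread */
msgrcv(mq, &msg, sizeof msg.mtext, 0, 0);  /* type 0: first message in the queue */
printf("%s\n", msg.mtext);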
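A minimal runnable version of the push/pop procedure, as a sketch: for brevity it sends and receives in the same process and uses a private queue (IPC_PRIVATE) that it removes at the end. The names my_msg and mq_roundtrip are made up for the example.

```c
#include <assert.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>

/* System V messages start with a long type, followed by the payload. */
struct my_msg {
    long mtype;
    char mtext[64];
};

/* Push `text` on a fresh queue, pop it back into `out`.
 * Returns 0 on success, -1 on error. */
int mq_roundtrip(const char *text, char *out, size_t outlen)
{
    struct my_msg snd = { .mtype = 1 };   /* mtext starts zero-filled */
    struct my_msg rcv;
    int mq, rc = -1;

    assert(text != NULL && out != NULL && outlen > 0);
    mq = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (mq < 0)
        return -1;

    strncpy(snd.mtext, text, sizeof snd.mtext - 1);
    /* the size argument counts the payload only, not mtype */
    if (msgsnd(mq, &snd, strlen(snd.mtext) + 1, 0) == 0 &&
        msgrcv(mq, &rcv, sizeof rcv.mtext, 0, 0) > 0) {
        strncpy(out, rcv.mtext, outlen - 1);
        out[outlen - 1] = '\0';
        rc = 0;
    }
    msgctl(mq, IPC_RMID, NULL);           /* remove the queue from the kernel */
    return rc;
}
```

Two unrelated processes would instead agree on a key, for example one built with ftok(), and both call msgget() with it.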

Linux provides some command line tools to manage the queue, while queue size is managed by sysctl and ulimit.

ananke # sysctl -a | grep msg
kernel.msgmnb = 65536 # max bytes in a queue (sum of the sizes of its messages)
kernel.msgmni = 16 # max number of queues
kernel.msgmax = 65536 # max size of a single message, in bytes
fs.mqueue.msgsize_max = 8192 # POSIX queues: max message size
fs.mqueue.msg_max = 10 # POSIX queues: max messages per queue

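If we want to queue 10k messages of 12 kbytes each in a System V queue, each message must fit in kernel.msgmax (65536 here, so already fine) and the queue must hold 10000 * 12000 = 120000000 bytes:

# sysctl -w kernel.msgmnb=120000000

The fs.mqueue.* settings apply to POSIX queues only.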

You can view message queues with the ipcs command, shipped with util-linux:

ananke# sudo apt-get install util-linux

An MQ follows, containing 13 messages for a total of 390 bytes:

ananke # ipcs -q

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0x00000000 0          rpolli     666        390          13

For further info on System V and POSIX MQs:

# man svipc
# man mq_overview

TCP to UDP: a top-down focus on the differences – I

While writing a simple UDP server, I had the idea to focus on the differences with a TCP one.

The standard TCP Server flow is the following:

  1. create TCP socket()
  2. bind() it to an ip:port address
  3. listen() for incoming connection
  4. accept() them

Step 1 is done via a socket(.., SOCK_STREAM, IPPROTO_TCP) system call. Replacing SOCK_STREAM, IPPROTO_TCP with SOCK_DGRAM, IPPROTO_UDP and going through steps 2 to 4, we’ll get the following errors:

  • listen(): Operation not supported
  • accept(): Operation not supported

What happened? Yep, easy: as UDP is a connectionless protocol, we cannot “listen for” or “accept” connections.
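The listen() failure can be reproduced with a few lines. This sketch binds a socket of the given type to a kernel-chosen port on the loopback address and returns the errno set by listen(); the function name try_listen is made up, and the EOPNOTSUPP value is what I expect on Linux.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 0 if listen() works on a bound socket of this type,
 * the errno set by listen() if it fails, -1 on setup errors. */
int try_listen(int type)
{
    struct sockaddr_in addr;
    int fd, err = 0;

    assert(type == SOCK_STREAM || type == SOCK_DGRAM);
    fd = socket(AF_INET, type, 0);        /* 0: default protocol for the type */
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                    /* let the kernel pick a free port */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }

    if (listen(fd, 5) < 0)
        err = errno;                      /* "Operation not supported" for UDP */
    close(fd);
    return err;
}
```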

While TCP establishes a connection between client and server, a UDP server just receives every single datagram that arrives from any client.
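So the UDP server flow shrinks to socket(), bind() and recvfrom(). A sketch, with both ends in one process to keep it short: the "server" socket is bound to a kernel-chosen loopback port and the "client" socket fires one datagram at it. The name udp_loopback and the 2 second receive timeout are choices made for this example.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Send `text` as one datagram over loopback and receive it into `out`.
 * Returns the number of bytes received, or -1 on error. */
long udp_loopback(const char *text, char *out, size_t outlen)
{
    struct sockaddr_in addr, peer;
    socklen_t len = sizeof addr, peerlen = sizeof peer;
    struct timeval tv = { 2, 0 };         /* do not block forever */
    ssize_t n = -1;
    int srv, cli;

    assert(text != NULL && out != NULL && outlen > 0);
    srv = socket(AF_INET, SOCK_DGRAM, 0);
    cli = socket(AF_INET, SOCK_DGRAM, 0);
    if (srv < 0 || cli < 0)
        goto done;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                    /* kernel picks the port */
    if (bind(srv, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        getsockname(srv, (struct sockaddr *)&addr, &len) < 0)
        goto done;
    setsockopt(srv, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);

    /* no listen(), no accept(): the datagram is simply delivered */
    if (sendto(cli, text, strlen(text), 0,
               (struct sockaddr *)&addr, sizeof addr) < 0)
        goto done;
    n = recvfrom(srv, out, outlen - 1, 0, (struct sockaddr *)&peer, &peerlen);
    if (n >= 0)
        out[n] = '\0';
done:
    if (srv >= 0)
        close(srv);
    if (cli >= 0)
        close(cli);
    return (long)n;
}
```

The peer address filled in by recvfrom() is the only way for the server to know who sent the datagram, and where to reply.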

You can check this behavior running netcat in udp mode:

#nc -u localhost 12345

This command is successful even if there’s nothing listening on that port: it’s like a cannon ready to fire everything against a castle’s port.

Each time you send something, it fires. And you need to switch it off to “close” it!
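The cannon itself fits in a few lines of C. This sketch sends one datagram to a loopback port with an unconnected socket; as far as I can tell, on Linux the call reports success whether or not anything is bound to that port. netcat connects its socket first, and in that case a later send may report ECONNREFUSED. The name udp_fire is made up.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Fire one datagram at 127.0.0.1:port.
 * Returns the number of bytes handed to the kernel, or -1 on error. */
long udp_fire(unsigned short port, const char *text)
{
    struct sockaddr_in addr;
    ssize_t n;
    int fd;

    assert(text != NULL);
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    /* no handshake: sendto() only queues the datagram for delivery */
    n = sendto(fd, text, strlen(text), 0,
               (struct sockaddr *)&addr, sizeof addr);
    close(fd);
    return (long)n;
}
```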

Nested man exception

While updating keycluster to RHEL5 I stumbled upon some strange behavior.

A service didn’t start due to a failure to mount a dummy filesystem. Browsing the code I found that the error was raised by a system("/bin/mount") call.

To avoid issues with the shell environment, I re-implemented it with fork()+execv("/bin/mount", …).
Still errors: this time it was execv(): Permission denied.

Again I read # man execv, without finding any particular issue …

First step: reproduce

Wrapping the execv() in a standalone executable, like the following, gave no issues:

main:

execv("/bin/mount", …)
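A slightly more useful reproducer reports why execv() failed. The sketch below uses a common trick: a pipe whose write end is close-on-exec, so the parent reads EOF when execv() succeeds and reads the child's errno when it fails. The function name exec_errno is hypothetical and this is not the keycluster code.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 if execv() succeeded in the child, the child's errno if
 * it failed, -1 on setup errors. */
int exec_errno(const char *path, char *const argv[])
{
    int fds[2], err = 0, status;
    ssize_t n;
    pid_t pid;

    assert(path != NULL && argv != NULL);
    if (pipe(fds) < 0)
        return -1;
    /* a successful execv() closes the write end automatically */
    if (fcntl(fds[1], F_SETFD, FD_CLOEXEC) < 0 || (pid = fork()) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        execv(path, argv);
        err = errno;                      /* e.g. EPERM, EACCES, ENOENT */
        if (write(fds[1], &err, sizeof err) < 0) {
            /* nothing more we can report */
        }
        _exit(127);
    }
    close(fds[1]);
    n = read(fds[0], &err, sizeof err);   /* 0 bytes means exec worked */
    close(fds[0]);
    waitpid(pid, &status, 0);
    if (n < 0)
        return -1;
    return n == (ssize_t)sizeof err ? err : 0;
}
```

Passing the returned value to strerror() gives the same text that perror() would print in the child.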

Second Step: suid-bit

As the error happened only with /bin/mount, which has the suid bit set, I tried

# chmod u-s /bin/mount

and BINGO: it worked. This led me to think of…

Third step: selinux

I thought that it could be an SELinux issue, so I tried:

  1. checking the SELinux logs (audit)
  2. tuning SELinux (I learned something about contexts, which was good!)
  3. disabling SELinux

No luck.

The old good times: Recursive RTFM

I was bored, and started to play seriously: I did what I should have done from the start.
1. man execv refers to man execve

All of these functions may fail and set errno for any of the errors specified for the library function execve(2).

2. man execve lists the EPERM error

EPERM The file system is mounted nosuid, the user is not the superuser, and the file has an SUID or SGID bit set.

EPERM The process is being traced, the user is not the superuser and the file has an SUID or SGID bit set.

3. and suddenly I remembered: the guilty program was running under valgrind (I’m in a test environment!). Valgrind itself was running as root, but I don’t really know what happens inside valgrind.

The solution: that kind of test shouldn’t use valgrind. I’ll investigate further whether valgrind drops root privileges while running its children.

Guard your memory with Valgrind

Valgrind is an apt-and-play (or rpm-and-play) tool for finding memory leaks in your software.

I recommend using it for code-reviewing and before deploying!

Just run it against # ls -d /tmp

# valgrind  --leak-check=full --track-origins=yes ls -d /tmp

The output is [extract]

==27666== Memcheck, a memory error detector
==27666== Command: ls -d /tmp
/tmp
==27666== HEAP SUMMARY:
==27666==     in use at exit: 12,653 bytes in 8 blocks
==27666==   total heap usage: 1,396 allocs, 1,388 frees, 77,395 bytes allocated
==27666==
==27666== 120 bytes in 1 blocks are definitely lost in loss record 7 of 8
==27666==    at 0x4024C1C: malloc (vg_replace_malloc.c:195)
==27666==    by 0x40CFACD: getdelim (in /lib/tls/i686/cmov/libc-2.10.1.so)
==27666==    by 0x405B73A: ??? (in /lib/libselinux.so.1)
==27666==    by 0x4064BFC: ??? (in /lib/libselinux.so.1)
==27666==    by 0x40530C7: ??? (in /lib/libselinux.so.1)
==27666==    by 0x400D8BB: ??? (in /lib/ld-2.10.1.so)
==27666==    by 0x400DA20: ??? (in /lib/ld-2.10.1.so)
==27666==    by 0x400088E: ??? (in /lib/ld-2.10.1.so)
==27666==
==27666== LEAK SUMMARY:
==27666==    definitely lost: 120 bytes in 1 blocks
==27666==    indirectly lost: 0 bytes in 0 blocks
==27666==      possibly lost: 0 bytes in 0 blocks
==27666==    still reachable: 12,533 bytes in 7 blocks
==27666==         suppressed: 0 bytes in 0 blocks
==27666== Reachable blocks (those to which a pointer was found) are not shown.
==27666== To see them, rerun with: --leak-check=full --show-reachable=yes
==27666==
==27666== For counts of detected and suppressed errors, rerun with: -v
==27666== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 25 from 8)


Funnily we found ONE leak ;)

==27666== LEAK SUMMARY:
==27666==    definitely lost: 120 bytes in 1 blocks
==27666==    indirectly lost: 0 bytes in 0 blocks

and it’s in

==27666== 120 bytes in 1 blocks are definitely lost in loss record 7 of 8
==27666==    at 0x4024C1C: malloc (vg_replace_malloc.c:195)
==27666==    by 0x40CFACD: getdelim (in /lib/tls/i686/cmov/libc-2.10.1.so)
==27666==    by 0x405B73A: ??? (in /lib/libselinux.so.1)
...
==27666==    by 0x400088E: ??? (in /lib/ld-2.10.1.so)

but this time we shouldn’t care too much: ls is a one-shot command!
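The trace above points at a buffer allocated inside getdelim() and never released. The same pattern is easy to produce in your own code with getline(), which POSIX defines as getdelim() with a newline delimiter. The sketch below shows the leak-free version; count_lines is a made-up name, and I would expect that dropping the free() makes valgrind report the buffer as definitely lost.

```c
#define _POSIX_C_SOURCE 200809L   /* for getline() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Count the lines of an open stream. */
long count_lines(FILE *fp)
{
    char *line = NULL;    /* getline() allocates and grows this buffer */
    size_t cap = 0;
    long count = 0;

    assert(fp != NULL);
    while (getline(&line, &cap, fp) != -1)
        count++;

    free(line);           /* the caller owns the buffer, even after EOF */
    return count;
}
```

To try it, call count_lines() from a small main() and run the program under valgrind --leak-check=full, once with and once without the free().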

Comments welcome!