While updating keycluster to RHEL5 I stumbled upon some strange behavior.
A service didn’t start due to a failure to mount a dummy filesystem. Browsing the code, I found that the error was raised by a
system("/bin/mount") call.
To avoid issues with the shell environment, I re-implemented it with fork()+execv("/bin/mount", …).
Still errors: this time it was execv(): Permission denied.
Again I read # man execv, without finding any particular issue …
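A minimal sketch of the fork()+execv() replacement described above, reporting the exact errno when execv() fails. The helper name run_child and the 127 exit code are assumptions made for illustration, not taken from the original code.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` with `argv` in a child process.
 * Returns the child's exit status, 127 if execv() itself failed,
 * or -1 if fork()/waitpid() failed or the child was killed by a signal. */
int run_child(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        execv(path, argv);
        /* Only reached when execv() fails: print the precise errno. */
        fprintf(stderr, "execv(%s): %s (errno=%d)\n",
                path, strerror(errno), errno);
        _exit(127);
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {   /* retry only when interrupted */
            perror("waitpid");
            return -1;
        }
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Called as run_child("/bin/mount", argv) with argv = { "mount", NULL }, it should print the numeric errno next to the message, which makes it easier to look up the right entry in man execve.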
First step: reproduce
Wrapping the execv() in a standalone executable, like the following, gave no issues:
main:
execv("/bin/mount", …)
Second step: suid-bit
As the error happened only with /bin/mount, which has the suid-bit set, I tried
# chmod u-s /bin/mount
and BINGO: it worked. This led me to think of…
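Instead of changing the mode of a system binary, the bit can also be inspected from code. A small sketch, assuming a hypothetical helper named has_setid_bit built on stat():

```c
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if `path` has the SUID or SGID bit set,
 * 0 if it has neither, and -1 if stat() fails (errno is set by stat()). */
int has_setid_bit(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (st.st_mode & (S_ISUID | S_ISGID)) ? 1 : 0;
}
```

On a system where /bin/mount is installed setuid root, has_setid_bit("/bin/mount") should return 1 before the chmod u-s and 0 after it.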
Third step: SELinux
I thought that it could be an SELinux issue, so I tried:
- checking the SELinux logs (audit)
- tuning SELinux (I learned something about contexts, which was good!)
- disabling SELinux
No luck.
The good old times: Recursive RTFM
I was bored and started to play seriously: I did what I should have done from the start.
1. man execv refers to man execve:
All of these functions may fail and set errno for any of the errors specified for the library function execve(2).
2. man execve lists the EPERM error:
EPERM The file system is mounted nosuid, the user is not the superuser, and the file has an SUID or SGID bit set.
EPERM The process is being traced, the user is not the superuser and the file has an SUID or SGID bit set.
3. And suddenly I remembered: the guilty program was running under valgrind (I’m in a test environment!). Valgrind itself was running as root, but I don’t really know what happens inside valgrind.
The problem you ran into with valgrind comes from a choice made by its developers.
Mistaking it for a system problem is understandable, since the EACCES
returned by execve() pointed that way; but a couple of greps through the valgrind sources revealed the nature of the catch, in particular the functions VG_(pre_exec_check) in
coregrind/m_ume.c and VG_(check_executable) in coregrind/m_libcfile.c.
The latter in particular has a comment that explains the reasons for this choice well:
/*
Emulate the normal Unix permissions checking algorithm.
If owner matches, then use the owner permissions, else
if group matches, then use the group permissions, else
use other permissions.
Note that we can’t deal properly with SUID/SGID. By default
(allow_setuid == False), we refuse to run them (otherwise the
executable may misbehave if it doesn’t have the permissions it
thinks it does). However, the caller may indicate that setuid
executables are allowed, for example if we are going to exec them
but not trace into them (iow, client sys_execve when
clo_trace_children == False).
If VKI_EACCES is returned (iow, permission was refused), then
*is_setuid is set to True iff permission was refused because the
executable is setuid.
*/
The patch to enable tracing of setuid binaries is trivial:
just add an ‘allow_setuid = True;’ at the beginning of
VG_(pre_exec_check) (in coregrind/m_ume.c), recompile and…
mini:/tmp/valgrind-3.3.1# ls -al /bin/mount
-rwsr-xr-x 1 root root 64112 29 apr 2008 /bin/mount
mini:/tmp/valgrind-3.3.1# valgrind /bin/mount
==5870== Memcheck, a memory error detector.
==5870== Copyright (C) 2002-2007, and GNU GPL’d, by Julian Seward et al.
==5870== Using LibVEX rev 1854, a library for dynamic binary translation.
==5870== Copyright (C) 2004-2007, and GNU GPL’d, by OpenWorks LLP.
==5870== Using valgrind-3.3.1-Debian, a dynamic binary instrumentation framework.
==5870== Copyright (C) 2000-2007, and GNU GPL’d, by Julian Seward et al.
==5870== For more details, rerun with: -v
==5870==
/dev/md0 on / type ext3 (rw,noatime,nodiratime,user_xattr,errors=remount-ro)
tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
udev on /dev type tmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
fusectl on /sys/fs/fuse/connections type fusectl (rw)
/dev/mapper/md1_crypt on /home type ext3 (rw,noatime,nodiratime,user_xattr)
rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
nfsd on /proc/fs/nfsd type nfsd (rw)
==5870==
==5870== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 21 from 1)
==5870== malloc/free: in use at exit: 2,106 bytes in 66 blocks.
==5870== malloc/free: 145 allocs, 79 frees, 8,305 bytes allocated.
==5870== For counts of detected errors, rerun with: -v
==5870== searching for pointers to 66 not-freed blocks.
==5870== checked 92,860 bytes.
==5870==
==5870== LEAK SUMMARY:
==5870== definitely lost: 1,256 bytes in 11 blocks.
==5870== possibly lost: 0 bytes in 0 blocks.
==5870== still reachable: 850 bytes in 55 blocks.
==5870== suppressed: 0 bytes in 0 blocks.
==5870== Rerun with --leak-check=full to see details of leaked memory.
mini:/tmp/valgrind-3.3.1#
now it works with setuid binaries too.
Coming back to the misunderstanding about the EACCES (not EPERM) returned by execve:
the fact that the error comes back from a libc function (which calls the syscall of the same name)
is explained by looking at how valgrind works. It works by using
hooks on the standard libc functions and loading them in preload, thanks to the
LD_PRELOAD environment variable (it actually also uses ptrace(), but that is another
story :).
So the execve that returns -1 and sets errno to EACCES is not in fact the one from
libc, but the hook preloaded by valgrind.
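The EACCES versus EPERM mix-up is easy to make from the message text alone. A small sketch, assuming a hypothetical helper named exec_errno_name, that prints the symbolic name next to the message:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: symbolic name for the two errno values
 * discussed here; anything else is reported as "other". */
const char *exec_errno_name(int err)
{
    switch (err) {
    case EACCES: return "EACCES";
    case EPERM:  return "EPERM";
    default:     return "other";
    }
}

/* Print both the symbolic name and the libc message for `err`. */
void report_exec_errno(const char *path, int err)
{
    fprintf(stderr, "execv(%s): %s (%s)\n",
            path, exec_errno_name(err), strerror(err));
}
```

With glibc's default English messages, EACCES is rendered as "Permission denied" and EPERM as "Operation not permitted", so the error quoted in the post corresponds to EACCES, while the two man page entries describe EPERM.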