One of my servers at home runs an OpenSolaris distribution called Nexenta. It's an alpha build of the OS; you may call it 0.99 instead of 1.0, since it was built a few months before the first "official" release. Various things attracted me to this particular distribution, as of April 2007: ZFS, Solaris Zones (with Linux; I have one Linux zone on it always running, just for fun), the Ubuntu-like interface (for the occasional waste of time using Firefox on a server ;) ), ipf, and compatibility with this puny Dell hardware. It's been running since the middle of last year, and so far so good.
Until this morning. I wanted to rm a file, and it wouldn't. Yeah, that's right: I couldn't remove files. At all. So I truss-ed it.
execve("/usr/bin/rm", 0x08047B2C, 0x08047B38) argc = 2
resolvepath("/lib/ld.so.1", "/lib/ld.so.1", 1023) = 12
resolvepath("/usr/bin/rm", "/usr/bin/rm", 1023) = 19
sysconfig(_CONFIG_PAGESIZE) = 4096
xstat(2, "/usr/bin/rm", 0x080478E8) = 0
open("/var/ld/ld.config", O_RDONLY) = 3
fxstat(2, 3, 0x08047818) = 0
mmap(0x00000000, 96, PROT_READ, MAP_SHARED, 3, 0) = 0xFEFB0000
close(3) = 0
mmap(0x00000000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC,
MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFEFA0000
xstat(2, "/lib/libc.so.1", 0x080470A8) = 0
resolvepath("/lib/libc.so.1", "/lib/libc.so.1", 1023) = 14
open("/lib/libc.so.1", O_RDONLY) = 3
mmap(0x00010000, 4096, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_ALIGN, 3, 0) = 0xFEF90000
mmap(0x00010000, 1044480, PROT_NONE,
MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFEE80000
mmap(0xFEE80000, 939729, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_FIXED|MAP_TEXT, 3, 0) = 0xFEE80000
mmap(0xFEF76000, 27414, PROT_READ|PROT_WRITE|PROT_EXEC,
MAP_PRIVATE|MAP_FIXED|MAP_INITDATA, 3, 942080) = 0xFEF76000
mmap(0xFEF7D000, 5704, PROT_READ|PROT_WRITE|PROT_EXEC,
MAP_PRIVATE|MAP_FIXED|MAP_ANON, -1, 0) = 0xFEF7D000
munmap(0xFEF66000, 65536) = 0
memcntl(0xFEE80000, 203020, MC_ADVISE,
MADV_WILLNEED, 0, 0) = 0
close(3) = 0
munmap(0xFEF90000, 4096) = 0
mmap(0x00010000, 24576, PROT_READ|PROT_WRITE|PROT_EXEC,
MAP_PRIVATE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFEF90000
getcontext(0x080476A0)
getrlimit(RLIMIT_STACK, 0x08047698) = 0
getpid() = 1704 [1703]
lwp_private(0, 1, 0xFEF92000) = 0x000001C3
setustack(0xFEF92060)
sysi86(SI86FPSTART, 0xFEF7DC78, 0x0000133F, 0x00001F80) = 0x00000001
fstat64(0, 0x080475E0) = 0
fstat64(1, 0x080475E0) = 0
close(2) = 0
open("/dev/null", O_RDWR) = 2
read(0, 0xFEF7DC84, 1) (sleeping...)
And it stays there. Solaris' truss only yields results from blocking syscalls if they spend more than two seconds blocked. And it stays there. Staaaays there.
A process sitting in read() on file descriptor 0 is a clear sign of a daemon, but hubris got in the way. Stay with me on this one. I had not given up yet.
Since the last call appeared to be waiting for data from /dev/null (which makes no sense at all for rm), I checked the permissions on /dev/null, checked that it was still a character-special device, checked /etc/group, all to no avail. Then I ran the same truss on my OpenSolaris machine at work to see what a healthy rm does, since it has the same version of the OS installed. Same version, almost-the-same customizations, mind you.
execve("/usr/bin/rm", 0x08047804, 0x08047810) argc = 2
resolvepath("/lib/ld.so.1", "/lib/ld.so.1", 1023) = 12
resolvepath("/usr/bin/rm", "/usr/bin/rm", 1023) = 11
mmap(0x00000000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC,
MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFEFB0000
xstat(2, "/usr/bin/rm", 0x080475C8) = 0
open("/var/ld/ld.config", O_RDONLY) = 4
fxstat(2, 4, 0x080474F8) = 0
mmap(0x00000000, 96, PROT_READ, MAP_SHARED, 4, 0) = 0xFEFA0000
close(4) = 0
sysconfig(_CONFIG_PAGESIZE) = 4096
xstat(2, "./libintl.so.3", 0x08046D88) Err#2 ENOENT
xstat(2, "/usr/local/lib/libintl.so.3", 0x08046D88) Err#2 ENOENT
xstat(2, "/usr/X11R6/lib/libintl.so.3", 0x08046D88) Err#2 ENOENT
xstat(2, "/usr/openwin/lib/libintl.so.3", 0x08046D88) Err#2 ENOENT
xstat(2, "/usr/dt/lib/libintl.so.3", 0x08046D88) Err#2 ENOENT
xstat(2, "/lib/libintl.so.3", 0x08046D88) = 0
resolvepath("/lib/libintl.so.3", "/lib/libintl.so.3.4.3", 1023) = 21
open("/lib/libintl.so.3", O_RDONLY) = 4
mmap(0x00010000, 4096, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_ALIGN, 4, 0) = 0xFEF90000
mmap(0x00001000, 36864, PROT_NONE,
MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFEF80000
mmap(0xFEF80000, 30254, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_FIXED|MAP_TEXT, 4, 0) = 0xFEF80000
mmap(0xFEF88000, 2228, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_INITDATA, 4, 28672) = 0xFEF88000
memcntl(0xFEF80000, 4744, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
close(4) = 0
xstat(2, "./libiconv.so.2", 0x08046D88) Err#2 ENOENT
xstat(2, "/usr/local/lib/libiconv.so.2", 0x08046D88) Err#2 ENOENT
xstat(2, "/usr/X11R6/lib/libiconv.so.2", 0x08046D88) Err#2 ENOENT
xstat(2, "/usr/openwin/lib/libiconv.so.2", 0x08046D88) Err#2 ENOENT
xstat(2, "/usr/dt/lib/libiconv.so.2", 0x08046D88) Err#2 ENOENT
xstat(2, "/lib/libiconv.so.2", 0x08046D88) = 0
resolvepath("/lib/libiconv.so.2", "/lib/libiconv.so.2.4.0", 1023) = 22
open("/lib/libiconv.so.2", O_RDONLY) = 4
mmap(0xFEF90000, 4096, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_FIXED, 4, 0) = 0xFEF90000
mmap(0x00001000, 999424, PROT_NONE,
MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFEE8A000
mmap(0xFEE8A000, 992476, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_FIXED|MAP_TEXT, 4, 0) = 0xFEE8A000
mmap(0xFEF7D000, 3208, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_INITDATA, 4, 995328) = 0xFEF7D000
memcntl(0xFEE8A000, 4484, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
close(4) = 0
xstat(2, "./libc.so.1", 0x08046D88) Err#2 ENOENT
xstat(2, "/usr/local/lib/libc.so.1", 0x08046D88) = 0
resolvepath("/usr/local/lib/libc.so.1", "/lib/libc.so.1", 1023) = 14
open("/usr/local/lib/libc.so.1", O_RDONLY) = 4
mmap(0xFEF90000, 4096, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 4, 0) = 0xFEF90000
mmap(0x00010000, 1044480, PROT_NONE,
MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFED80000
mmap(0xFED80000, 939729, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_FIXED|MAP_TEXT, 4, 0) = 0xFED80000
mmap(0xFEE76000, 27414, PROT_READ|PROT_WRITE|PROT_EXEC,
MAP_PRIVATE|MAP_FIXED|MAP_INITDATA, 4, 942080) = 0xFEE76000
mmap(0xFEE7D000, 5704, PROT_READ|PROT_WRITE|PROT_EXEC,
MAP_PRIVATE|MAP_FIXED|MAP_ANON, -1, 0) = 0xFEE7D000
munmap(0xFEE66000, 65536) = 0
memcntl(0xFED80000, 203020, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
close(4) = 0
xstat(2, "./libgen.so.1", 0x08046D88) Err#2 ENOENT
xstat(2, "/usr/local/lib/libgen.so.1", 0x08046D88) Err#2 ENOENT
xstat(2, "/usr/X11R6/lib/libgen.so.1", 0x08046D88) Err#2 ENOENT
xstat(2, "/usr/openwin/lib/libgen.so.1", 0x08046D88) Err#2 ENOENT
xstat(2, "/usr/dt/lib/libgen.so.1", 0x08046D88) Err#2 ENOENT
xstat(2, "/lib/libgen.so.1", 0x08046D88) = 0
resolvepath("/lib/libgen.so.1", "/lib/libgen.so.1", 1023) = 16
open("/lib/libgen.so.1", O_RDONLY) = 4
mmap(0xFEF90000, 4096, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_FIXED, 4, 0) = 0xFEF90000
mmap(0x00000000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC,
MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFED70000
mmap(0x00010000, 94208, PROT_NONE,
MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFED50000
mmap(0xFED50000, 23341, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_FIXED|MAP_TEXT, 4, 0) = 0xFED50000
mmap(0xFED66000, 1791, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_INITDATA, 4, 24576) = 0xFED66000
munmap(0xFED56000, 65536) = 0
memcntl(0xFED50000, 7256, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
close(4) = 0
xstat(2, "./libsocket.so.1", 0x08046D88) Err#2 ENOENT
xstat(2, "/usr/local/lib/libsocket.so.1", 0x08046D88) Err#2 ENOENT
xstat(2, "/usr/X11R6/lib/libsocket.so.1", 0x08046D88) Err#2 ENOENT
xstat(2, "/usr/openwin/lib/libsocket.so.1", 0x08046D88) Err#2 ENOENT
xstat(2, "/usr/dt/lib/libsocket.so.1", 0x08046D88) Err#2 ENOENT
xstat(2, "/lib/libsocket.so.1", 0x08046D88) = 0
(.... loads and loads of stuff ....)
unlink("bla") = 0
close(1) = 0
_exit(0)
Well, it unlinks the file, for sure. Also, the two truss outputs differ radically, which almost certainly means different executables are running on the two machines. Then I realised that something very strange had happened, maybe even a break-in. After all, the rm on my home server was 10k smaller than the one at work and linked to fewer libraries (according to ldd). A break-in didn't make much sense security-wise, but it could happen -- definitely something involving quantum mechanics to let this happen -- so I started checking security logs and stuff.
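For the record, the size comparison above can be made less hand-wavy by fingerprinting both binaries. This is only a minimal sketch in Python, not what I actually ran: the function names are made up, and it assumes you have a copy of rm from a known-good machine lying around to compare against.

```python
import hashlib
import os


def fingerprint(path):
    """Return (size_in_bytes, sha256_hex_digest) for the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large binaries are not loaded in one go.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


def same_file_contents(path_a, path_b):
    """True when both files have the same size and the same SHA-256 digest."""
    return fingerprint(path_a) == fingerprint(path_b)
```

Calling same_file_contents() on the suspicious /usr/bin/rm and the known-good copy (wherever you put it) answers the "is this even the same program?" question before you start reading security logs.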
Until BLAM! I realised that a few days earlier I had swapped squid's unlinkd daemon -- which was pointing to /bin/rm -- for the real unlinkd daemon, compiled for Solaris, so the new binary ended up replacing /bin/rm itself. Well, I did it in the first place because squid's unlinkd daemon wasn't shipped properly on Nexenta... ahhhh the hacks!
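That also explains the first trace: close(2), open("/dev/null") and then a read(0, ...) that never returns is what you would expect from a helper that sits waiting for file names on stdin. Here is a rough Python sketch of that kind of loop. It is not squid's actual code; the function name and the OK/ERR replies are assumptions made for illustration.

```python
import os


def unlink_loop(requests, replies):
    """Read one path per line from requests, unlink it, answer on replies."""
    for line in requests:  # with a real stdin, this is where the process sleeps
        path = line.rstrip("\n")
        if not path:
            continue  # ignore empty lines
        try:
            os.unlink(path)
            replies.write("OK\n")   # hypothetical reply format
        except OSError:
            replies.write("ERR\n")  # hypothetical reply format
        replies.flush()
```

Run it as unlink_loop(sys.stdin, sys.stdout) from a terminal and it behaves just like my broken rm: it ignores its arguments and stays there until somebody types a path.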
Since it was happening on both UFS and ZFS filesystems, a ZFS bug was never the likely culprit; and as I always have those nifty backups, it wasn't that worrisome. It surely felt strange to have an rm-less system, even with /bin/unlink working...