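#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/select.h>
#include <unistd.h>

/* Blocks in read() on stdin; read() should act as a cancellation point. */
void *f (void *foo)
{
  char buf[128];
  //pthread_setcanceltype (PTHREAD_CANCEL_ASYNCHRONOUS, NULL);
  while (1) {
    read (0, buf, sizeof(buf));
  }
  return NULL;  /* not reached */
}

int main (void)
{
  pthread_t t;
  pthread_create (&t, NULL, f, NULL);
  sleep (1);
  pthread_cancel (t);
  pthread_join (t, NULL);  /* hangs here unless the cancel type is asynchronous */
  exit (0);
}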

read() does not behave as a cancellation point: this testcase terminates only if the cancel type is set to asynchronous. We do have the pthread_setcanceltype glibc/libpthread hook in the forward structure, but we are not using it: the LIBC_CANCEL_ASYNC macros are no-ops, and we are not using them in the mig msg call either.
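
As a rough illustration of what using that hook around a blocking call would amount to, here is a minimal user-level sketch. It is not the glibc code: cancellable_read, blocked_reader and run_cancel_demo are hypothetical names invented for this example, and the wrapper only approximates the idea of enabling asynchronous cancellation for the duration of the call and restoring the previous type afterwards.

```c
#include <pthread.h>
#include <unistd.h>

/* Hypothetical helper (not part of glibc): make a blocking read()
   cancellable by switching to asynchronous cancellation only while
   the call is in progress, then restoring the previous type. */
static ssize_t cancellable_read (int fd, void *buf, size_t len)
{
  int oldtype;
  ssize_t n;

  pthread_setcanceltype (PTHREAD_CANCEL_ASYNCHRONOUS, &oldtype);
  n = read (fd, buf, len);
  pthread_setcanceltype (oldtype, NULL);
  return n;
}

/* Thread body: blocks forever reading from the descriptor in *arg. */
static void *blocked_reader (void *arg)
{
  int fd = *(int *) arg;
  char buf[128];

  while (1)
    cancellable_read (fd, buf, sizeof (buf));
  return NULL;  /* not reached */
}

/* Returns 1 if the thread was cancelled and joined, 0 if it ended
   some other way, -1 if the setup failed. */
static int run_cancel_demo (void)
{
  int fds[2];
  pthread_t t;
  void *res;

  /* An empty pipe whose write end stays open: read() blocks. */
  if (pipe (fds) != 0)
    return -1;
  if (pthread_create (&t, NULL, blocked_reader, &fds[0]) != 0)
    {
      close (fds[0]);
      close (fds[1]);
      return -1;
    }
  sleep (1);
  pthread_cancel (t);
  pthread_join (t, &res);
  close (fds[0]);
  close (fds[1]);
  return res == PTHREAD_CANCELED;
}
```

Calling run_cancel_demo() from main should return 1 once the join completes. The sketch uses a pipe instead of stdin so that the read is guaranteed to block.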

Provenance

IRC, OFTC, #debian-hurd, 2013-04-15

<paravoid> so, let me say a few things about the bug in the first place
<paravoid> the package builds and runs a test suite
<paravoid> the second test in the test suite blocks forever
<paravoid> a blocked pthread_join is what I see
<paravoid> I'm unsure why
<paravoid> have you seen anything like it before?
<youpi> whenever the thread doesn't actually terminate, sure
<youpi> what is the thread usually blocked on when you cancel it?
<paravoid> this is a hurd-specific issue
<paravoid> works on all other arches
<youpi> could be just that all other archs have more relaxed behavior
<youpi> thus the question of what exactly is supposed to be happening
<youpi> apparently it is inside a select?
<youpi> it seems select is not cancellable here
<pinotree> wasn't the patch you sent?
<youpi> no, my patch was about signals
<youpi> not cancellation
<pinotree> k
<youpi> (even if that could be related, of course)
<paravoid> how did you see that?
<paravoid> what's the equivalent of strace?
<youpi> thread 3 is inside _hurd_select
<paravoid> thread 1 is blocked on join
<paravoid> but the code is
<paravoid>     if(gdmaps->reload_thread_spawned) {
<paravoid>         pthread_cancel(gdmaps->reload_tid);
<paravoid>         pthread_join(gdmaps->reload_tid, NULL);
<paravoid>     }
<paravoid> so cancel should have killed the thread
<youpi> cancelling a thread is a complex matter
<youpi> there are cancellation points
<youpi> e.g. a thread performing while(1); can't be cancelled
<paravoid> thread 3 is just a libev event loop
<youpi> yes, "just" calling poll, the most complex system call of unix :)
<youpi> paravoid: anyway, don't look for a bug in your program, it's most
  likely a bug in glibc, thanks for the report
<paravoid> I think it all boils down to a problem cancelling a thread in
  poll()
<youpi> yes
<youpi> paravoid: ok, actually with the latest libc it does work
<paravoid> oh?
<youpi> where latest = not uploaded yet :/
<paravoid> did you test this on exodar?
<youpi> pinotree: that's the libpthread_cancellation.diff I guess
<paravoid> because I commented out the join :)
<youpi> paravoid:  in the root, yes
<youpi> well, I tried my own program
<paravoid> oh, okay
<youpi> which is indeed hanging inside select (or just read) in the chroot
<youpi> but not in the root
<pinotree> ah, richard's patch
<paravoid> url?
<youpi> I've installed the build-dep in the root, if you want to try
<paravoid> strange that root is newer than the chroot :)
<youpi> paravoid: it's the usual eglibc debian source
<paravoid> tried in root, still fails
<youpi> could you keep the process running?
<paravoid> done
<youpi> Mmm, but the thread running gdmaps_reload_thread never set the
  cancel type to async?
<youpi> that said I guess read and select are supposed to be cancellation
  points
<youpi> thus cancel_deferred should be working, but they are not
<youpi> it seems it's cancellation points which have just not been
  implemented
<youpi> (they happen to be one of the most obscure things in posix)
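
The remark above about while(1); can be illustrated in isolation. With the default deferred cancel type, a cancel request is only acted upon at a cancellation point, so a pure busy loop never notices it; adding an explicit pthread_testcancel() gives the loop such a point. This is a minimal sketch, and spinner and run_testcancel_demo are hypothetical names used only for this example.

```c
#include <pthread.h>
#include <unistd.h>

static volatile unsigned long spins;

/* Busy loop with an explicit cancellation point.  Without the
   pthread_testcancel() call, a deferred cancel request would
   never be acted upon here. */
static void *spinner (void *arg)
{
  (void) arg;
  while (1)
    {
      spins++;
      pthread_testcancel ();
    }
  return NULL;  /* not reached */
}

/* Returns 1 if the thread was cancelled and joined, 0 if it ended
   some other way, -1 if the thread could not be created. */
static int run_testcancel_demo (void)
{
  pthread_t t;
  void *res;

  if (pthread_create (&t, NULL, spinner, NULL) != 0)
    return -1;
  sleep (1);
  pthread_cancel (t);
  pthread_join (t, &res);
  return res == PTHREAD_CANCELED;
}
```

Calls such as read() and select() are expected to behave as if they contained this check themselves, which is what the log above found to be missing.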

IRC, freenode, #hurd, 2013-04-15

<youpi> but yes, there is still an issue, with PTHREAD_CANCEL_DEFERRED
<youpi> how calls like read() or select() are supposed to test
  cancellation?
<pinotree> iirc there are the LIBC_CANCEL_* macros in glibc
<pinotree> eg sysdeps/unix/sysv/linux/pread.c
<youpi> yes
<youpi> but in our libpthread?
<pinotree> could it be we lack the libpthread → glibc bridge of
  cancellation stuff?
<youpi> we do have pthread_setcancelstate/type forwards
<youpi> but it seems the default LIBC_CANCEL_ASYNC is void
<pinotree> i mean, so when you cancel a thread, you can get that cancel
  status in libc proper, just like it seems done with LIBC_CANCEL_* macros
  and nptl
<youpi> as I said, the bridge is there
<youpi> we're just not using it in glibc
<youpi> I'm writing an open_issues page

IRC, freenode, #hurd, 2013-04-16

<braunr> youpi: yes, we said some time ago that it was lacking

userspace-rcu

With 2.13-39+hurd.3.rbraun.1 (that is, 2.13-39+hurd.3 plus hurd-i386/0001-Mask-options-implemented-by-the-userspace-side-of-ma.patch) installed, the following hang occurs during make check of the userspace-rcu package.

[...]
./test_urcu_gc 4 4 10 -d 0 -b 4096
[hangs]

(gdb) thread apply all bt

Thread 5 (Thread 14933.5):
#0  0x0106785c in mach_msg_trap () at /home/rbraun/devel/debian/packages/eglibc/eglibc-2.13/build-tree/hurd-i386-libc/mach/mach_msg_trap.S:2
#1  0x01068074 in __mach_msg (msg=0x27fff2c, option=3, send_size=24, rcv_size=32, rcv_name=120, timeout=0, notify=0) at msg.c:115
#2  0x011ed35c in __thread_suspend (target_thread=115) at /home/rbraun/devel/debian/packages/eglibc/eglibc-2.13/build-tree/hurd-i386-libc/mach/RPC_thread_suspend.c:84
#3  0x01045016 in __pthread_thread_halt (thread=0x80744a8) at ../libpthread/sysdeps/mach/pt-thread-halt.c:43
#4  0x01041365 in __pthread_exit (status=0x2) at ./pthread/pt-exit.c:118
#5  0x01040e78 in entry_point (start_routine=0x80494b0 <thr_writer>, arg=0x3) at ./pthread/pt-create.c:50
#6  0x00000000 in ?? ()

Thread 4 (Thread 14933.4):
#0  0x0106785c in mach_msg_trap () at /home/rbraun/devel/debian/packages/eglibc/eglibc-2.13/build-tree/hurd-i386-libc/mach/mach_msg_trap.S:2
#1  0x01068074 in __mach_msg (msg=0x25fff2c, option=3, send_size=24, rcv_size=32, rcv_name=119, timeout=0, notify=0) at msg.c:115
#2  0x011ed35c in __thread_suspend (target_thread=113) at /home/rbraun/devel/debian/packages/eglibc/eglibc-2.13/build-tree/hurd-i386-libc/mach/RPC_thread_suspend.c:84
#3  0x01045016 in __pthread_thread_halt (thread=0x8073aa8) at ../libpthread/sysdeps/mach/pt-thread-halt.c:43
#4  0x01041365 in __pthread_exit (status=0x2) at ./pthread/pt-exit.c:118
#5  0x01040e78 in entry_point (start_routine=0x80494b0 <thr_writer>, arg=0x2) at ./pthread/pt-create.c:50
#6  0x00000000 in ?? ()

Thread 3 (Thread 14933.3):
#0  0x0106785c in mach_msg_trap () at /home/rbraun/devel/debian/packages/eglibc/eglibc-2.13/build-tree/hurd-i386-libc/mach/mach_msg_trap.S:2
#1  0x01068074 in __mach_msg (msg=0x23ffe34, option=1282, send_size=0, rcv_size=40, rcv_name=122, timeout=10, notify=0) at msg.c:115
#2  0x0106ece3 in _hurd_select (nfds=0, pollfds=0x0, readfds=0x0, writefds=0x0, exceptfds=0x0, timeout=0x23ffefc, sigmask=0x0) at hurdselect.c:382
#3  0x0115875b in __poll (fds=fds@entry=0x0, nfds=nfds@entry=0, timeout=timeout@entry=10) at ../sysdeps/mach/hurd/poll.c:48
#4  0x0804a1bc in urcu_adaptative_busy_wait (wait=0x23fff48) at ../urcu-wait.h:164
#5  synchronize_rcu_mb () at ../urcu.c:329
#6  0x0804946c in rcu_gc_clear_queue (wtidx=wtidx@entry=1) at test_urcu_gc.c:241
#7  0x080495e6 in rcu_gc_reclaim (old=<optimized out>, wtidx=1) at test_urcu_gc.c:264
#8  thr_writer (data=0x1) at test_urcu_gc.c:295
#9  0x01040e70 in entry_point (start_routine=0x80494b0 <thr_writer>, arg=0x1) at ./pthread/pt-create.c:50
#10 0x00000000 in ?? ()

Thread 2 (Thread 14933.2):
#0  0x0106785c in mach_msg_trap () at /home/rbraun/devel/debian/packages/eglibc/eglibc-2.13/build-tree/hurd-i386-libc/mach/mach_msg_trap.S:2
#1  0x01068074 in __mach_msg (msg=0x17fdf30, option=3, send_size=32, rcv_size=4096, rcv_name=95, timeout=0, notify=0) at msg.c:115
#2  0x01068799 in __mach_msg_server_timeout (demux=0x1079150 <msgport_server>, max_size=4096, rcv_name=95, option=0, timeout=0) at msgserver.c:151
#3  0x0106886b in __mach_msg_server (demux=0x1079150 <msgport_server>, max_size=4096, rcv_name=95) at msgserver.c:196
#4  0x0107911f in _hurd_msgport_receive () at msgportdemux.c:68
#5  0x01040e70 in entry_point (start_routine=0x10790b0 <_hurd_msgport_receive>, arg=0x0) at ./pthread/pt-create.c:50
#6  0x00000000 in ?? ()

Thread 1 (Thread 14933.1):
#0  0x0106785c in mach_msg_trap () at /home/rbraun/devel/debian/packages/eglibc/eglibc-2.13/build-tree/hurd-i386-libc/mach/mach_msg_trap.S:2
#1  0x01068074 in __mach_msg (msg=0x15ff93c, option=2, send_size=0, rcv_size=24, rcv_name=94, timeout=0, notify=0) at msg.c:115
#2  0x010451a2 in __pthread_block (thread=0x805e600) at ../libpthread/sysdeps/mach/pt-block.c:35
#3  0x010443a8 in __pthread_cond_timedwait_internal (cond=0x80730dc, mutex=0x80730bc, abstime=0x0) at ./pthread/../sysdeps/generic/pt-cond-timedwait.c:130
#4  0x01043fcc in __pthread_cond_wait (cond=0x80730dc, mutex=0x80730bc) at ./pthread/../sysdeps/generic/pt-cond-wait.c:36
#5  0x010414ef in pthread_join (thread=8, status=status@entry=0x15ffa6c) at ./pthread/pt-join.c:46
#6  0x08048f9b in main (argc=8, argv=0x15ffb08) at test_urcu_gc.c:466
(gdb) thread 3
[Switching to thread 3 (Thread 14933.3)]
#0  0x0106785c in mach_msg_trap () at /home/rbraun/devel/debian/packages/eglibc/eglibc-2.13/build-tree/hurd-i386-libc/mach/mach_msg_trap.S:2
2       /home/rbraun/devel/debian/packages/eglibc/eglibc-2.13/build-tree/hurd-i386-libc/mach/mach_msg_trap.S: No such file or directory.
(gdb) call pthread_self()
$1 = 8

That is, Thread 1 is blocked in pthread_join waiting for Thread 3 (whose pthread_self() is 8), and Thread 3 is stuck in poll.

Is this really the libpthread cancellation points issue -- there doesn't seem to be any thread cancellation involved?