There are a lot of reports about this issue, but no thorough analysis.

Short Timeouts

elinks

IRC, unknown channel, unknown date:

<paakku> This is related to ELinks... I've looked at the select()
  implementation for the Hurd in glibc and it seems that giving it a short
  timeout could cause it not to report that file descriptors are ready.
<paakku> It sends a request to the Mach port of each file descriptor and
  then waits for responses from the servers.
<paakku> Even if the file descriptors have data for reading or are ready
  for writing, the server processes might not respond immediately.
<paakku> So if I want ELinks to check which file descriptors are ready, how
  long should the timeout be in order to ensure that all servers can
  respond in time?
<paakku> Or do I just imagine this problem?
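
A minimal sketch of the pattern paakku describes: asking select() whether a single descriptor is readable, with a caller-chosen timeout. The helper name fd_readable_within is hypothetical. It uses only POSIX calls and does not reproduce the Hurd's glibc implementation. On a monolithic kernel a timeout of 0 normally reports data that is already queued; the concern raised above is that on the Hurd the servers may not have replied yet when the timeout expires.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: return 1 if fd becomes readable within
   timeout_usec microseconds, 0 if the timeout expires first,
   -1 on error.  A timeout of 0 is a pure poll. */
int fd_readable_within(int fd, long timeout_usec)
{
    fd_set readfds;
    struct timeval tv;
    int n;

    /* FD_SET is undefined for descriptors outside [0, FD_SETSIZE). */
    assert(fd >= 0 && fd < FD_SETSIZE);
    assert(timeout_usec >= 0);

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_usec / 1000000;
    tv.tv_usec = timeout_usec % 1000000;

    n = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    return (n > 0 && FD_ISSET(fd, &readfds)) ? 1 : 0;
}
```

A caller that wants a pure poll passes 0 as the timeout; choosing any non-zero value is exactly the "how long is long enough" question asked above, which this sketch does not answer.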

dbus

IRC

IRC, freenode, #hurd, 2012-01-31

<braunr> don't you find vim extremely slow lately ?
<braunr> (and not because of cpu usage but rather unnecessary sleeps)
<jkoenig> yes.
<braunr> wasn't there a discussion to add a minimum timeout to mach_msg for
  select() or something like that during the past months ?
<youpi> there was, and it was added
<youpi> that could be it
<youpi> I don't want to drop it though, some app really need it
<braunr> as a debian patch only iirc ?
<youpi> yes
<braunr> ok
<braunr> if i'm right, the proper solution was to fix remote servers
  instead of client calls
<youpi> (no drop, unless the actual bug gets fixed of course)
<braunr> so i'm guessing it's just a hack in between
<youpi> not only
<youpi> with a timeout of zero, mach will just give *no* time for the
  servers to give an answer
<braunr> that's because the timeout is part of the client call
<youpi> so the protocol has to be rethought, both server/client side
<braunr> a suggested solution was to make it a parameter
<braunr> i mean, part of the message
<braunr> not a mach_msg parameter
<jkoenig> OTOH the servers should probably not be trusted to enforce the
  timeout.
<braunr> why ?
<jkoenig> they're not necessarily trusted. (but then again, that's not the
  only circumstances where that's a problem)
<braunr> there is a proposed solution for that too (trust root and self
  servers only by default)
<jkoenig> I'm not sure they're particularly easy to identify in the
  general case
<braunr> "they" ? the solutions you mean ?
<braunr> or the servers ?
<youpi> jkoenig: you can't trust the servers in general to provide an
  answer, timeout or not
<jkoenig> yes the root/self servers.
<braunr> ah
<youpi> jkoenig: you can stat the actual node before dereferencing the
  translator
<jkoenig> could they not report FD activity asynchronously to the message
  port? libc would cache the state
<youpi> I don't understand what you mean
<youpi> anyway, really making the timeout part of the message is not a
  problem
<braunr> 10:10 < youpi> jkoenig: you can't trust the servers in general to
  provide an answer, timeout or not
<youpi> we already trust everything (e.g. read() ) into providing an answer
  immediately
<braunr> i don't see why
<youpi> braunr: put sleep(1) in S_io_read()
<youpi> it'll not give you an immediate answer, O_NODELAY being set or not
<braunr> well sleep is evil, but let's just say the server thread blocks
<braunr> ok
<braunr> well fix the server
<youpi> so we agree
<braunr> ?
<youpi> in the current security model, we trust the server to achieve the
  timeout
<braunr> yes
<youpi> and jkoenig's remark is more global than just select()
<braunr> that's why we must make sure we're contacting trusted servers by
  default
<youpi> it affects read() too
<braunr> sure
<youpi> so there's no reason not to fix select()
<youpi> that's the important point
<braunr> but this doesn't mean we shouldn't pass the timeout to the server
  and expect it to handle it correctly
<youpi> we keep raising issues with things, and not achieve anything, in
  the Hurd
<braunr> if it doesn't, then it's a bug, like in any other kernel type
<youpi> I'm not the one to convince :)
<braunr> eh, some would say it's one of the goals :)
<braunr> who's to be convinced then ?
<youpi> jkoenig: 
<youpi> who raised the issue
<braunr> ah
<youpi> well, see the irc log :)
<jkoenig> not that I'm objecting to any patch, mind you :-)
<braunr> i didn't understand it that way
<braunr> if you can't trust the servers to act properly, it's similar to
  not trusting linux fs code
<youpi> no, the difference is that servers can be non-root
<youpi> while on linux they can't
<braunr> again, trust root and self
<youpi> non-root fuse mounts are not followed by default
<braunr> as with fuse
<youpi> that's still to be written
<braunr> yes
<youpi> and as I said, you can stat the actual node and then dereference
  the translator afterwards
<braunr> but before writing anything, we'd better agree on the solution :)
<youpi> which, again, "just" needs to be written
<antrik> err... adding a timeout to mach_msg()? that's just wrong
<antrik> (unless I completely misunderstood what this discussion was
  about...)

IRC, freenode, #hurd, 2012-02-04

<youpi> this is confirmed: the select hack patch hurts vim performance a
  lot
<youpi> I'll use program_invocation_short_name to make the patch even more
  ugly
<youpi> (of course, we really need to fix select somehow)
<pinotree> could it (also) be that vim uses select() somehow "badly"?
<youpi> fsvo "badly", possibly, but still
<gnu_srs1> Could the select() stuff be the reason for a ten times
  slower ethernet too, e.g. scp and apt-get?
<pinotree> i didn't find myself neither scp nor apt-get slower, unlike vim
<youpi> see strace: scp does not use select
<youpi> (I haven't checked apt yet)

IRC, freenode, #hurd, 2012-02-14

<braunr> on another subject, I'm wondering how to correctly implement
  select/poll with a timeout on a multiserver system :/
<braunr> i guess a timeout of 0 should imply a non blocking round-trip to
  servers only
<braunr> oh good, the timeout is already part of the io_select call

IRC, freenode, #hurdfr, 2012-02-22

<braunr> the big problem with our implementation is that the select timeout
  is a client-side parameter
<braunr> a parameter passed directly to mach_msg
<braunr> so if you set the timeout to 0, there's a good chance mach_msg
  returns before an RPC can even complete (a full client-server
  round-trip, that is)
<braunr> and so when the timeout is 0 for non-blocking use, well, you don't
  block, but you don't get your events either ..
<abique|work> maybe changing the timeout from 10 ms to 10 us would improve
  the situation.
<abique|work> because 10 ms is a bit much :)
<braunr> it's the historical unix system interval timer
<braunr> and mach isn't preemptible
<braunr> so it's not feasible as things stand
<braunr> that said, it's not entirely related
<braunr> well, actually it is, we'd need something similar to linux's high
  res timers
<braunr> that is, either high resolution timers, or an easily programmable
  timer
<braunr> currently only the 8254 is programmed, and to ensure roughly
  correct scheduling, it's programmed once, at 10ms, and that's it
<braunr> so yes, specifying 1ms or 1us won't change anything about the
  interval needed to determine that the timer has expired
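
The last remark in the log above can be illustrated with a simplified model, assuming a clock that only advances once per fixed tick (10 ms in the discussion). The function timeout_to_ticks is hypothetical and is not taken from Mach. It only shows why a requested timeout of 1 us and one of 1 ms are indistinguishable at that resolution.

```c
#include <assert.h>

/* Hypothetical model: number of whole clock ticks that must elapse
   before a timeout of timeout_usec can be observed as expired, when
   the clock only advances every tick_usec microseconds.
   Rounds up to the next tick; a timeout of 0 stays 0. */
unsigned long timeout_to_ticks(unsigned long timeout_usec,
                               unsigned long tick_usec)
{
    assert(tick_usec > 0);
    /* Divide first, then add one for any remainder, to avoid overflow. */
    return timeout_usec / tick_usec + (timeout_usec % tick_usec != 0);
}
```

With a 10000 us tick, timeout_to_ticks(1, 10000) and timeout_to_ticks(1000, 10000) both yield one tick, so shortening the requested timeout does not shorten the wait in this model.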

IRC, freenode, #hurd, 2012-02-27

<youpi> braunr: extremely dirty hack
<youpi> I don't even want to detail :)
<braunr> oh
<braunr> does it affect vim only ?
<braunr> or all select users ?
<youpi> we've mostly seen it with vim
<youpi> but possibly fakeroot has some issues too
<youpi> it's very unlikely that only vim has the issue :)
<braunr> i mean, is it that dirty to switch behaviour depending on the
  calling program ?
<youpi> not all select users
<braunr> ew :)
<youpi> just those which do select({0,0})
<braunr> well sure
<youpi> braunr: you guessed right :)
<braunr> thanks anyway
<braunr> it's probably a good thing to do currently
<braunr> vim was getting me so mad i was using sshfs lately
<youpi> it's better than nothing yes
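
The workaround discussed in these logs (a minimum timeout for select({0,0}) callers, switched off for specific programs) can be sketched roughly as follows. This is not the Debian patch. The function name, the minimum value, and the single exempted program are assumptions made for illustration. The logs say the real patch keys on program_invocation_short_name, which is replaced here by an explicit parameter so the logic can be exercised in isolation.

```c
#include <assert.h>
#include <string.h>

/* Assumed value; the logs do not state the minimum the patch uses. */
#define MIN_SELECT_TIMEOUT_MS 1

/* Hypothetical helper: return how long (in ms) to wait for server
   replies.  Non-zero timeouts are passed through unchanged.  A zero
   timeout is raised to a small minimum so that servers get a chance
   to answer, except for programs known to be slowed down by it. */
long effective_timeout_ms(const char *prog_name, long requested_ms)
{
    assert(prog_name != NULL && requested_ms >= 0);

    if (requested_ms > 0)
        return requested_ms;
    if (strcmp(prog_name, "vim") == 0)   /* assumed exemption */
        return 0;
    return MIN_SELECT_TIMEOUT_MS;
}
```

As the participants note, switching behaviour on the caller's name is a stopgap; the fix they converge on is to carry the timeout in the io_select request itself.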

See Also

See also select bogus fd and select vs signals.