Issues with the current 2.17 version of glibc/EGLIBC in Debian experimental. Now in unstable.

IRC, OFTC, #debian-hurd, 2013-03-14

<markus_wanner> I have a strange tcp via localhost question:
<markus_wanner> The other side closes the connection, but I haven't read
  all data, yet. I should still be able to read the pending data, no?
<markus_wanner> At least it seems to work that way on Linux, but not on Hurd.
<markus_wanner> Got a simple repro with nc, if you're interested...
<youpi> markus_wanner: yes, we're interested
<markus_wanner> youpi: okay, here we go:
<markus_wanner> session 1: nc -l -p 7777 localhost
<markus_wanner> session 2: nc localhost 7777
<markus_wanner> session 2: a <RET> b <RET> c <RET>
<markus_wanner> session 1: [ pause with Ctrl-Z ]
<markus_wanner> session 2: [ send more data ] d <RET> e <RET> f <RET>
<markus_wanner> session 2: [ quit with Ctrl-C ]
<markus_wanner> session 1: [ resume with 'fg' ]
<markus_wanner> The server on session 1 doesn't get the data sent after it
  paused and before the client closed the connection.
<markus_wanner> I'm not sure if that's a valid TCP thing. However, on
  Linux, the server still gets the data. On hurd it doesn't.
<markus_wanner> I'm working on a C-code test case, ATM.
<youpi> markus_wanner: on which box are you seeing this behavior?
<youpi> exodar does not have it
<youpi> i.e. I do get the d e f
<markus_wanner> a private VM (I'm not a DD)
<markus_wanner> ..updated to latest experimental stuff.
<markus_wanner> GNU lematur 0.3 GNU-Mach 1.3.99-486/Hurd-0.3 i686-AT386 GNU
<youpi> ok, I can't reproduce it on my vm either
<youpi> maybe the C program will help
<markus_wanner> Hm.. cannot currently reproduce that in C. (Netcat still
  shows the issue, though).
<markus_wanner> I'll try to strace netcat...
<markus_wanner> ..Meh. strace not available on Hurd?
<pinotree> no, but there is rpctrace to show the various rpc
<markus_wanner> Cool, looks helpful.
<markus_wanner> Thx
<markus_wanner> Uh.. that introduces another error:
<markus_wanner> rpctrace: ../../utils/rpctrace.c:1287: trace_and_forward:
  Assertion `reply_type == 18' failed.


<youpi> I'm checking on a box without ipv6 configuration
<youpi> maybe that's the difference between you and me
<youpi> I guess your /etc/alternatives/nc is /bin/nc.traditional ?
<markus_wanner> Yup, nc.traditional.
<markus_wanner> Looks like that box only has IPv4 configured.
<markus_wanner> Something very strange is going on here. No matter how hard
  I try, I cannot reproduce this with netcat, anymore.
<pinotree> not even after a reboot?
<markus_wanner> Woo.. here, it happened, again! This is driving me crazy!
<markus_wanner> Now, nc seemingly connects, but is unable to send data
  between the two. Netcat would somehow complain, if it failed to connect,
<markus_wanner> No it worked.
<markus_wanner> So this seems to be an intermittent issue. So far, I could
  only ever repro it as a normal user, not as root. May be coincidental,
<markus_wanner> Now, 'a' and 'b' made it through, but not the 'c' sent
  manually just after that. Something with that TCP/IP stack is definitely
<markus_wanner> Anything I can try to investigate? Or shall I simply
  restart and see if the problem persists?
<youpi> maybe restart, yes
<youpi> did you restart since the upgrade ?
<markus_wanner> Yes, I restarted after that.
<markus_wanner> Hm.. okay, restarted. Some problem persists.
<markus_wanner> I currently have two netcat processes connected, the
  listening one got some first two messages and seems stuck now.
<markus_wanner> With the client, I tried to send more data, but the server
  doesn't get it, anymore.
<markus_wanner> Any idea on what I can do to analyze the situation?
<youpi> for the netcat issue, I haven't experienced this
<youpi> are you running in kvm or virtualbox or something else?
<markus_wanner> I'm currently puzzled about what "experimental" actually
<markus_wanner> On kvm.
<markus_wanner> My libc0.3 used to be 2.13-39+hurd.3.
<markus_wanner> But packages.d.o already shows 2.17.0experimental2.
<youpi> experimental ships experimental versions, which you aren't supposed
  to use
<youpi> unless you know what you are doing
<youpi> iirc 2.17 is known to be quite broken for now
<markus_wanner> Okay. So I guess I'll try to "downgrade" to unstable, then.
<markus_wanner> Phew, okay, successfully downgraded to unstable.
<markus_wanner> Hopefully monotone's test suite runs through fine, now.
<markus_wanner> Yup, WORKING! Looks like some experimental packages caused
  the problem. The netcat test as well as that one failing monotone test
  work fine, now.

IRC, OFTC, #debian-hurd, 2013-03-19

<tschwinge> pinotree, youpi: Is there anything from that markus_wanner
  discussion about pfinet/netcat/signals that needs to be filed?  I guess
  we don't know what exactly he changed so that everything worked fine
  eventually?  (Some experimental package(s), but which?)
<youpi> that was libc0.3 packages
<youpi> which are indeed known to break the network

IRC, freenode, #hurd, 2013-06-18

<braunr> root@darnassus:~# dpkg-reconfigure locales
<braunr> Generating locales (this might take a
  while)... en_US.UTF-8...Segmentation fault
<braunr> is it known ?
<youpi> uh, no

IRC, OFTC, #debian-hurd, 2013-06-19

<pinotree> btw i saw too the segmentation fault when generating locales

IRC, freenode, #hurd, 2014-02-04

<bu^> hello
<bu^> I just updated
<bu^> Setting up locales (2.17-98~0) ...
<bu^> Generating locales (this might take a while)...
<bu^>   en_US.UTF-8...Segmentation fault
<bu^>  done
<gnu_srs> bu^: That's known, it still seems to work, though. If you have
  the time please debug. I've tried but not found the solution yet:-(
<bu^> ok, just wanted to notify

IRC, freenode, #hurd, 2014-02-19

<braunr> for info, the localedef segfault has been fixed upstream
<braunr> or rather, upstream has been written in a way that won't trigger
  the segfault
<braunr> it is caused by the locale archive code that maps the locale
  archive file in the address space, enlarging the mapping as needed, but
  unmaps the complete reserved size of 512M on close
<braunr> munmap is implemented through vm_deallocate, but it looks like the
  latter doesn't allow deallocating unmapped regions of the address space
<braunr> (to be confirmed)
<braunr> upstream code tracks the mapping size so vm_deallocate won't whine
<braunr> i expect we'll have that in eglibc 2.18
<braunr> hm actually, posix says munmap must refer to memory obtained with
  mmap :)
<braunr> (or actually, that the behaviour is undefined, which most unix
  systems allow anyway, but not us)

<braunr> also, before i leave, i have partially traced the localedef
<youpi> ah, cool
<braunr> localedef maps the locale archive, and enlarges the mapping as needed
<braunr> but munmaps the complete 512m reserved area
<braunr> and i strongly suspect it unmaps something it shouldn't on the
<braunr> since linux mmap has different boundaries depending on the mapping
<braunr> while our glibc will happily map stacks below text
<braunr> the good news is that it looks fixed upstream
<youpi> ah :)
<braunr> see the change about close_archive
<braunr> i haven't tested it though

IRC, freenode, #hurd, 2014-02-21

<gg0> just upgraded to 2.18, locales still segfaults
<braunr> ok

IRC, freenode, #hurd, 2014-02-23

<braunr> ok, as expected, the localedef bug is because of some mmap issue


<braunr> looks like our mmap doesn't like mapping files with PROT_NONE
<braunr> shouldn't be too hard to fix
<braunr> gg0: i should have a fix ready soon for localedef

<braunr> youpi: i have a patch for glibc about the localedef segfault
<youpi> is that the backport we talked about, or something else?
<braunr> something else
<braunr> in short
<braunr> mmap() PROT_NONE on files returns 0
<youpi> ok
<youpi> seems like fixable indeed
<braunr> nothing is mapped, and the localedef code doesn't consider this an error
<braunr> my current fix is to handle PROT_NONE like PROT_READ
<youpi> doesn't vm_protect allow to map something without giving read
<braunr> it probably does
<braunr> the problem is in glibc
<youpi> ok
<braunr> when i say like PROT_READ, i mean a memory object gets a reference
<braunr> on the read port returned by io_map
<braunr> since it's not accessible anyway, it shouldn't make a difference
<braunr> but i preferred to have the memory object referenced anyway to
  match what i expect is done by other systems

IRC, freenode, #hurd, 2014-02-24

<youpi> braunr: ah ok

<braunr> ok that mmap fix looks fine, i'll add comments and commit it soon

IRC, freenode, #hurd, 2014-03-03

<youpi> braunr: did you test whether;a=commitdiff;h=17db6e8d6b12f55e312fcab46faf5d332c806fb6
  does indeed fix locale generation?
<braunr> youpi: it doesn't, which is why i applied

IRC, OFTC, #debian-hurd, 2013-06-20

<youpi> damn
<youpi> hang at ext2fs boot
<youpi> static linking issue, clearly

IRC, freenode, #hurd, 2013-06-30

<youpi> Mmm
<youpi>  __access ("/etc/", F_OK) at startup of ext2fs
<youpi> deemed to fail....
<pinotree> when does that happen?
<youpi> at hwcap initialization
<youpi> at least that's where ext2fs.static linked against libc 2.17 hangs
  at startup
<youpi> and this is indeed a very good culprit :)
<pinotree> ah, a debian patch
<youpi> does anybody know a quick way to know whether one is the / ext2fs ?
<pinotree> isn't the root fs given a special port?
<youpi> I was thinking about something like this, yes
<youpi> ok, boots
<youpi> I'll build an 8~0 that includes the fix
<youpi> so people can easily build the hurd package
<youpi> Mmm, no, the bootstrap port is also NULL for normally-started
  processes :/
<youpi> I don't understand why
<youpi> ah, only translators get a bootstrap port :/
<youpi> perhaps CRDIR then
<youpi> (which makes a lot of sense)

IRC, freenode, #hurd, 2013-07-01

<braunr> youpi: what is local-no-bootstrap-fs-access.diff supposed to fix ?
<youpi> ext2fs.static linked against debian glibc 2.17
<youpi> well, as long as you don't build & use ext2fs.static with it...
<braunr> that's thing, i want to :)
<braunr> +the
<youpi> I'd warmly welcome a way to detect whether being the / translator
  process btw
<youpi> it seems far from trivial

glibc 2.18 vs. GCC 4.8

IRC, freenode, #hurd, 2013-11-25

<youpi> grmbl, installing a glibc 2.18 rebuilt with gcc-4.8 brings an
  unbootable system

IRC, freenode, #hurd, 2013-11-29

<teythoon> so, what do I do? rebuild the glibc 2.18 package with gcc4.8 and
  see what breaks ?
<teythoon> when I boot a system with that libc that is ?
<teythoon> I wish youpi would have been more specific, I've never built the
  libc before...
<braunr> debian/rules build in the debian package
<braunr> ctrl-c when you see gcc invocations
<braunr> cd buildir; make lib others
<braunr> although hm
<braunr> what breaks is at boot time right ?
<teythoon> yes
<braunr> heh ..
<braunr> then dpkg-buildpackage
<braunr> DEB_BUILD_OPTIONS=nocheck speeds things up
<braunr> just answer on the mailing list and ask him
<braunr> he usually answers quickly

IRC, freenode, #hurd, 2013-12-18

<gnu_srs> teythoon: k!, any luck with eglibc-2.18?
<teythoon> tbh i didn't look into this after two unsuccessful attempts at
  building the libc package
<teythoon> there was a post over at the libc-alpha list that sounded
<braunr> wow
<teythoon> ?
<braunr> this looks tricky
<braunr> and why ia64 only
<teythoon> indeed
<braunr> it's rare to see aurel32 ask such questions

IRC, freenode, #hurd, 2014-01-22

<youpi> btw, did anybody investigate the glibc-built-with-gcc-4.8 issue?
<youpi> oddly enough, a subhurd boots completely fine with it
<braunr> i didn't
<teythoon> no, sorry
<youpi> I was wondering whether the bogus deallocation at boot might have
  something to do
<braunr> which one ?
<braunr> ah
<braunr> yes
<braunr> maybe
<youpi> quoted earlier here