IRC, freenode, #hurd, 2011-11-18
<nocturnal> I'm learning about GNU Hurd and was speculating with a friend
who is also a computer enthusiast. I would like to know if Hurd's
microkernel can recover services should they crash? and if it can, does
that recovery code exist in multiple services or just one core kernel
<braunr> nocturnal: you should read about passive translators
<braunr> basically, there is no dedicated service to restore crashed
<etenil> Hi everyone!
<braunr> services can crash and be restarted, but persistence support is
limited, and rather per service
<braunr> actually persistence is more a side effect than a designed thing
<braunr> etenil: hello
<etenil> braunr: translators can also be spawned on an ad-hoc basis, for
instance when accessing a particular file, no?
<braunr> that's what being passive, for a translator, means
<etenil> ah yeah I thought so :)
IRC, freenode, #hurd, 2011-11-19
<chromaticwt> will hurd ever have the equivalent of a rs server?, is that
even possible with hurd?
<youpi> chromaticwt: what is an rs server ?
<chromaticwt> a reincarnation server
<youpi> ah, like minix. Well, the main ground issue is restoring existing
information, such as pids of processes, etc.
<youpi> I don't know how minix manages it
<antrik> chromaticwt: I have a vision of a session manager that could also
take care of reincarnation... but then, knowing myself, I'll probably
never implement it
<youpi> we do get proc crashes from time to time
<youpi> it'd be cool to see the system heal itself :)
<braunr> i need a better description of reincarnation
<braunr> i didn't think it would make core servers like proc able to get
resurrected in a safe way
<antrik> depends on how it is implemented
<antrik> I don't know much about Minix, but I suspect they can recover most
<antrik> essentially, the condition is to make all precious state be
constantly serialised, and held by some third party, so the reincarnated
server could restore it
<braunr> should it work across reboots ?
<antrik> I haven't thought about the details of implementing it for each
core server; but proc should be doable I guess... it's not necessary for
the system to operate, just for various UNIX mechanisms
<antrik> well, I'm not aware of the Minix implementation working across
reboots. the one I have in mind based on a generic session management
infrastructure should though :-)
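
Editor's note: the condition antrik describes (precious state constantly serialised and held by a third party, so the reincarnated server can restore it) can be illustrated with a toy sketch. Everything here is hypothetical and not Hurd code: `StateStore`, `ProcLikeServer` and the pid-table layout are invented for illustration, and the "third party" is just an in-memory object standing in for a separate, longer-lived process.

```python
import json


class StateStore:
    """Hypothetical third party that outlives the servers whose state it holds."""

    def __init__(self):
        self._blobs = {}

    def save(self, name, state):
        # Serialise on every save so the store never shares live objects
        # with the server that may crash.
        self._blobs[name] = json.dumps(state)

    def load(self, name):
        blob = self._blobs.get(name)
        return json.loads(blob) if blob is not None else None


class ProcLikeServer:
    """Toy stand-in for a server with precious state (here, a pid table)."""

    def __init__(self, store, name="proc"):
        self.store = store
        self.name = name
        saved = store.load(name)
        # A reincarnated server picks up where the previous instance stopped.
        self.state = saved if saved is not None else {"next_pid": 1, "pids": {}}

    def register(self, task):
        pid = self.state["next_pid"]
        self.state["pids"][str(pid)] = task  # JSON object keys are strings
        self.state["next_pid"] = pid + 1
        self.store.save(self.name, self.state)  # checkpoint after each change
        return pid


store = StateStore()
server = ProcLikeServer(store)
server.register("init")
server.register("shell")
del server  # simulate a crash: the instance is gone, the store survives

reborn = ProcLikeServer(store)
third = reborn.register("editor")
```

The checkpoint after every mutation is what makes the restart safe in this sketch, and it is also the recurring cost that comes up later in the discussion.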
IRC, freenode, #hurd, 2012-12-06
<Tekk_> out of curiosity, would it be possible to strap on a resurrection
server to hurd?
<Tekk_> in the future, that is
<Tekk_> cool :)
<braunr> but this requires things like persistence
<spiderweb> like a reincarnation server?
<braunr> it's a lot of work, with non-negligible overhead
<Tekk_> spiderweb: yes, exactly. I didn't remember tanenbaum's wording on
<braunr> i'm pretty sure most people would be against that
<spiderweb> braunr: why so?
<Tekk_> it was actually the feature that convinced me that ukernels were a
<Tekk_> spiderweb: because then you need a process that keeps track of all
the other servers
<Tekk_> and they have to be replying to "useless" pings to see if they're
<braunr> spiderweb: the hurd community isn't looking for a system reliable
in critical environments
<braunr> just a general purpose system
<braunr> and persistence requires regular data saves
<braunr> it's expensive
<Tekk_> as well as that
<braunr> we already have performance problems because of the nature of the
system, adding more without really looking for the benefits is useless
<spiderweb> so you can't theoretically have both?
<braunr> persistence and performance ?
<braunr> it's hard
<Tekk_> spiderweb: you need to modify the other translators to be
<braunr> only the ones you care about actually
<braunr> but it's just better to make the critical servers very stable
<Tekk_> so it's not just turning on and off the reincarnation
<braunr> (there isn't that much code there)
<braunr> and the other servers restartable
<mcsim> braunr: I think that if the aim is to make something like a
resurrection server, then most servers will need to be rewritten to make
them stateless, won't they?
<braunr> that's a lot easier and already works with non essential passive
<Tekk_> mcsim: pretty much
<braunr> mcsim: only those you care about
<braunr> mcsim: the proc auth exec servers for example, perhaps the file
system servers that can act as root fs, but the others would simply be
restarted by the passive translator mechanism
<spiderweb> what about restarting device drivers, that would be simple
<braunr> that's perfectly doable, yes
<spiderweb> (being an OS newbie) - it does seem to me that the whole
reincarnation server concept could quite possibly be a band aid.
<braunr> spiderweb: no it really works
<braunr> many systems do that actually
<braunr> let me give you a link
<braunr> it's a bit old, but there is a review of systems aiming at
resilience and how they achieve part of it
<spiderweb> neat, thanks
<braunr> actually it's not that old at all
<braunr> around 2007
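
Editor's note: the mechanism Tekk_ outlines (one process that keeps track of the other servers and pings them) might look roughly like the sketch below. It is a simplified illustration, not how Minix or the Hurd does it: `ToyServer`, `Watchdog`, `watch` and `poll` are hypothetical names, and a restarted server here comes back with no state, which matches braunr's point that only stateless or "restartable" servers are easy to handle this way.

```python
class ToyServer:
    """Hypothetical server that answers liveness pings."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def ping(self):
        return self.alive


class Watchdog:
    """Hypothetical supervisor that tracks servers and restarts dead ones."""

    def __init__(self):
        self._factories = {}
        self._servers = {}
        self.restarts = {}

    def watch(self, name, factory):
        # The factory plays the role of "how to start this server again".
        self._factories[name] = factory
        self._servers[name] = factory()
        self.restarts[name] = 0

    def server(self, name):
        return self._servers[name]

    def poll(self):
        """Ping every watched server once; restart those that do not answer."""
        restarted = []
        for name, srv in list(self._servers.items()):
            if not srv.ping():
                self._servers[name] = self._factories[name]()
                self.restarts[name] += 1
                restarted.append(name)
        return restarted


wd = Watchdog()
wd.watch("pfinet", lambda: ToyServer("pfinet"))
wd.watch("ext2fs", lambda: ToyServer("ext2fs"))

wd.server("pfinet").alive = False  # simulate a crash
first_poll = wd.poll()
second_poll = wd.poll()
```

Every call to `poll` costs one ping per server even when nothing has crashed, which is the "useless" traffic mentioned above.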
IRC, freenode, #hurd, 2013-08-26
< teythoon> I came across some paper about process reincarnation and
created a little prototype a while back:
< teythoon> http://darnassus.sceen.net/gitweb/teythoon/reincarnation.git/
< teythoon> and I looked into restarting the exec server in case it
dies. the exec server is an easy target since it has no state of its own
< teythoon> the only problem is that there is no exec server around to
start a new one
< youpi> teythoon: there could be another exec server only used to
(re)start the exec server
< youpi> that other exec server could even be restarted by the normal exec
< pinotree> what about a watchdog server?
< teythoon> youpi: yes, I had the same idea, i actually patched /hurd/init
to do that, it's just not yet working
< pinotree> make it watch other servers (exec included), and make exec
watch the watchdog only
< teythoon> pinotree: look at my prototype, there is a watchdog server
< braunr> teythoon: what's the point of reincarnation without persistence ?
< teythoon> braunr: there is no point in reincarnation w/o persistence of
< teythoon> my prototype does a limited form of persistence
< teythoon> the point was to see whether I can mitm a translator and
restart it on demand and to gain more insight into the whole translator
< braunr> teythoon: ok
< teythoon> braunr: see the readme, it retains state across reincarnations
< braunr> teythoon: how ?
< teythoon> braunr: the server can store a checkpoint using the
< teythoon> uh >,< sorry, pasted twice
< braunr> oh ok
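
Editor's note: pinotree's suggestion (a watchdog that watches the other servers, exec included, while exec watches only the watchdog) can be modelled as a small sketch. This is not taken from teythoon's prototype; `Node` and `check` are hypothetical, and "restarting" is reduced to flipping a flag.

```python
class Node:
    """Hypothetical process that may be responsible for restarting others."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.watched = []  # nodes this one restarts when they die

    def check(self):
        """Restart every dead node we watch; a dead node can check nothing."""
        if not self.alive:
            return []
        revived = []
        for other in self.watched:
            if not other.alive:
                other.alive = True  # stands in for actually restarting it
                revived.append(other.name)
        return revived


watchdog = Node("watchdog")
exec_server = Node("exec")
proc = Node("proc")

watchdog.watched = [exec_server, proc]  # watchdog watches the servers
exec_server.watched = [watchdog]        # exec watches the watchdog only

exec_server.alive = False
revived_by_watchdog = watchdog.check()

watchdog.alive = False
revived_by_exec = exec_server.check()

# If both die at once, neither is left to restart the other.
watchdog.alive = False
exec_server.alive = False
revived_when_both_dead = watchdog.check() + exec_server.check()
```

The last case shows the limit of the arrangement in this model: mutual watching covers single failures, not a simultaneous crash of both parties.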