freenode, #hurd channel, 2011-03-05:

<nixness> what about testing though?
<nixness> like sort of "what's missing? lets write tests for it so that
  when someone gets to implementing it, he knows what to do. Repeat"
  project
<antrik> you mean creating an automated testing framework?
<antrik> this is actually a task I want to add for this year, if I get
  around to it :-)
<nixness> yeah I'd very much want to apply for that one
<nixness> cuz I've never done Kernel hacking before, but I know that with
  the right tasks like "test VM functionality", I would be able to write up
  the automated tests and hopefully learn more about what breaks/makes the
  kernel
<nixness> (and it would make implementing the feature much less hand-wavy
  and more correct)
<nixness> antrik, I believe the framework(CUnit right?) exists, but we just
  need to write the tests.
<antrik> do you have prior experience implementing automated tests?
<nixness> lots of tests!
<nixness> yes, in Java mostly, but I've played around with CUnit
<antrik> ah, great :-)
<nixness> here's what I know from experience: 1) write basic tests. 2)
  write ones that test multiple features 3) stress test [option 4)
  benchmark and record to DB]
<youpi> well, what we'd rather need is to fix the issues we already know
  from the existing testsuites :)

GSoC project propsal.

<nixness> youpi, true, and I think I should check what's available in way
  of tests, but if the tests are "all or nothing" then it becomes really
  hard to fix your mistakes
<youpi> they're not all or nothing
<antrik> youpi: the existing testsuites don't test specific system features
<youpi> libc ones do
<youpi> we could also check posixtestsuite which does too

open posix test suite.

<antrik> AFAIK libc has very few failing tests

glibc.

<youpi> err, like twenty?
<youpi> € grep -v '^#' expected-results-i486-gnu-libc | wc -l
<youpi> 67
<youpi> nope, even more
<antrik> oh, sorry, I confused it with coreutils
<pinotree> plus the binutils ones, i guess
<youpi> yes

binutils.

<antrik> anyways, I don't think relying on external testsuites for
  regression testing is a good plan
<antrik> also, that doesn't cover unit testing at all
<youpi> why ?
<youpi> sure there can be unit testing at the translator etc. level
<antrik> if we want to implement test-driven development, and/or do serious
  refactoring without too much risk, we need a test harness where we can
  add specific tests as needed
<youpi> but more than often, the issues are at the libc / etc. level
  because of a combination o fthings at the translator level, which unit
  testing won't find out
* nixness yewzzz!
<nixness> sure unit testing can find them out. if they're good "unit" tests
<youpi> the problem is that you don't necessarily know what "good" means
<youpi> e.g. for posix correctness
<youpi> since it's not posix
<nixness> but if they're composite clever tests, then you lose that
  granularity
<nixness> youpi, is that a blackbox test intended to be run at the very end
  to check if you're posix compliant?
<antrik> also, if we have our own test harness, we can run tests
  automatically as part of the build process, which is a great plus IMHO
<youpi> nixness: "that" = ?
<nixness> oh nvm, I thought there was a test stuie called "posix
  correctness"
<youpi> there's the posixtestsuite yes
<youpi> it's an external one however
<youpi> antrik: being external doesn't mean we can't invoke it
  automatically as part of the build process when it's available
<nixness> youpi, but being internal means it can test the inner workings of
  certain modules that you are unsure of, and not just the interface
<youpi> sure, that's why I said it'd be useful too
<youpi> but as I said too, most bugs I've seen were not easy to find out at
  the unit level
<youpi> but rather at the libc level
<antrik> of course we can integrate external tests if they exist and are
  suitable. but that that doesn't preclude adding our own ones too. in
  either case, that integration work has to be done too
<youpi> again, I've never said I was against internal testsuite
<antrik> also, the major purpose of a test suite is checking known
  behaviour. a low-level test won't directly point to a POSIX violation;
  but it will catch any changes in behaviour that would lead to one
<youpi> what I said is that it will be hard to write them tight enough to
  find bugs
<youpi> again, the problem is knowing what will  lead to a POSIX violation
<youpi> it's long work
<youpi> while libc / posixtestsuite / etc. already do that
<antrik> *any* unexpected change in behaviour is likely to cause bugs
  somewher
<youpi> but WHAT is "expected" ?
<youpi> THAT is the issue
<youpi> and libc/posixtessuite do know that
<youpi> at the translator level we don't really
<youpi> see the recent post about link()

link(dir,name) should fail with EPERM

<youpi> in my memory jkoenig pointed it out for a lot of such calls
<youpi> and that issue is clearly not known at the translator level
<nixness> so you're saying that the tests have to be really really
  low-level, and work our way up?
<youpi> I'm saying that the translator level tests will be difficult to
  write
<antrik> why isn't it known at the translator level? if it's a translator
  (not libc) bug, than obviously the translator must return something wrong
  at some point, and that's something we can check
<youpi> because you'll have to know all the details of the combinations
  used in libc, to know whether they'll lead to posix issues
<youpi> antrik: sure, but how did we detect that that was unexpected
  behavior?
<youpi> because of a glib test
<youpi> at the translator level we didn't know it was an unexpected
  behavior
<antrik> gnulib actually
<youpi> and if you had asked me, I wouldn't have known
<antrik> again, we do *not* write a test suite to find existing bugs
<youpi> right, took one for the other
<youpi> doesn't really matter actually
<youpi> antrik: ok, I don't care then
<antrik> we write a test suite to prevent future bugs, or track status of
  known bugs
<youpi> (don't care *enough* for now, I mean)
<nixness> hmm, so write something up antrik for GSoC :) and I'll be sure to
  apply
<antrik> now that we know some translators return a wrong error status in a
  particular situation, we can write a test that checks exactly this error
  status. that way we know when it is fixed, and also when it breaks again
<antrik> nixness: great :-)
<nixness> sweet. that kind of thing would also need a db backend
<antrik> nixness: BTW, if you have a good idea, you can send an application
  for it even if it's not listed among the proposed tasks...
<antrik> so you don't strictly need a writeup from me to apply for this :-)
<nixness> antrik, I'll keep that in mind, but I'll also be checking your
  draft page
<nixness> oh cool :)
<antrik> (and it's a well known fact that those projects which students
  proposed themselfs tend to be the most successful ones :-) )
* nixness draft initiated
<antrik> youpi: BTW, I'm surprised that you didn't mention libc testsuite
  before... working up from there is probably a more effective plan than
  starting with high-level test suites like Python etc...
<youpi> wasn't it already in the gsoc proposal?
<youpi> bummer
<antrik> nope

freenode, #hurd channel, 2011-03-06:

<nixness> how's the hurd coding workflow, typically?

nixness -> foocraft.

<foocraft> we're discussing how TDD can work with Hurd (or general kernel
  development) on #osdev
<foocraft> so what I wanted to know, since I haven't dealt much with GNU
  Hurd, is how do you guys go about coding, in this case
<tschwinge> Our current workflow scheme is... well... is...
<tschwinge> Someone wants to work on something, or spots a bug, then works
  on it, submits a patch, and 0 to 10 years later it is applied.
<tschwinge> Roughly.
<foocraft> hmm so there's no indicator of whether things broke with that
  patch
<foocraft> and how low do you think we can get with tests? A friend of mine
  was telling me that with kernel dev, you really don't know whether, for
  instance, the stack even exists, and a lot of things that I, as a
  programmer, can assume while writing code break when it comes to writing
  kernel code
<foocraft> Autotest seems promising

See autotest link given above.

<foocraft> but in any case, coming up with the testing framework that
  doesn't break when the OS itself breaks is hard, it seems
<foocraft> not sure if autotest isolates the mistakes in the os from
  finding their way in the validity of the tests themselves
<youpi> it could be interesting to have scripts that automatically start a
  sub-hurd to do the tests

subhurd.

<tschwinge> foocraft: To answer one of your earlier questions: you can do
  really low-level testing.  Like testing Mach's message passing.  A
  million times.  The questions is whether that makes sense.  And / or if
  it makes sense to do that as part of a unit testing framework.  Or rather
  do such things manually once you suspect an error somewhere.
<tschwinge> The reason for the latter may be that Mach IPC is already
  heavily tested during normal system operation.
<tschwinge> And yet, there still may be (there are, surely) bugs.
<tschwinge> But I guess that you have to stop at some (arbitrary?) level.
<foocraft> so we'd assume it works, and test from there
<tschwinge> Otherwise you'd be implementing the exact counter-part of what
  you're testing.
<tschwinge> Which may be what you want, or may be not.  Or it may just not
  be feasible.
<foocraft> maybe the testing framework should have dependencies
<foocraft> which we can automate using make, and phony targets that run
  tests
<foocraft> so everyone writes a test suite and says that it depends on A
  and B working correctly
<foocraft> then it'd go try to run the tests for A etc.
<tschwinge> Hmm, isn't that -- on a high level -- have you have by
  different packages?  For example, the perl testsuite depends (inherently)
  on glibc working properly.  A perl program's testsuite depends on perl
  working properly.
<foocraft> yeah, but afaik, the ordering is done by hand

freenode, #hurd channel, 2011-03-07:

<antrik> actually, I think for most tests it would be better not to use a
  subhurd... that leads precisely to the problem that if something is
  broken, you might have a hard time running the tests at all :-)
<antrik> foocraft: most of the Hurd code isn't really low-level. you can
  use normal debugging and testing methods
<antrik> gnumach of course *does* have some low-level stuff, so if you add
  unit tests to gnumach too, you might run into issues :-)
<antrik> tschwinge: I think testing IPC is a good thing. as I already said,
  automated testing is *not* to discover existing but unknown bugs, but to
  prevent new ones creeping in, and tracking progress on known bugs
<antrik> tschwinge: I think you are confusing unit testing and regression
  testing. http://www.bddebian.com/~hurd-web/open_issues/unit_testing/
  talks about unit testing, but a lot (most?) of it is actually about
  regression tests...

unit testing.

<tschwinge> antrik: That may certainly be -- I'm not at all an expert in
  this, and just generally though that some sort of automated testing is
  needed, and thus started collecting ideas.
<tschwinge> antrik: You're of course invited to fix that.

IRC, freenode, #hurd, 2011-03-08

(After discussing the anatomy of a hurd system.)

<antrik> so that's what your question is actually about?
<foocraft> so what I would imagine is a set of only-this-server tests for
  each server, and then we can have fun adding composite tests
<foocraft> thus making debugging the composite scenarios a bit less tricky
<antrik> indeed
<foocraft> and if you were trying to pass a composite test, it would also
  help knowing that you still didn't break the server-only test
<antrik> there are so many different things that can be tested... the
  summer will only suffice to dip into this really :-)
<foocraft> yeah, I'm designing my proposal to focus on 1) make/use a
  testing framework that fits the Hurd case very well 2) write some tests
  and docs on how to write good tests
<antrik> well, doesn't have to be *one* framework... unit testing and
  regression testing are quite different things, which can be covered by
  different frameworks