Performance analysis (Wikipedia article) deals with analyzing how computing resources are used for completing a specified task.

Profiling is one relevant tool.

In microkernel-based systems, there is generally a considerable RPC overhead.

In a multi-server system, it is non-trivial to implement a high-performance I/O System.

When providing POSIX compatibility (and similar interfaces) in an environemnt that doesn't natively implement these interfaces, there may be a severe performance degradation. For example, in this fork system call's case.

Unit testing can be used for tracking performance regressions.

IRC, freenode, #hurd, 2012-07-05

<braunr> the more i study the code, the more i think a lot of time is
  wasted on cpu, unlike the common belief of the lack of performance being
  only due to I/O

IRC, freenode, #hurd, 2012-07-23

<braunr> there are several kinds of scalability issues
<braunr> iirc, i found some big locks in core libraries like libpager and
<braunr> but anyway we can live with those
<braunr> in the case i observed, ext2fs, relying on libdiskfs and libpager,
  scans the entire file list to ask for writebacks, as it can't know if the
  pages are dirty or not
<braunr> the mistake here is moving part of the pageout policy out of the
<braunr> so it would require the kernel to handle periodic synces of the
  page cache
<antrik> braunr: as for big locks: considering that we don't have any SMP
  so far, does it really matter?...
<braunr> antrik: yes
<braunr> we have multithreading
<braunr> there is no reason to block many threads while if most of them
  could continue
<braunr> -while
<antrik> so that's more about latency than throughput?
<braunr> considering sleeping/waking is expensive, it's also about
<braunr> currently, everything that deals with sleepable locks (both
  gnumach and the hurd) just wake every thread waiting for an event when
  the event occurs (there are a few exceptions, but not many)
<antrik> ouch


IRC, freenode, #hurd, 2012-12-04

<damo22> why do some people think hurd is slow? i find it works well even
  under heavy load inside a virtual machine
<braunr> damo22: the virtual machine actually assists the hurd a lot :p
<braunr> but even with that, the hurd is a slow system
<damo22> i would have thought it would have the potential to be very fast,
  considering the model of the kernel
<braunr> the design implies by definition more overhead, but the true cause
  is more than 15 years without optimization on the core components
<braunr> how so ?
<damo22> since there are less layers of code between the hardware bare
  metal and the application that users run
<braunr> how so ? :)
<braunr> it's the contrary actually
<damo22> VFS -> IPC -> scheduler -> device drivers -> hardware
<damo22> that is monolithic
<braunr> well, it's not really meaningful
<braunr> and i'd say the same applies for a microkernel system
<damo22> if the application can talk directly to hardware through the
  kernel its almost like plugging directly into the hardware
<braunr> you never talk directly to hardware
<braunr> you talk to servers instead of the kernel
<damo22> ah
<braunr> consider monolithic kernel systems like systems with one big
<braunr> the kernel
<braunr> whereas a multiserver system is a kernel and many servers
<braunr> you still need the VFS to identify your service (and thus your
<braunr> you need much more IPC, since system calls are "replaced" with RPC
<braunr> the scheduler is basically the same
<damo22> okay
<braunr> device drivers are similar too, except they run in thread context
  (which is usually a bit heavier)
<damo22> but you can do cool things like report when an interrupt line is
<braunr> and there are many context switches between all that
<braunr> you can do all that in a monolithic kernel too, and faster
<braunr> but it's far more elegant, and (when well done) easy to do on a
  microkernel based system
<damo22> yes
<damo22> i like elegant, makes coding easier if you know the basics
<braunr> there are only two major differences between a monolilthic kernel
  and a multiserver microkernel system
* damo22 listens
<braunr> 1/ independence of location (your resources could be anywhere)
<braunr> 2/ separation of address spaces (your servers have their own
<damo22> wow
<braunr> these both imply additional layers of indirection, making the
  system as a whole slower
<damo22> but it would be far more secure though i suspect
<braunr> yes
<braunr> and reliable
<braunr> that's why systems like qnx were usually adopted for critical
<damo22> security and reliability are very important, i would switch to the
  hurd if it supported all the hardware i use 
<braunr> so would i :)
<braunr> but performance matters too
<damo22> not to me
<braunr> it should :p
<braunr> it really does matter a lot in practice
<damo22> i mean, a 2x slowdown compared to linux would not affect me
<damo22> if it had all the benefits we mentioned above
<braunr> but the hurd is really slow for other reasons than its additional
  layers of indrection unfortunately
<damo22> is it because of lack of optimisation in the core code?
<braunr> we're working on these issues, but it's not easy and takes a lot
  of time :p
<damo22> like you said
<braunr> yes
<braunr> and also because of some fundamental design choices related to the
  microkernel back in the 80s
<damo22> what about the darwin system
<damo22> it uses a mach kernel?
<braunr> yes
<damo22> what is stopping someone taking the MIT code from darwin and
  creating a monster free OS
<braunr> what for ?
<damo22> because it already has hardware support
<damo22> and a mach kernel
<braunr> in kernel drivers ?
<damo22> it has kernel extensions
<damo22> you can do things like kextload module
<braunr> first, being a mach kernel doesn't make it compatible or even
  easily usable with the hurd, the interfaces have evolved independantly
<braunr> and second, we really do want more stuff out of the kernel
<braunr> drivers in particular
<damo22> may i ask why you are very keen to have drivers out of kernel?
<braunr> for the same reason we want other system services out of the
<braunr> security, reliability, etc..
<braunr> ease of debugging
<braunr> the ability to restart drivers separately, without restarting the
<damo22> i see

IRC, freenode, #hurd, 2012-09-13

Test Driving GNU Hurd, With Benchmarks Against Linux, Phoronix, Michael Larabel, 2011-07-18.

<braunr> the phoronix benchmarks don't actually test the operating system
<hroi_> braunr: well, at least it tests its ability to run programs for
  those particular tasks
<braunr> exactly, it tests how programs that don't make much use of the
  operating system run
<braunr> well yes, we can run programs :)
<pinotree> those are just cpu-taking tasks
<hroi_> ok
<pinotree> if you do a benchmark with also i/o, you can see how it is
  (quite) slower on hurd
<hroi_> perhaps they should have run 10 of those programs in parallel, that
  would test the kernel multitasking I suppose
<braunr> not even I/O, simply system calls
<braunr> no, multitasking is ok on the hurd
<braunr> and it's very similar to what is done on other systems, which
  hasn't changed much for a long time
<braunr> (except for multiprocessor)
<braunr> true OS benchmarks measure system calls
<hroi_> ok, so Im sensing the view that the actual OS kernel architecture
  dont really make that much difference, good software does
<braunr> not at all
<braunr> i'm only saying that the phoronix benchmark results are useless
<braunr> because they didn't measure the right thing
<hroi_> ok

Optimizing Data Structure Layout

IRC, freenode, #hurd, 2014-01-02

<braunr> teythoon_: wow, digging into the vm code :)
<teythoon_> i discovered pahole and gnumach was a tempting target :)
<braunr> never heard of pahole :/
<teythoon_> it's nice
<teythoon_> braunr: try pahole -C kmem_cache /boot/gnumach
<teythoon_> on linux that is. ...
<braunr> ok
<teythoon_> braunr:
<braunr> very nice

IRC, freenode, #hurd, 2014-01-03

<braunr> teythoon: pahole is a very handy tool :)
<teythoon> yes
<teythoon> i especially like how general it is




IRC, freenode, #hurd, 2014-02-27

<braunr> tschwinge: about your concern with regard to performance
  measurements, you could run kvm with hugetlbfs and cpuset
<braunr> on a machine that provides nested page tables, this makes the
  virtualization overhead as small as it could be considering the
<braunr> hugetlbs reduces the overhead of page faults, and also implies
  locked memory while cpuset isolates the vm from global scheduling
<braunr> hugetlbfs*
<tschwinge> Thanks, will look into that.