Profiling (Wikipedia article) is a technique for measuring where CPU time is spent, usually for the purpose of performance analysis.

  • rpctrace

  • gprof

    Should work, but some issues regarding GCC spec files have been reported. These should be easy to fix, if that has not already been done.

  • glibc's sotruss

  • ltrace

  • latrace

  • dtrace

    Have a look at this, integrate it into the main trees.

  • LTTng

  • SystemTap

  • ... or some other Linux thing.

IRC, freenode, #hurd, 2013-06-17

<congzhang> is that possible we develop rpc msg analyse tool? make it clear
  view system at different level?
<congzhang> hurd was dynamic system, how can we just read log line by line
<kilobug> congzhang: well, you can use rpctrace and then analyze the logs,
  but rpctrace is quite intrusive and will slow down things (like strace or
<kilobug> congzhang: I don't know if a low-overhead solution could be made
  or not
<congzhang> that's the problem
<congzhang> when real system run, the msg cross different server, and then
  the debug action should not intrusive the process itself
<congzhang> we observe the system and analyse os
<congzhang> when rms choose microkernel, it's expect to accelerate the
  progress, but not
<congzhang> microkernel make debug a little hard
<kilobug> well, it's not limited to microkernels, debugging/tracing is
  intrusive and slow things down, it's a universal law of compsci
<kilobug> no, it makes debugging easier
<congzhang> I don't think so
<kilobug> you can gdb the various services (like ext2fs or pfinet) more
<kilobug> and rpctrace isn't any worse than strace
<congzhang> how easy when debug lpc
<kilobug> lpc ?
<congzhang> because cross context
<congzhang> classic function call
<congzhang> when find the bug source, I don't care performance, I want to
  know it's right or wrong by design, If it work as I expect
<congzhang> I optimize it later
<congzhang> I have an idea, but don't know whether it's useful or not
<braunr> rpctrace is a lot less intrusive than ptrace based tools
<braunr> congzhang: debugging is not made hard by the design choice, but by
  implementation details
<braunr> as a simple counter example, someone often cited usb development
  on l3 being made a lot easier than on a monolithic kernel
<congzhang> Collect the trace information first, and then layout the msg by
  graph, when something wrong, I focus the trouble rpc, and found what
  happen around
<braunr> "by graph" ?
<congzhang> yes
<congzhang> braunr: directed graph or something similar
<braunr> and not caring about performance when debugging is actually stupid
<braunr> i've seen it on many occasions, people not being able to use
  debugging tools because they were far too inefficient and slow
<braunr> why a graph ?
<braunr> what you want is the complete trace, taking into account cross
  address space boundaries
<congzhang> yes
<braunr> well it's linear
<braunr> switching server
<congzhang> by independent process view it's linear
<congzhang> it's linear on cpu's view too
<congzhang> yes, I need complete trace, and dynamic control at microkernel
<congzhang> os, if server crash, and then I know what's other doing, from
  the graph
<congzhang> graph needn't to be one, if the are not connect together, time
  sort them
<congzhang> when hurd was complete ok, some tools may be help too
<braunr> i don't get what you want on that graph
<congzhang> sorry, I need a context
<congzhang> like uml sequence diagram, I need what happen one by one
<congzhang> from server's view and from the function's view
<braunr> that's still linear
<braunr> so please stop using the word graph
<braunr> you want a trace
<braunr> a simple call trace
<congzhang> yes, and a tool
<braunr> with some work gdb could do it
<congzhang> you mean under  some microkernel infrastructure help 
<congzhang> ?
<braunr> if needed
<congzhang> braunr: will that be easy?
<braunr> not too hard
<braunr> i've had this idea for a long time actually
<braunr> another reason i insist on migrating threads (or rather, binding
  server and client threads)
<congzhang> braunr: that's  great
<braunr> the current problem we have when using gdb is that we don't know
  which server thread is handling the request of which client
<braunr> we can guess it
<braunr> but it's not always obvious
<congzhang> I read the talk, know some of your idea
<congzhang> make things happen like classic kernel, just from function
<braunr> that's it
<congzhang> I think you and other do a lot of work to improve the mach and
  hurd, but we lack the design document and the diagram, one diagram was
  great than one thousand words
<braunr> diagrams are made after the prototypes that prove they're doable
<braunr> i'm not a researcher
<braunr> and we have little time
<braunr> the prototype is the true spec
<congzhang> that's why i want to collect the trace info and show, you can
  know what happen and how happen, maybe just suitable for newbie, hope
  more young hack like it
<braunr> once it's done, everything else is just sugar candy around it

IRC, freenode, #hurd, 2014-01-05

<teythoon> braunr: do you speak ocaml ?
<teythoon> i had this awesome idea for a universal profiling framework for
<teythoon> universal as in not os dependent, so it can be easily used on
  hurd or in gnu mach
<teythoon> it does a source transformation, instrumenting what you are
  interested in
<teythoon> for this transformation, coccinelle is used
<teythoon> i have a prototype to measure how often a field in a struct is
<teythoon> unfortunately, coccinelle hangs while processing kern/slab.c :/
<youpi> teythoon:  I do speak ocaml
<teythoon> awesome :)
<teythoon> unfortunately, i do not :/
<teythoon> i should probably get in touch with the coccinelle devs, most
  likely the problem is that coccinelle runs in circles somewhere
<youpi> it's not so complex actually
<youpi> possibly,  yes
<teythoon> do you know coccinelle ?
<youpi> the only really peculiar thing in ocaml is lambda calculus
<youpi> +c
<youpi> I know a bit, although I've never really written a semantic patch
<teythoon> i'm okay with that
<youpi> but I can understand them
<youpi> then ocaml should be fine for you :)
<youpi> just ask the few bits that you don't understand :)
<teythoon> yeah, i haven't really made an effort yet
<youpi> writing ocaml is a bit more difficult because you need to
  understand the syntax, but for putting printfs it should be easy enough
<youpi> if you get a backtrace with ocamldebug (it basically works like
  gdb), I can probably explain you what might be happening

IRC, freenode, #hurd, 2014-01-06

<teythoon> braunr: i'm not doing microoptimizations, i'm developing a
  profiler :p
<braunr> teythoon: nice :)
<teythoon> i thought you might like it
<braunr> teythoon: you may want to look at
<braunr> from the same people who brought radixvm
<teythoon> which data structure should i test it with next ?
<braunr> uh, no idea :)
<braunr> the ipc ones i suppose
<teythoon> yeah, or the task related ones
<braunr> but be careful, there many "inline" versions of many ipc functions
  in the fast paths
<braunr> and when they say inline, they really mean they copied it
<braunr> +are
<teythoon> but i have a microbenchmark for ipc performance
<braunr> you sure have been busy ;p
<braunr> it's funny you're working on a profiler at the same time a
  colleague of mine said he was interested in writing one in x15 :)
<teythoon> i don't think inlining is a problem for my tool
<teythoon> well, you can use my tool for x15
<braunr> i told him he could look at what you did
<braunr> so i expect he'll ask soon
<teythoon> cool :)
<teythoon> my tool uses coccinelle to instrument c code, so this works in
  any environment
<teythoon> one just needs a little glue and a method to get the data
<braunr> seems reasonable
<teythoon> for gnumach, i just stuff a tiny bit of code into the kdb

<teythoon> hm debians bigmem patch with my code transformation makes
  gnumach hang early on
<teythoon> i don't even get a single message from gnumach
<braunr> ouch
<teythoon> or it is something else entirely
<teythoon> it didn't even work without my patches o_O
<teythoon> weird
<teythoon> uh oh, the kmem_cache array is not properly aligned
<teythoon> braunr:
<braunr> teythoon: do you mean, with your patch ?
<braunr> i'm not sure i understand
<braunr> are you saying gnumach doesn't start because of an alignment issue
<teythoon> no, that's unrelated
<teythoon> i skipped the bigmem patch, have a running gnumach with
<braunr> hum, what is that aliased column ?
<teythoon> but, despite my efforts with __attribute__((align(64))), i see
  lots of accesses to kmem_cache objects which are not properly aligned
<braunr> is that reported by the performance counters ?
<teythoon> no
<braunr> are those the previous lines accessed by other unrelated code ?
<braunr> previous bytes in the same line*
<teythoon> this is a patch generated to instrument the code
<teythoon> so i instrument field access of the form i->a
<teythoon> but if one does &i->a, my approach will no longer keep track of
  any access through that pointer
<teythoon> so i do not count that as an access but as creating an alias for
  that field
<braunr> ok
<teythoon> so if that aliased count is not zero, the tool might
  underestimate the access count
<teythoon> hm
<teythoon> static struct kmem_cache kalloc_caches[KALLOC_NR_CACHES]
<teythoon> but
<teythoon> nm gnumach|grep kalloc_caches
<teythoon> c0226e20 b kalloc_caches
<teythoon> ah, that's fine
<braunr> yes
<teythoon> never mind
<braunr> don't we have a macro for the cache line size ?
<teythoon> ah, there are a great many more kmem_caches around and no one
  told me ...
<braunr> teythoon: eh :)
<braunr> aren't you familiar with type-specific caches ?
<teythoon> no, i'm not familiar with anything in gnumach-land
<braunr> well, it's the regular slab allocator, carrying the same ideas
  since 1994
<braunr> it's pretty much the same in linux and other modern unices
<teythoon> ok
<braunr> the main difference is likely that we allocate our caches
  statically because we have no kernel modules and know we'll never destroy
  them, only reap them
<teythoon> is there a macro for the cache line size ?
<teythoon> there is one buried in the linux source
<teythoon> L1_CACHE_BYTES from linux/src/include/asm-i386/cache.h
<braunr> there is one in kern/slab.h
<teythoon> but it is out of date
<teythoon> there is ?
<braunr> but it's commented out
<braunr> only used when SLAB_USE_CPU_POOLS is defined
<braunr> but the build system should give you CPU_L1_SHIFT
<teythoon> hm
<braunr> and we probably should define CPU_L1_SIZE from that
  unconditionally in config.h or a general param.h file if there is one
<braunr> the architecture-specific one perhaps
<braunr> although it's exported to userland so maybe not

IRC, freenode, #hurd, 2014-01-07

<teythoon> braunr: linux defines ____cacheline_aligned :
<teythoon> where would i put a similar definition in gnumach ?
<taylanub> .oO( four underscores ?!? )
<teythoon> heh
<teythoon> yes, four
<braunr> teythoon: yes :)

<teythoon> are kmem_cache objects ever allocated dynamically in gnumach ?
<braunr> no
<teythoon> hm
<braunr> i figured that, since there are no kernel modules, there is no
  need to allocate them dynamically, since they're never destroyed
<teythoon> so i aligned all static declarations with
  __attribute__((align(1 << CPU_L1_SHIFT)))
<teythoon> but i still see 77% of all accesses being to objects that are
  not properly aligned o_O
<teythoon> ah
<teythoon> >,<
<braunr> you could add an assertion in kmem_cache_init to find out what's
<teythoon> *aligned
<braunr> eh :)
<braunr> right
<teythoon> grr
<teythoon> sweet, the kmem_caches are now all properly aligned :)
<braunr> :)

<braunr> hm
<braunr> i guess i should change what vmstat reports as "cache" from the
  cached objects to the external ones (which map files and not anonymous
<teythoon> braunr:
<teythoon> turned out that struct kmem_cache was actually an easy target
<teythoon> no bitfields, no embedded structs that were addressed as such
  (and not aliased)
<braunr> :)

IRC, freenode, #hurd, 2014-01-09

<teythoon> braunr: i didn't quite get what you and youpi were talking about
  wrt to the alignment attribute
<teythoon> define a type for struct kmem_cache with the alignment attribute
  ? is that possible ?
<teythoon> ah, like it's done for kmem_cpu_pool
<braunr> teythoon: that's it :)
<braunr> note that aligning a struct doesn't change what sizeof returns
<teythoon> heh, that saves one a whole lot of trouble indeed
<braunr> you have to align a member inside for that
<teythoon> why would it change the size ?
<braunr> imagine an array of such structs
<teythoon> ah
<teythoon> right
<teythoon> but it fits into two cachelines exactly
<braunr> that wouldn't be a problem with an array either
<teythoon> so an array of those will still be aligned element-wise
<teythoon> yes
<braunr> and it's often used like that, just as i did for the cpu pools
<braunr> but then one is tempted to think the size of each element has
  changed too
<braunr> and then use that technique for, say, reserving a whole cache line
  for one variable
<teythoon> ah, now i get that remark ;)
<braunr> :)

<teythoon> braunr: i annotated struct kmem_cache in slab.h with
  __cacheline_aligned and it did not have the desired effect
<braunr> can you show the diff please ?
<braunr> i don't know why :/
<teythoon> that's how it's done for kmem_cpu_pool
<braunr> i'll try it here
<teythoon> wait
<teythoon> i made a typo
<teythoon> >,<
<teythoon> __cachline_aligned
<teythoon> bad one
<braunr> uh :)
<braunr> i don't see it
<braunr> ah yes
<braunr> missing e
<teythoon> yep, works like a charm :)
<teythoon> nice, good to know :)
<braunr> :)
<teythoon> given the previous discussion, shall i send it to the list or
  commit it right away ?
<braunr> i'd say go ahead and commit