IRC, freenode, #hurd, 2011-10-16:

<youpi> braunr: I realize that kmem_alloc_wired maps the allocated pages in
  the kernel map
<youpi> it's a bit of a waste when my allocation is exactly a page size
<youpi> is there a proper page allocation which would simply return its
  physical address?
<youpi> pages returned by vm_page_grab  may get swapped out, right?
<youpi> so could it be a matter of calling vm_page_alloc then vm_page_wire
  (with lock_queues held) ?
<youpi> s/alloc/grab/
<braunr> vm_page_grab() is only used at boot iirc
<braunr> youpi: mapping allocated memory in the kernel map is normal, even
  if it's only a page
<braunr> the allocated area usually gets merged with an existing
<braunr> youpi: also, i'm not sure about what you're trying to do here, so
  my answers may be out of scope :p
<youpi> saving addressing space
<youpi> with that scheme we're using twice as much addressing space for
  kernel buffers
<braunr> kernel or user task ?
<youpi> kernl
<braunr> hm are there so many wired areas ?
<youpi> several MiBs, yes
<youpi> there are also the zalloc areas
<braunr> that's pretty normal
<youpi> which I've recently incrased
<braunr> hm forget what i've just said about vm_page_grab()
<braunr> youpi: is there a particular problem caused by kernel memory
  exhaustion ?
<youpi> I currently can't pass the dh_strip stage of iceweasel due to this
<youpi> it can not allocate a stack 
<braunr> a kernel thread stack ?
<youpi> yes
<braunr> that's surprising
<youpi> it'd be good to have a kernel memory profile
<braunr> vminfo is able to return the kernel map
<youpi> well, it's not suprising if the kernel map is full
<youpi> but it doesn't tell what allocates which p ieces
<braunr> that's what i mean, it's surprising to have a full kernel map
<youpi> (that's what profiling is about)
<braunr> right
<youpi> well, it's not really surprising, considering that the krenel map
  size is arbitrarily chosen
<braunr> youpi: 2 GiB is really high enough
<youpi> it's not 2GiB, precisely
<youpi> there is much of the 2GiB addr space which is spent on physical
  memory mapping
<youpi> then there is the virtual mapping
<braunr> are you sure we're talking about the kernel map, or e.g. the kmem
<youpi> which is currently only 192MiB
<youpi> the kmem_map part of kernel_map
<braunr> ok, the kmem_map submap then
<braunr> netbsd has used 128 MiB for yeas with almost no issue
<braunr> mach uses more kernel objects so it's reasonable to have a bigger
<braunr> but that big ..
<youpi> I've made the zalloc areas quite big
<youpi> err, not only zalloc area
<braunr> kernel stacks are allocated directly from the kernel map
<youpi> kalloc to 64MiB, zalloc to 64MiB
<youpi> ipc map size to 8MiB
<braunr> youpi: it could be the lack of kernel map entries
<youpi> and the device io map to 16MiB
<braunr> do you have the exact error message ?
<youpi> no more room for vm_map_find_entry in 71403294
<youpi> no more rooom for kmem_alloc_aligned in 71403294
<braunr> ah, aligned
<youpi> for a stack
<youpi> which is 4 pages only
<braunr> memory returned by kmem functions always return pages
<braunr> hum
<braunr> kmem functions always return memory in page units
<youpi> and my xen driver is allocating 1MiB memory for the network buffer
<braunr> 4 pages for kernel stacks ?
<youpi> through kmem_alloc_wire
<braunr> that seems a lot
<youpi> that's needed for xen page updates
<youpi> without having to split the update in several parts
<braunr> ok
<braunr> but are there any alignment requirements ?
<youpi> I guess mach  uses the alignment trick to find "self"
<youpi> anyway, an alignment on 4pages shouldn't be a problem
<braunr> i think kmem_alloc_aligned() is the generic function used both for
  requests with and without alignment constraints
<youpi> so I was thinking about at least moving my xen net driver to
  vm_grab_page instead of kmem_alloc
<youpi> and along this, maybe others
<braunr> but you only get a vm_page, you can't access the memory it
<youpi> non, a lloc_aligned always aligns
<youpi> why?
<braunr> because it's not mapped
<youpi> there's even vm_grab_page_physical_addr
<youpi> it is, in the physical memory map
<braunr> ah, you mean using the direct mapped area
<youpi> yes
<braunr> then yes
<braunr> i don't know that part much
<youpi> what I'm afraid of is the wiring
<braunr> why ?
<youpi> because I don't want to see my page swapped out :)
<youpi> or whatever might happen if I don't wire it
<braunr> oh i'm almost sure you won't
<youpi> why?
<youpi> why some people need to wire it, and I won't?
<braunr> because in most mach vm derived code i've seen, you have to
  explicitely tell the vm your area is pageable
<youpi> ah,  mach does such thing indeed
<braunr> wiring can be annoying when you're passing kernel data to external
<braunr> you have to make sure the memory isn't wired once passed
<braunr> but that's rather a security/resource management problem
<youpi> in the net driver case, it's not passed to anything else
<youpi> I'm seeing 19MiB kmem_alloc_wired atm
<braunr> looks ok to me
<braunr> be aware that the vm_resident code was meant for single page
<youpi> what does this mean?
<braunr> there is no buddy algorithm or anything else decent enough wrt
<braunr> vm_page_grab_contiguous_pages() can be quite slow
<youpi> err, ok, but what is the relation with the question at stake ?
<braunr> you need 4 pages of direct mapped memory for stacks
<braunr> those pages need to be physically contiguous if you want to avoid
  the mapping
<braunr> allocating physically contiguous pages in mach is slow
<braunr> :)
<youpi> I didn't mean I wanted to avoid the mapping for stacks
<youpi> for anything more than a page, kmem mapping should be fine
<youpi> I'm concerned with code which allocates only page per page
<youpi> which thus really doesn't need any mapping
<braunr> i don't know the mach details but in derived implementations,
  there is almost no overhead when allocating single pages
<braunr> except for the tlb programming
<youpi> well, there is: twice as much addressing space
<braunr> well
<braunr> netbsd doesn't directly map physical memory
<braunr> and for others, like freebsd
<braunr> the area is just one vm_map_entry
<braunr> and on modern mmus, 4 MiBs physical mappings are used in pmap
<youpi> again, I don't care about tlb & performance
<youpi> just about the addressing space
<braunr> hm
<braunr> you say "twice"
<youpi> which is short when you're trying to link crazy stuff like
  iceweasel & co
<youpi> yes
<braunr> ok, the virtual space is doubled
<youpi> yes
<braunr> but the resources consume to describe them aren't
<braunr> even on mach
<youpi> since you have both the physical mapping and the kmem mapping
<youpi> I don't care much about the resources
<youpi> but about addressing space
<braunr> well there are a limited numbers of solutions
<youpi> the space it takes  and has to be taken from something else, that
  is,  here physical memory available to Mach
<braunr> reduce the physical mapping
<braunr> increase the kmem mapping
<braunr> or reduce kernel memory consumption
<youpi> and instead of taking the space from physical  mapping, we can as
  well avoid doubling the space consumption when it's trivial lto
<youpi> yes, the hird
<youpi> +t
<youpi> that's what I'm asking from the beginning :)
<braunr> 18:21 < youpi> I don't care much about the resources
<braunr> actually, you care :)
<youpi> yes and no
<braunr> i understand what you mean
<youpi> not in the sense "it takes a page_t to allocate a page"
<braunr> you want more virtual space, and aren't that much concerned about
  the number of objects used
<youpi> yes
<braunr> then it makes sense
<braunr> but in this case, it could be a good idea to generalize it
<braunr> have our own kmalloc/vmalloc interfaces
<braunr> maybe a gsoc project :)
<youpi> err, don't we have them already?
<youpi> I mean, what exactly do you want to generalize?
<braunr> mach only ever had vmalloc
<youpi> we already have a hell lot of allocators :)
<youpi> and it's a pain to distribute the available space to them
<braunr> yes
<braunr> but what you basically want is to introduce something similar to
  kmalloc for single pages
<youpi> or just patch the few cases that need it into just grabbing a page
<youpi> there are probably not so many
<braunr> ok
<braunr> i've just read vm_page_grab()
<braunr> it only removes a page from the free list
<braunr> other functions such as vm_page_alloc insert them afterwards
<braunr> if a page is in no list, it can't be paged out
<braunr> so i think it's probably safe to assume it's naturally wired
<braunr> you don't even need a call to vm_page_wire or a flag of some sort
<youpi> ok
<braunr> although considering the low amount of work done by
  vm_page_wire(), you could, for clarity
<youpi> braunr: I was also wondering about the VM_PAGE_FREE_RESERVED & such
<youpi> they're like 50 pages
<youpi> is this still reasonable  nowadays?
<braunr> that's a question i'm still asking myself quite often :)
<youpi> also, the BURST_{MAX,MIN} & such in vm_pageout.c are probably out
  of date?
<braunr> i didn't study the pageout code much
<youpi> k
<braunr> but correct handling of low memory thresholds is a good point to
  keep in mind
<braunr> youpi: i often wonder how linux can sometimes have so few free
  pages left and still be able to work without any visible failure
<youpi> well, as long as you have enough pages to be able to make progress,
  you're fine
<youpi> that' the point of the RESERVED pages in mach I guess
<braunr> youpi: yes but, obviously, hard values are *bad*
<braunr> linux must adjust it, depending on the number of processors, the
  number of pdflush threads, probably other things i don't have in mind
<braunr> i don't know what should make us adjust that value in mach
<youpi> which value?
<braunr> the size of the reserved pool
<youpi> I don't think it's adjusted
<braunr> that's what i'm saying
<braunr> i guess there is an #ifndef line for arch specific definitions
<youpi> err, you just said linux must adjust it :
<youpi> ))
<youpi> there is none
<braunr> linux adjusts it dynamically
<braunr> well ok
<braunr> that's another way to say it
<braunr> we don't have code to get rid of this macro
<braunr> but i don't even know how we, as maintainers, are supposed to
  guess it