DragonFlyBSD/src df49ec1 sys/config LINT64, sys/platform/pc64/x86_64 pmap.c

kernel - VM rework part 18 - Cleanup

* Significantly reduce the zone limit for pvzone (for pmap
  pv_entry structures).  pv_entry's are no longer allocated
  on a per-page basis so the limit can be made much smaller.

  This also has the effect of reducing the per-cpu cache limit
  which ultimately stabilizes wired memory use for the zone.

* Also reduce the generic per-cpu cache limit for zones.
  This only really affects the pvzone.

* Make pvzone, mapentzone, and swap_zone __read_mostly.

* Enhance vmstat -z, report current structural use and actual
  total memory use.

* Also cleanup the copyright statement for vm/vm_zone.c.  John Dyson's
  original copyright was slightly different than the BSD copyright and
  stipulated no changes, so separate out the DragonFly addendum.

DragonFlyBSD/src 2c68437 sys/net netmap_user.h

<net/netmap_user.h>: s/<malloc.h>/<stdlib.h>/.

It is not used in base and in fact the netmap we have in the tree is
not hooked in, but it seems at least one port stumbles over this.

Reported-by: zrj

DragonFlyBSD/src 0600465 sys/platform/pc64/include pmap.h, sys/platform/pc64/x86_64 pmap.c

kernel - VM rework part 17 - Cleanup

* Adjust kmapinfo and vmpageinfo in /usr/src/test/debug.
  Enhance the code to display more useful information.

* Get pmap_page_stats_*() working again.

* Change systat -vm's 'VM' reporting.  Replace VM-rss with PMAP and
  VMRSS.  Relabel VM-swp to SWAP and SWTOT.

  PMAP  - Amount of real memory faulted into user pmaps.

  VMRSS - Sum of all process RSS's in the system.  This is
          the 'virtual' memory faulted into user pmaps and
          includes shared pages.

  SWAP  - Amount of swap space currently in use.

  SWTOT - Total amount of swap installed.

* Redocument vm_page.h.

* Remove dead code from pmap.c (some left over cruft from the
  days when pv_entry's were used for PTEs).

DragonFlyBSD/src 831a850 sys/platform/pc64/x86_64 pmap.c, sys/vm vm_page.c vm_page.h

kernel - VM rework part 15 - Core pmap work, refactor PG_*

* Augment PG_FICTITIOUS.  This takes over some of PG_UNMANAGED's previous
  capabilities.  In addition, the pmap_*() API will work with fictitious
  pages, making mmap() operation (e.g. of the GPU) more consistent.

* Add PG_UNQUEUED.  This prevents a vm_page from being manipulated in
  the vm_page_queues[] in any way.  This takes over another feature
  of the old PG_UNMANAGED flag.


* Remove PG_DEVICE_IDX.  This is no longer relevant.  We use PG_FICTITIOUS
  for all device pages.

* Refactor vm_contig_pg_alloc(), vm_contig_pg_free(),
  vm_page_alloc_contig(), and vm_page_free_contig().

  These functions now set PG_FICTITIOUS | PG_UNQUEUED on the returned
  pages, and properly clear the bits upon free or if/when a regular
  (but special contig-managed) page is handed over to the normal paging

  This, combined with making the pmap*() functions work better with
  PG_FICTITIOUS, is the primary 'fix' for some of DRM's hacks.

DragonFlyBSD/src 78831f7 sys/sys cdefs.h, sys/vm vm_pageout.c vm_page.c

kernel - VM rework part 16 - Optimization & cleanup pass

* Adjust __exclusive_cache_line to use 128-byte alignment as
  per suggestion by mjg.  Use this for the global vmstats.

* Add the vmmeter_neg_slop_cnt global, which is a more generous
  dynamic calculation versus -VMMETER_SLOP_COUNT.  The idea is to
  reduce how often vm_page_alloc() synchronizes its per-cpu statistics
  with the global vmstats.
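
As a rough illustration of the alignment change, the sketch below declares
a 128-byte aligned global in userland C.  The macro DEMO_EXCLUSIVE_CACHE_LINE
and struct demo_stats are hypothetical stand-ins, not the definitions from
sys/cdefs.h or the real vmstats structure.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's __exclusive_cache_line. */
#define DEMO_EXCLUSIVE_CACHE_LINE __attribute__((aligned(128)))

/*
 * Hypothetical global statistics block.  Putting the alignment on the
 * type also rounds sizeof() up to a multiple of 128, so nothing else
 * can share the tail of the block.
 */
struct demo_stats {
	long v_free_count;
	long v_active_count;
} DEMO_EXCLUSIVE_CACHE_LINE;

static_assert(sizeof(struct demo_stats) % 128 == 0,
    "demo_stats must occupy whole 128-byte blocks");

static struct demo_stats demo_global_stats;

/* Returns 1 if the global starts on a 128-byte boundary. */
static int
demo_stats_is_aligned(void)
{
	return (((uintptr_t)&demo_global_stats % 128) == 0);
}
```

The sketch applies the attribute to the type rather than to a single
variable; that is a choice made for the example, not necessarily how the
kernel macro is applied.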

DragonFlyBSD/src ae442b2 sys/platform/pc64/x86_64 pmap.c, sys/platform/vkernel64/include pmap.h

kernel - VM rework part 10 - Precursor work for terminal pv_entry removal

* Effectively remove pmap_track_modified().  Turn it into an assertion.
  The normal pmap code should NEVER EVER be called with any range inside
  the clean map.

  This assertion, and the routine in its entirety, will be removed in a
  later commit.

* The purpose of the original code was to prevent buffer cache kvm mappings
  from being misinterpreted as contributing to the underlying vm_page's
  modified state.  Normal paging operation synchronizes the modified bit and
  then transfers responsibility to the buffer cache.  We didn't want
  manipulation of the buffer cache to further affect the modified bit for
  the page.

  In modern times, the buffer cache does NOT use a kernel_object based
  mapping for anything and there should be no chance of any kernel related
  pmap_enter() (entering a managed page into the kernel_pmap) from messing
  with the space.

DragonFlyBSD/src 530e94f sys/platform/pc64/x86_64 pmap.c, sys/platform/vkernel64/platform pmap.c

kernel - VM rework part 9 - Precursor work for terminal pv_entry removal

* Cleanup the API a bit

* Get rid of pmap_enter_quick()

* Remove unused procedures.

* Document that vm_page_protect() (and thus the related
  pmap_page_protect()) must be called with a hard-busied page.  This
  ensures that the operation does not race a new pmap_enter() of the page.

DragonFlyBSD/src f16f912 sys/dev/drm drm_vm.c, sys/dev/drm/ttm ttm_page_alloc.c

kernel - VM rework part 14 - Core pmap work, stabilize for X/drm

* Don't gratuitously change the vm_page flags in the drm code.

  The vm_phys_fictitious_reg_range() code in drm_vm.c was clearing
  PG_UNMANAGED.  It was only luck that this worked before, but
  because these are faked pages, PG_UNMANAGED must be set or the
  system will implode trying to convert the physical address back
  to a vm_page in certain routines.

  The ttm code was setting PG_FICTITIOUS in order to prevent the
  page from getting into the active or inactive queues (they had
  a conditional test for PG_FICTITIOUS).  But ttm never cleared
  the bit before freeing the page.  Remove the hack and instead
  fix it in vm_page.c.

* In vm_object_terminate(), allow the case where there are still
  wired pages in an OBJT_MGTDEVICE object that has wound up on a
  queue (don't complain about it).  This situation arises because the
  ttm code uses the contig malloc API which returns wired pages.

  NOTE: vm_page_activate()/vm_page_deactivate() are allowed to mess
        with wired pages.  Wired pages are not anything 'special' to
        the queues, which allows us to avoid messing with the queues
        when pages are assigned to the buffer cache.

DragonFlyBSD/src 567a639 sys/platform/pc64/include pmap.h, sys/platform/pc64/x86_64 pmap.c

kernel - VM rework part 11 - Core pmap work to remove terminal PVs

* Remove pv_entry_t belonging to terminal PTEs.  The pv_entry's for
  PT, PD, PDP, and PML4 remain.  This reduces kernel memory use for
  pv_entry's by 99%.

  The pmap code now iterates vm_object->backing_list (of vm_map_backing
  structures) to run-down pages for various operations.

* Remove vm_page->pv_list.  This was one of the biggest sources of
  contention for shared faults.  However, in this first attempt I
  am leaving all sorts of ref-counting intact so the contention has
  not been entirely removed yet.

* Current hacks:

  - Dynamic page table page removal currently disabled because the
    vm_map_backing scan needs to be able to deterministically
    run-down PTE pointers.  Removal only occurs at program exit.

  - PG_DEVICE_IDX probably isn't being handled properly yet.

  - Shared page faults not yet optimized.

* So far minor improvements in performance across the board.

    [4 lines not shown]

DragonFlyBSD/src e32fb2a sys/vm vm_page.c vm_fault.c, test/debug vmpageinfo.c

kernel - VM rework part 13 - Core pmap work, stabilize & optimize

* Refactor the vm_page_hash hash again to get a better distribution.

* I tried to only hash shared objects but this resulted in a number of
  edge cases where program re-use could miss the optimization.

* Add a sysctl vm.page_hash_vnode_only (default off).  If turned on,
  only vm_page's associated with vnodes will be hashed.  This should
  generally not be necessary.

* Refactor vm_page_list_find2() again to avoid all duplicate queue
  checks.  This time I mocked the algorithm up in userland and twisted
  it until it did what I wanted.

* VM_FAULT_QUICK_DEBUG was accidentally left on, turn it off.

* Do not remove the original page from the pmap when vm_fault_object()
  must do a COW.  And just in case this is ever added back in later,
  don't do it using pmap_remove_specific() !!!  Use pmap_remove_pages()
  to avoid the backing scan lock.

  vm_fault_page() will now do this removal (for procfs rwmem), the normal
  vm_fault will of course replace the page anyway, and the umtx code
  uses different recovery mechanisms now and should be ok.

    [4 lines not shown]

DragonFlyBSD/src e3c330f sys/platform/pc64/x86_64 pmap.c pmap_inval.c, sys/platform/vkernel64/platform pmap.c

kernel - VM rework part 12 - Core pmap work, stabilize & optimize

* Add tracking for the number of PTEs mapped writeable in md_page.
  Change how PG_WRITEABLE and PG_MAPPED are cleared in the vm_page
  to avoid clear/set races.  This problem occurs because we would
  have otherwise tried to clear the bits without hard-busying the
  page. This allows the bits to be set with only an atomic op.

  Procedures which test these bits universally do so while holding
  the page hard-busied, and now call pmap_mapped_sync() beforehand to
  properly synchronize the bits.

* Fix bugs related to various counters.  pm_stats.resident_count,
  wiring counts, vm_page->md.writeable_count, and

* Fix bugs related to synchronizing removed pte's with the vm_page.
  Fix one case where we were improperly updating (m)'s state based
  on a lost race against a pte swap-to-0 (pulling the pte).

* Fix a bug related to the page soft-busying code when the
  m->object/m->pindex race is lost.

* Implement a heuristical version of vm_page_active() which just
  updates act_count unlocked if the page is already in the

    [92 lines not shown]

DragonFlyBSD/src 0984e94 include assert.h

<assert.h>: Sync comments a bit with FreeBSD.

DragonFlyBSD/src 8ef931a include assert.h

<assert.h>: add missing __dead2 to __assert().

__assert() is called when an assertion fails. After printing an error
message, it will call abort(). abort() never returns, hence it has the
__dead2 attribute. Also add this attribute to __assert().

Taken-from: FreeBSD (r217207)
Submitted-by: Jan Beich

DragonFlyBSD/src 097ba28 lib/libpam/modules/pam_ftpusers pam_ftpusers.8

pam_ftpusers.8: Remove reference to ftpusers.5.

DragonFlyBSD/src a5db5bf sbin/reboot boot_pc64.8, sys/boot/common loader.8

sys/boot: Clean up btxld's manual page.

It is a host tool only and not installed to base.

DragonFlyBSD/src 7961249 share/man/man4 ifmib.4 miibus.4, share/man/man5 rc.conf.5

i386 removal, part 72/x: Remove i386 specific ed.4 manpage references.

This was missing from 09ab7e4ea7d3a5476ab60148ed6fa1b8a0e61b0c.

DragonFlyBSD/src 4a69e56 share/misc bsd-family-tree

bsd-family-tree: Sync with FreeBSD (add OpenBSD 6.5).

DragonFlyBSD/src 96c5ef2 sys/dev/drm drm_ioctl.c

drm: Do not report PRIME as supported

This fixes kernel panics with the Ravenports graphics stack

DragonFlyBSD/src 161b332 sys/kern vfs_syscalls.c

kernel - Remove improper direct user-space access

* chroot_kernel() (a privileged system call) was improperly
  calling kprintf() with a direct user address.  Just remove
  the kprintf().

Reported-by: tdfbsd

DragonFlyBSD/src 2a7bd4d sys/kern sys_vmm.c, sys/platform/pc64/x86_64 machdep.c db_trace.c

kernel: Don't include <sys/user.h> in kernel code.

There is really no point in doing that because its main purpose is to
expose kernel structures to userland. In the majority of cases it wasn't
needed at all and the rest required only a couple of other includes.

DragonFlyBSD/src 8bfc56a contrib/mdocml README.DELETED, usr.bin/mandoc config.h Makefile

mandoc(1): Use base recallocarray().

DragonFlyBSD/src aaea11c contrib/mdocml compat_recallocarray.c

Merge branch 'vendor/MDOCML'

DragonFlyBSD/src 7c2b5ad contrib/mdocml compat_recallocarray.c

Remove the compat recallocarray() on the vendor branch.

DragonFlyBSD/src 68ea669 bin/sh Makefile, share/man Makefile

Fix building release on master.

* <histedit.h> was moved to /usr/include/priv on master, so add that
  to the include search path when building sh(1) as a bootstrap tool.

* Fix the apropos(1) database generation (used for 'make distribution').
  If the system doesn't have the makewhatis(8) for a compatible
  database, just build no database.

DragonFlyBSD/src 3269e75 share/man Makefile

makedb: Fix apropos database generation better across release/master.

The apropos database format used by our new man(1) is different and
incompatible with that used by our old man(1). The files are also named
differently, mandoc.db (new) and whatis (old).

So it makes no sense to use the old makewhatis on new systems or the
new makewhatis on old systems. If the desired makewhatis does not
exist, then we just don't generate the db, because the building system
doesn't have the makewhatis needed to generate it.

Once installed, the database will be updated regularly as per weekly

DragonFlyBSD/src 3cc72d3 sys/platform/pc64/x86_64 exception.S

Revert "kernel - Clean up direction flag on syscall entry"

Actually not needed, the D flag is cleared via the mask
set in MSR_SF_MASK.  Revert.

This reverts commit cea0e49dc0b2e5aea1b929d02f12d00df66528e2.

DragonFlyBSD/src cd9c487 sys/cpu/x86_64/include asmacros.h, sys/platform/pc64/x86_64 machdep.c

kernel - Implement support for SMAP and SMEP security (3)

* Issue clac after the push on all traps, interrupts, and

* Improve code documentation.

DragonFlyBSD/src cea0e49 sys/platform/pc64/x86_64 exception.S

kernel - Clean up direction flag on syscall entry

* Make sure the direction flag is clear on syscall entry.  Don't
  trust userland.

DragonFlyBSD/src 2f6148a sys/platform/pc64/x86_64 pmap.c

kernel - Implement support for SMAP and SMEP security (2)

* Oops.  Do the CR4 initialization in the correct place, so it is
  applied to all CPUs.

DragonFlyBSD/src 48c77f2 sys/cpu/x86_64/include asmacros.h specialreg.h, sys/platform/pc64/x86_64 support.s machdep.c

kernel - Implement support for SMAP and SMEP security

* Implement support for SMAP security.  This prevents accidental
  accesses to user address space from the kernel.  When available,
  we wrap intentional user-space accesses from the kernel with
  the 'stac' and 'clac' instructions.

  We use a NOP replacement policy to implement the feature.  The wrapper
  is initially a 'nop %eax' (3-byte NOP), and is replaced by 'stac' and
  'clac' via a .section iteration when the feature is supported.

* Implement support for SMEP security.  This prevents accidental
  execution of user code from the kernel and simply requires
  turning the bit on in CR4.

* Reports support in dmesg via the 'CPU Special Features Installed:'
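
The NOP replacement policy described above can be modelled in userland
roughly as follows.  This is only a sketch: the kernel records its patch
sites in a dedicated section, whereas this example uses an explicit table,
and every demo_* name is hypothetical.  The byte sequences are the standard
x86 encodings for a 3-byte NOP, stac, and clac.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PATCH_LEN 3

/* 3-byte placeholder and the two instructions that may replace it. */
static const unsigned char demo_nop3[DEMO_PATCH_LEN] = { 0x0f, 0x1f, 0xc0 };
static const unsigned char demo_stac[DEMO_PATCH_LEN] = { 0x0f, 0x01, 0xcb };
static const unsigned char demo_clac[DEMO_PATCH_LEN] = { 0x0f, 0x01, 0xca };

/* One recorded patch site (hypothetical layout). */
struct demo_patch_site {
	size_t offset;	/* byte offset of the placeholder in the text */
	int is_stac;	/* nonzero: install stac, zero: install clac */
};

/*
 * Walk the site table and, when the CPU feature is present, overwrite
 * each placeholder NOP with stac or clac.  Returns the number of sites
 * that were patched.
 */
static size_t
demo_apply_smap_patches(unsigned char *text, size_t text_len,
    const struct demo_patch_site *sites, size_t nsites, int cpu_has_smap)
{
	size_t i, patched = 0;

	if (!cpu_has_smap)
		return (0);	/* leave the NOPs in place */

	for (i = 0; i < nsites; i++) {
		unsigned char *p;

		assert(sites[i].offset + DEMO_PATCH_LEN <= text_len);
		p = text + sites[i].offset;

		/* Only touch bytes that still hold the placeholder. */
		if (memcmp(p, demo_nop3, DEMO_PATCH_LEN) != 0)
			continue;
		memcpy(p, sites[i].is_stac ? demo_stac : demo_clac,
		    DEMO_PATCH_LEN);
		patched++;
	}
	return (patched);
}
```

Because the placeholder and both replacements are the same length, the
patch never shifts surrounding code, which is what makes an in-place
rewrite at boot feasible.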

DragonFlyBSD/src d4e0b0c sys/platform/pc64/conf kern.mk, sys/platform/vkernel64/conf kern.mk

kernel - Implement retpoline for kernel

* Now that we have gcc-8 operational, we can turn on retpoline (software
  spectre protection against the return stack buffer).  Turn it on via

* No discernable performance loss with a generic buildkernel test:

                             Xeon e5-2620v4 x 2
                        time make -j 32 nativekernel (all tmpfs)
BEFORE 1717.427u 323.662s 2:28.49 1374.5%      9582+721k 200842+0io 4870pf+0w
BEFORE 1720.130u 338.635s 2:30.21 1370.5%      9555+720k 199720+0io 4804pf+0w
BEFORE 1722.395u 341.508s 2:30.71 1369.4%      9559+720k 199720+0io 4804pf+0w

AFTER  1720.271u 329.492s 2:28.27 1382.4%      9578+721k 200842+0io 4870pf+0w
AFTER  1736.268u 344.874s 2:30.90 1379.1%      9555+720k 199720+0io 4804pf+0w
AFTER  1726.056u 348.324s 2:31.14 1372.4%      9543+719k 199720+0io 4804pf+0w

DragonFlyBSD/src 1cf78b6 lib/libtelnet genget.c, sys/dev/drm linux_vmalloc.c

Don't include "internal" headers outside of regular headers.

Include files like <sys/_timespec.h> and so on contain small parts
such as struct timespec that are supposed to be provided by multiple
regular headers. They should only be included by other headers, not
by *.c files.

None of these was actually needed except for the libtelnet one
(replaced with <stddef.h>).

DragonFlyBSD/src 154145a share/misc pci_vendors

Update the pciconf(8) database.

May 14, 2019 snapshot from https://pci-ids.ucw.cz

DragonFlyBSD/src 67e7cb8 sys/platform/pc64/x86_64 pmap.c, sys/vm vm_map.c vm_fault.c

kernel - VM rework part 8 - Precursor work for terminal pv_entry removal

* Adjust structures so the pmap code can iterate backing_ba's with
  just the vm_object spinlock.

  Add a ba.pmap back-pointer.

  Move entry->start and entry->end into the ba (ba.start, ba.end).
  This is replicative of the base entry->ba.start and entry->ba.end,
  but local modifications are locked by individual objects to allow
  pmap ops to just look at backing ba's iterated via the object.

  Remove the entry->map back-pointer.

  Remove the ba.entry_base back-pointer.

* ba.offset is now an absolute offset and not additive.  Adjust all code
  that calculates and uses ba.offset (fortunately it is all concentrated
  in vm_map.c and vm_fault.c).

* Refactor ba.start/offset/end modifications to be atomic with
  the necessary spin-locks to allow the pmap code to safely iterate
  the vm_map_backing list for a vm_object.

* Test VM system with full synth run.

DragonFlyBSD/src cd89a7c sys/cpu/x86_64/include asmacros.h specialreg.h, sys/dev/misc/cpuctl cpuctl.c

kernel - Add MDS mitigation support for Intel side-channel attack

* Add MDS (Microarchitectural Data Sampling) attack mitigation to
  the kernel.  This is an attack against Intel CPUs made from 2011
  to date.  The attack is not currently known to work against AMD CPUs.

  With an Intel microcode update the mitigation can be enabled with

  sysctl machdep.mds_mitigation=MD_CLEAR

* Without the Intel microcode update, only disabling hyper-threading
  gives you any protection.  Older architectures might not get
  support.  If sysctl machdep.mds_support does not show support,
  then the currently loaded microcode does not have support for the

* DragonFlyBSD only supports the MD_CLEAR mode, and it will only
  be available with a microcode update from Intel.

  Updating the microcode alone does not protect against the attack.
  The microcode must be updated AND the mode must be turned on in
  DragonFlyBSD to protect against the attack.

  This mitigation burns around 250ns of additional latency on kernel->user
  transitions (system calls and interrupts primarily).  The additional

    [10 lines not shown]

DragonFlyBSD/src d29a243 sys/dev/misc/evdev input.h input-event-codes.h

kernel/evdev: Synchronize event codes with Linux 4.16

Taken-from: FreeBSD, Linux

DragonFlyBSD/src b089704 lib/libc/gen dlfcn.c

rtld-elf - Notify thread state to optimize relocations (2)

* Remove write() prototype in dlfcn.c that was only used for

Reminded-by: swildner

DragonFlyBSD/src 0d06b0a lib/libc/gen sysconf.3, lib/libc/sys pathconf.2

pathconf.2/sysconf.3: Add some related references to SEE ALSO.

DragonFlyBSD/src 7f62b37 lib/libc/gen _pthread_stubs.c

libc: Properly implement pthread_equal() stub.

A functional stub is needed to avoid forcing the thread library on librecrypto.

DragonFlyBSD/src 5b329e6 sys/vm vm_map.c vm_swapcache.c

kernel - VM rework part 7 - Initial vm_map_backing index

* Implement a TAILQ and hang vm_map_backing structures off
  of the related object.  This feature is still in progress
  and will eventually be used to allow pmaps to manipulate
  vm_page's without pv_entry's.

  At the same time, remove all sharing of vm_map_backing.
  For example, clips no longer share the vm_map_backing.  We
  can't share the structures if they are being used to
  itemize areas for pmap management.

  TODO - reoptimize this at some point.

  TODO - not yet quite deterministic enough for pmap
         searches (due to clips).

* Refactor vm_object_reference_quick() to again allow
  operation on any vm_object whose ref_count is already
  at least 1, or which belongs to a vnode.  The ref_count
  is no longer being used for complex vm_object collapse,
  shadowing, or migration code.

  This allows us to avoid a number of unnecessary token
  grabs on objects during clips, shadowing, and forks.

    [7 lines not shown]
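
The idea of hanging per-mapping structures off the related object can be
sketched with the <sys/queue.h> TAILQ macros.  The structures below are
heavily simplified, hypothetical stand-ins for vm_map_backing and
vm_object; locking is omitted entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

struct demo_object;

/* Hypothetical, simplified backing record: one mapped range of an object. */
struct demo_backing {
	TAILQ_ENTRY(demo_backing) entry;	/* linkage on the object's list */
	struct demo_object *object;
	unsigned long start;			/* inclusive */
	unsigned long end;			/* exclusive */
};

/* Hypothetical, simplified object holding the list head. */
struct demo_object {
	TAILQ_HEAD(, demo_backing) backing_list;
};

static void
demo_object_init(struct demo_object *obj)
{
	TAILQ_INIT(&obj->backing_list);
}

/* Each mapping gets its own record; records are never shared. */
static void
demo_backing_attach(struct demo_object *obj, struct demo_backing *ba)
{
	ba->object = obj;
	TAILQ_INSERT_TAIL(&obj->backing_list, ba, entry);
}

static void
demo_backing_detach(struct demo_backing *ba)
{
	assert(ba->object != NULL);
	TAILQ_REMOVE(&ba->object->backing_list, ba, entry);
	ba->object = NULL;
}

/* Count the mappings covering addr by walking the object's list. */
static int
demo_object_count_mappings(struct demo_object *obj, unsigned long addr)
{
	struct demo_backing *ba;
	int n = 0;

	TAILQ_FOREACH(ba, &obj->backing_list, entry) {
		if (addr >= ba->start && addr < ba->end)
			n++;
	}
	return (n);
}
```

Walking the list from the object side is what would let pmap-style code
find every mapping of a page without a per-page pv_entry list.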

DragonFlyBSD/src 175f5a8 sys/vm vm_map.c

kernel - VM rework part 6 - Stabilize

* Fix a case and situations where VPAGETABLE won't work.

DragonFlyBSD/src 50caca1 lib/libc/gen dlfcn.c Symbol.map, lib/libthread_xu/thread thr_kern.c

rtld-elf - Notify thread state to optimize relocations

* Add shims to allow libthread_xu to notify rtld when threading
  is being used.

* Requires weak symbols in libc which are overridden by rtld-elf.

* Implement the feature in rtld-elf and use it to avoid making calls
  to lwp_gettid().  When threaded, use tls_get_tcb() (which does not
  require a system call) instead of lwp_gettid().  When not threaded,
  just use a constant.

  NOTE: We cannot use tls_get_tcb() unconditionally because the tcb
        is not setup during early relocations.  So do this whack-a-mole
        to make it work.

* This leaves just the sigprocmask wrappers around rtld-elf (which
  are needed to prevent stacked relocations from signal handlers).

Poked-by: mjg
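
The decision logic described above might look roughly like the following.
This is a sketch of the control flow only: the weak-symbol plumbing is left
out, and every demo_* name, the callback type, and the constant are
hypothetical rather than taken from rtld-elf.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constant used as the lock owner before threading starts. */
#define DEMO_SINGLE_THREAD_ID 1L

/* Set once the thread library has announced itself. */
static int demo_threaded;

/* Hypothetical hook the thread library would call on initialization. */
static void
demo_notify_threaded(void)
{
	demo_threaded = 1;
}

/* Callback standing in for a syscall-free, TCB-based thread id lookup. */
typedef long (*demo_tid_fn)(void);

/* Example callback; a real one would read the thread control block. */
static long
demo_fake_tcb_tid(void)
{
	return (1234L);
}

/*
 * Pick the lock owner id.  Before threading is announced the TCB may not
 * be set up yet, so a constant is used and the callback is never invoked.
 */
static long
demo_lock_owner(demo_tid_fn tid_from_tcb)
{
	if (!demo_threaded)
		return (DEMO_SINGLE_THREAD_ID);
	assert(tid_from_tcb != NULL);
	return (tid_from_tcb());
}
```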

DragonFlyBSD/src 8492a2f sys/kern sysv_shm.c, sys/sys thread.h

kernel - VM rework part 5 - Cleanup

* Cleanup vm_map_entry_shadow()

* Remove (unused) vmspace_president_count()
  Remove (barely used) struct lwkt_token typedef.

* Cleanup the vm_map_aux, vm_map_entry, vm_map, and vm_object

* Adjustments to in-code documentation

DragonFlyBSD/src 2ee9073 sys/bus/cam/scsi scsi_da.c

kernel - Restore kern.cam.da.X.trim_enabled sysctl

* This sysctl was not always being properly installed due to an
  ordering and timing issue.

* The code was not setting the trim flag in the correct structure.

DragonFlyBSD/src 747e296 lib/i18n_module Makefile.inc, lib/libnetgraph7 Makefile

Clean up some Makefiles.

* WARNS?=6 is usually not needed because upper-level Makefile.inc's
  already have it (such as usr.bin/Makefile.inc).

* Remove an unneeded SRCS in ndis_events(8).

DragonFlyBSD/src ae70e58 lib/libutil Makefile

libutil: Raise WARNS to 6.

DragonFlyBSD/src 6e316fc usr.bin/kcollect gnuplot.c

kcollect - Adjust Mops right hand on graph

* Adjust the Mops cap based on ncpus.

DragonFlyBSD/src f2187f0 sys/bus/cam/scsi scsi_da.c

kernel - Restore kern.cam.da.X.trim_enabled sysctl

* This sysctl was not always being properly installed due to an
  ordering and timing issue.

* The code was not setting the trim flag in the correct structure.

DragonFlyBSD/src 1dcf1bc sys/vm vm_object.c

kernel - VM rework (fix introduced bug)

* Fix a null-pointer dereferencing bug in vm_object_madvise() introduced
  in recent commits.

DragonFlyBSD/src 1c024bc sys/vm vm_fault.c vm_map.c

kernel - VM rework part 4 - Implement vm_fault_collapse()

* Add the function vm_fault_collapse().  This function simulates
  faults to copy all pages from backing objects into the front
  object, allowing the backing objects to be disconnected
  from the map entry.

  This function is called under certain conditions from the
  vmspace_fork*() code prior to a fork to potentially collapse
  the entry's backing objects into the front object.  The
  caller then disconnects the backing objects, truncating the
  list to a single object (the front object).

  This optimization is necessary to prevent the backing_ba list
  from growing in an unbounded fashion.  In addition, being able
  to disconnect the graph allows redundant backing store to
  be freed more quickly, reducing memory use.

* Add sysctl vm.map_backing_shadow_test (default enabled).
  The vmspace_fork*() code now does a quick all-shadowed test on
  the first backing object and calls vm_fault_collapse()
  if it comes back true, regardless of the chain length.

* Add sysctl vm.map_backing_limit (default 5).
  The vmspace_fork*() code calls vm_fault_collapse() when the

    [4 lines not shown]
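
The collapse described above can be pictured with a toy model in which
each object holds a few page slots and points at the next backing object.
This is a sketch under stated assumptions: the structures and demo_* names
are hypothetical, and the depth-based trigger is a guess at how a limit
such as vm.map_backing_limit could be applied, since the message is
truncated at that point.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NPAGES 4
#define DEMO_NO_PAGE (-1)

/* Hypothetical toy object: page[i] holds a value or DEMO_NO_PAGE. */
struct demo_obj {
	int page[DEMO_NPAGES];
	struct demo_obj *backing;	/* next (deeper) object, or NULL */
};

/* Simulated fault: return the first copy of page idx found in the chain. */
static int
demo_lookup(const struct demo_obj *front, size_t idx)
{
	const struct demo_obj *o;

	assert(idx < DEMO_NPAGES);
	for (o = front; o != NULL; o = o->backing) {
		if (o->page[idx] != DEMO_NO_PAGE)
			return (o->page[idx]);
	}
	return (DEMO_NO_PAGE);
}

/* Number of objects in the chain, including the front object. */
static size_t
demo_chain_depth(const struct demo_obj *front)
{
	size_t n = 0;

	for (; front != NULL; front = front->backing)
		n++;
	return (n);
}

/*
 * If the chain is deeper than limit, copy every page visible through
 * the chain into the front object and disconnect the backing objects.
 * Returns 1 if a collapse happened, 0 otherwise.
 */
static int
demo_collapse(struct demo_obj *front, size_t limit)
{
	size_t i;

	if (demo_chain_depth(front) <= limit)
		return (0);
	for (i = 0; i < DEMO_NPAGES; i++) {
		if (front->page[i] == DEMO_NO_PAGE)
			front->page[i] = demo_lookup(front, i);
	}
	front->backing = NULL;	/* chain is now a single object */
	return (1);
}
```

After a collapse, lookups through the front object return the same values
as before, but the deeper objects are no longer referenced by this chain
and could be freed.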