ZFS on Linux/src 060f022 include/sys uberblock_impl.h, man/man5 zfs-module-parameters.5

MMP interval and fail_intervals in uberblock

When Multihost is enabled, and a pool is imported, uberblock writes
include ub_mmp_delay to allow an importing node to calculate the
duration of an activity test.  This value alone is not enough information.

If zfs_multihost_fail_intervals > 0 on the node with the pool imported,
the safe minimum duration of the activity test is well defined, but does
not depend on ub_mmp_delay:

zfs_multihost_fail_intervals * zfs_multihost_interval

and if zfs_multihost_fail_intervals == 0 on that node, there is no such
well defined safe duration, but the importing host cannot tell whether
mmp_delay is high due to I/O delays, or due to a very large
zfs_multihost_interval setting on the host which last imported the pool.
As a result, it may use a far longer period for the activity test than
is necessary.
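
As a rough illustration of the two cases above, here is a minimal Python
sketch.  The function name is hypothetical and this is not the ZFS
implementation; it only encodes the rule that a safe minimum is defined
when zfs_multihost_fail_intervals > 0 and undefined otherwise.

```python
from typing import Optional


def safe_activity_test_ms(fail_intervals: int, interval_ms: int) -> Optional[int]:
    """Return the safe minimum activity test duration in milliseconds.

    fail_intervals and interval_ms stand for zfs_multihost_fail_intervals
    and zfs_multihost_interval on the host that has the pool imported.
    Returns None when no well defined safe duration exists.
    """
    if fail_intervals > 0:
        return fail_intervals * interval_ms
    # fail_intervals == 0: there is no well defined safe duration.
    return None


if __name__ == "__main__":
    print(safe_activity_test_ms(10, 1000))
    print(safe_activity_test_ms(0, 1000))
```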

This patch renames ub_mmp_sequence to ub_mmp_config and uses it to
record the zfs_multihost_interval and zfs_multihost_fail_intervals
values, as well as the mmp sequence.  This allows a shorter activity
test duration to be calculated by the importing host in most situations.
These values are also added to the multihost_history kstat records.


    [30 lines not shown]

ZFS on Linux/src d10b2f1 module/zfs dsl_dataset.c

Mutex leak in dsl_dataset_hold_obj()

In addition to dsl_dataset_evict_async() releasing a hold, there is
an error case in dsl_dataset_hold_obj() which had missed 4 additional
release calls.  This was introduced in a1d477c24.

openzfsonosx-commit: https://github.com/openzfsonosx/zfs/commit/63ff7f1c

Authored by: Jorgen Lundman <lundman at lundman.net>
Reviewed-by: Olaf Faaland <faaland1 at llnl.gov>
Reviewed by: Brian Behlendorf <behlendorf1 at llnl.gov>
Ported-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Closes #8517

ZFS on Linux/src 45001b9 module/zfs qat_crypt.c

QAT: Allocate digest_buffer using QAT_PHYS_CONTIG_ALLOC()

If 'digest_buffer' is allocated on the qat_checksum() stack, there
is no guarantee that its address is physically contiguous, and the
DMA result written to the buffer may be handled incorrectly.
Using QAT_PHYS_CONTIG_ALLOC() ensures a physically
contiguous allocation.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Tom Caputi <tcaputi at datto.com>
Signed-off-by: Chengfei, Zhu <chengfeix.zhu at intel.com>
Closes #8323 
Closes #8521 

ZFS on Linux/src ec4f9b8 module/zfs dmu.c

Report holes when there are only metadata changes

Update the dirty check in dmu_offset_next() such that dnodes
are only considered dirty for the purpose of reporting holes
when there are pending data blocks or frees to be synced.  This
ensures that when there are only metadata updates to be synced
(e.g. atime), holes are still reported.

Reviewed-by: Debabrata Banerjee <dbanerje at akamai.com>
Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Closes #6958 
Closes #8505 
Delta    File
+28 -3   module/zfs/dmu.c
+28 -3   1 files

ZFS on Linux/src 066da71 lib/libzfs libzfs_import.c, tests/zfs-tests/tests/functional/cli_root/zpool_labelclear zpool_labelclear_valid.ksh zpool_labelclear_removed.ksh

Improve `zpool labelclear`

1) As implemented the `zpool labelclear` command overwrites
the calculated offsets of all four vdev labels even when only a
single valid label is found.  If the device has been re-purposed
but still contains a valid label this can result in space no
longer owned by ZFS being zeroed.  Prevent this by verifying
every label removed is intact before it's overwritten.

2) Address a small bug in zpool_do_labelclear() which prevented
labelclear from working on file vdevs.  Only block devices support
BLKFLSBUF, so try the ioctl() but do not treat it as fatal when it
is reported as unsupported.

3) Fix `zpool labelclear` so it can be run on vdevs which were
removed from the pool with `zpool remove`.  Additionally, allow
intact but partial labels to be cleared as in the case of a failed
`zpool attach` or `zpool replace`.

4) Remove LABELCLEAR and LABELREAD variables for test cases.

Reviewed-by: Matt Ahrens <mahrens at delphix.com>
Reviewed-by: Tim Chase <tim at chase2k.com>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>

    [3 lines not shown]

ZFS on Linux/src 304d469 module/zfs dnode.c

Add missing dmu_zfetch_fini() in dnode_move_impl()

As it turns out, on the Windows platform when rw_init() is called
(rather its bedrock call ExInitializeResourceLite) it is placed on
an active-list of locks, and is removed at rw_destroy() time.

dnode_move() has logic to copy over the old-dnode to new-dnode,
including calling dmu_zfetch_init(new-dnode). But due to the missing
dmu_zfetch_fini(old-dnode), kmem will call dnode_dest() to release the
memory (and in debug builds fill pattern 0xdeadbeef) over the Windows
active-lock's prev/next list pointers, making Windows sad.

But on other platforms, the contents of dmu_zfetch_fini() is one
call to list_destroy() and one to rw_destroy(), which is effectively
a no-op call and is not required. This commit is mostly for
"correctness" and can be skipped there.

Porting Notes:
* This leak exists on Linux but currently can never happen because
  the dnode_move() functionality is not supported.

openzfsonosx-commit: openzfsonosx/zfs@d95fe517

Authored by: Julian Heuking <JulianH at beckhoff.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>

    [3 lines not shown]
Delta    File
+1 -1    module/zfs/dnode.c
+1 -1    1 files

ZFS on Linux/src 73c25a7 lib/libzfs libzfs_sendrecv.c

Add space in error message

This patch simply adds a missing space in the
ZFS_ERR_FROM_IVSET_GUID_MISSING error message.

Reviewed-by: Richard Laager <rlaager at wiktel.com>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Reviewed-by: Don Brady <don.brady at delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Tom Caputi <tcaputi at datto.com>
Closes #8514 

ZFS on Linux/src ca6c7a9 module/zfs arc.c

Fix l2arc_evict() destroy race

When destroying an arc_buf_hdr_t its identity cannot be discarded
until it is entirely undiscoverable.  This not only includes being
unhashed, but also being removed from the l2arc header list.
Discarding the header's identity prematurely renders the hash
lock useless because it will always hash to bucket zero.

This change resolves a race with l2arc_evict() by discarding the
identity after it has been removed from the l2arc header list.
This ensures either the header is not on the list or contains
the correct identity.

Reviewed-by: Tom Caputi <tcaputi at datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Closes #7688 
Closes #8144 
Delta    File
+30 -24  module/zfs/arc.c
+30 -24  1 files

ZFS on Linux/src ab7615d cmd/zinject zinject.c, module/zfs dsl_scan.c zio_inject.c

Multiple DVA Scrubbing Fix

Currently, there is an issue in the sequential scrub code which
prevents self healing from working in some cases. The scrub code
will split up all DVA copies of a bp and issue each of them
separately. The problem is that, since each of the DVAs is no
longer associated with the others, the self healing code doesn't
have the opportunity to repair problems that show up in one of the
DVAs with the data from the others.

This patch fixes this issue by ensuring that all IOs issued by the
sequential scrub code include all DVAs. Initially, only the first
DVA of each is attempted. If an issue arises, the IO is retried
with all available copies, giving the self healing code a chance
to correct the issue.
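
The retry behaviour described above can be modelled roughly as follows.
This is a sketch only, not the dsl_scan.c code: the function name, the
list-of-copies representation and the use of CRC32 as a stand-in for the
block checksum are all assumptions made for illustration.

```python
import zlib
from typing import List


def read_with_self_heal(copies: List[bytes], checksum: int) -> bytes:
    """Read a block that has one stored copy per DVA.

    Only the first copy is tried initially.  If it fails verification,
    the read is retried across all copies, and any bad copy is rewritten
    from a good one (the self-healing step).
    """
    if zlib.crc32(copies[0]) == checksum:
        return copies[0]

    # Retry with every available copy and pick one that verifies.
    good = next((c for c in copies if zlib.crc32(c) == checksum), None)
    if good is None:
        raise IOError("no intact copy available")

    # Repair every copy that failed verification, in place.
    for i, candidate in enumerate(copies):
        if zlib.crc32(candidate) != checksum:
            copies[i] = good
    return good


if __name__ == "__main__":
    data = b"block contents"
    dvas = [b"block c0ntents", data]  # first copy is damaged
    print(read_with_self_heal(dvas, zlib.crc32(data)) == data)
    print(dvas[0] == data)
```

Because the retry sees every copy at once, a damaged first copy can be
repaired from the second, which is not possible when each DVA is issued
as an unrelated I/O.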

To test this change, this patch also adds the ability for zinject
to specify individual DVAs to inject read errors into. We then
add a new test case that utilizes this functionality to ensure
scrubs and self-healing reads can handle and transparently fix
issues with individual copies of blocks.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Matt Ahrens <mahrens at delphix.com>
Signed-off-by: Tom Caputi <tcaputi at datto.com>
Closes #8453 

ZFS on Linux/src 2bbec1c module/zfs zio.c vdev.c, tests/runfiles linux.run

Make zpool status counters match error events count

The number of IO and checksum events should match the number of errors
seen in zpool status.  Previously there was a mismatch between the
two counts because zpool status would only count unrecovered errors,
while zpool events would get an event for *all* errors (recovered or
not).  This led to situations where disks could be faulted for
"too many errors", while at the same time showing zero errors in zpool
status.

This fixes the zpool status error counters to increment at the same
times we post the error events.

Reviewed-by: Tom Caputi <tcaputi at datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Olaf Faaland <faaland1 at llnl.gov>
Signed-off-by: Tony Hutter <hutter2 at llnl.gov>
Closes #4851 
Closes #7817 

ZFS on Linux/src 04a3b07 module/zfs zfs_vfsops.c

Fix memory leaks in zfsvfs_create_impl()

This patch simply fixes some small memory leaks that can happen
during error handling in zfsvfs_create_impl(). If the function
fails, it frees all the memory / references it created.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Signed-off-by: Tom Caputi <tcaputi at datto.com>
Closes #8490 

ZFS on Linux/src c742bf1 etc/init.d zfs-import.in

zfs-import: should be before swap

zfs-import must be done before swap in order for swap on zvol to work

Reviewed-by: Kash Pande <kash at tripleback.net>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Henrik Riomar <henrik.riomar at gmail.com>
Closes #8502 

ZFS on Linux/src eaed840 cmd/zpool zpool_main.c, module/zfs zfeature.c dsl_crypt.c

Better user experience for errata 4

This patch attempts to address some user concerns that have arisen
since errata 4 was introduced.

* The errata warning has been made less scary for users without
  any encrypted datasets.

* The errata warning now clears itself without a pool reimport if
  the bookmark_v2 feature is enabled and no encrypted datasets
  exist.

* It is no longer possible to create new encrypted datasets without
  enabling the bookmark_v2 feature, thus helping to ensure that the
  errata is resolved.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Tom Caputi <tcaputi at datto.com>
Issue #8308
Closes #8504 

ZFS on Linux/src 98310e5 cmd/zed/zed.d zed.rc

Update commented zed.rc values to defaults

Update zed.rc values to reflect their default values.  This helps
avoid confusion if a user expects functionality to be enabled.

Reviewed-by: Richard Laager <rlaager at wiktel.com>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Signed-off-by: Kash Pande <kash at tripleback.net>
Closes #8498 

ZFS on Linux/src 508c552 tests/zfs-tests/tests/functional/cli_root/zfs_change-key zfs_change-key_format.ksh, tests/zfs-tests/tests/functional/cli_root/zfs_create zfs_create_crypt_combos.ksh

Use 'printf %s' instead of 'echo -n' for compatibility

The ksh 'echo -n' behavior on Illumos and Linux differs.  For
compatibility with other platforms switch to "printf '%s' ".

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Allan Jude <allanjude at freebsd.org>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <guss80 at gmail.com>
Signed-off-by: Igor Kozhukhov <igor at dilos.org>
Closes #8501 

ZFS on Linux/src 1af240f man/man5 zfs-module-parameters.5, module/zfs vdev_queue.c

Add separate aggregation limit for non-rotating media

Before the sequential scrub patches, ZFS never aggregated I/Os above
128KB.  Sequential scrub bumped that to 1MB, supposedly to reduce the
number of head seeks on spinning disks.  For SSDs that makes little to
no sense, especially on FreeBSD, where due to the MAXPHYS limitation the
device will likely still see a bunch of 128KB I/Os instead of one large
one.  A stricter aggregation limit for SSDs avoids allocating a large
memory buffer and copying to/from it, which is a serious problem when
throughput reaches gigabytes per second.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Richard Elling <Richard.Elling at RichardElling.com>
Signed-off-by:  Alexander Motin <mav at FreeBSD.org>
Closes #8494 

ZFS on Linux/src 12a935e .github CONTRIBUTING.md

Update CONTRIBUTING to point users to IRC as well as mailing list

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Richard Laager <rlaager at wiktel.com>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Signed-off-by: Kash Pande <kash at tripleback.net>
Closes #8466 

ZFS on Linux/src 5cc9ba5 cmd/zstreamdump zstreamdump.c, tests/zfs-tests/tests/functional/rsend rsend.kshlib send-c_zstreamdump.ksh

Make zstreamdump -v more greppable

Currently, the verbose output of zstreamdump includes newline
characters within some individual records. Presumably, this was
originally done to keep the output from getting too wide to fit
on a terminal. However, since new flags and struct members have
been added, these rules have not been maintained consistently. In
addition, these newlines can make it hard to grep the output in
some scenarios. This patch simply removes these newlines, making
the output easier to grep and removing the inconsistency.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Matt Ahrens <mahrens at delphix.com>
Reviewed by: Allan Jude <allanjude at freebsd.org>
Signed-off-by: Tom Caputi <tcaputi at datto.com>
Closes #8493 

ZFS on Linux/src 1814242 module/nvpair nvpair.c

OpenZFS 9914 - NV_UNIQUE_NAME_TYPE broken after 9580

Authored by: Andrew Stormont <astormont at racktopsystems.com>
Reviewed by: Yuri Pankov <yuripv at yuripv.net>
Reviewed by: Garrett D'Amore <garrett at damore.org>
Reviewed by: Andy Fiddaman <omnios at citrus-it.co.uk>
Reviewed-by: Serapheim Dimitropoulos <serapheim at delphix.com>
Reviewed-by: Igor Kozhukhov <igor at dilos.org>
Approved by: Dan McDonald <danmcd at joyent.com>
Ported-by: Brian Behlendorf <behlendorf1 at llnl.gov>

OpenZFS-issue: https://www.illumos.org/issues/9914
OpenZFS-commit: https://github.com/illumos/illumos-gate/commit/b8a5bee18
Closes #8496

ZFS on Linux/src f00ab3f module/zfs dsl_crypt.c dmu_recv.c, tests/zfs-tests/tests/functional/cli_root/zpool_import zpool_import_errata4.ksh

Detect and prevent mixed raw and non-raw sends

Currently, there is an issue in the raw receive code where
raw receives are allowed to happen on top of previously
non-raw received datasets. This is a problem because the
source-side dataset doesn't know about how the blocks on
the destination were encrypted. As a result, any MAC in
the objset's checksum-of-MACs tree that is a parent of both
blocks encrypted on the source and blocks encrypted by the
destination will be incorrect. This will result in
authentication errors when we decrypt the dataset.

This patch fixes this issue by adding a new check to the
raw receive code. The code now maintains an "IVset guid",
which acts as an identifier for the set of IVs used to
encrypt a given snapshot. When a snapshot is raw received,
the destination snapshot will take this value from the
DRR_BEGIN payload. Non-raw receives and normal "zfs snap"
operations will cause ZFS to generate a new IVset guid.
When a raw incremental stream is received, ZFS will check
that the "from" IVset guid in the stream matches that of
the "from" destination snapshot. If they do not match, the
code will error out the receive, preventing the problem.
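
The check described above can be sketched in a few lines of Python.
This is a simplified model under stated assumptions, not the dmu_recv.c
code: the Snapshot class and all function names are hypothetical, and a
random 64-bit integer stands in for a newly generated IVset guid.

```python
import secrets
from dataclasses import dataclass


@dataclass
class Snapshot:
    name: str
    ivset_guid: int


def raw_receive(name: str, stream_ivset_guid: int) -> Snapshot:
    # A raw received snapshot takes its IVset guid from the stream.
    return Snapshot(name, stream_ivset_guid)


def non_raw_receive(name: str) -> Snapshot:
    # A non-raw receive (or a local snapshot) gets a fresh IVset guid.
    return Snapshot(name, secrets.randbits(64))


def raw_incremental_allowed(stream_from_guid: int, dest_from: Snapshot) -> bool:
    # The "from" guid in the stream must match the destination's "from"
    # snapshot, otherwise the receive is rejected.
    return stream_from_guid == dest_from.ivset_guid


if __name__ == "__main__":
    source_guid = 0x1234ABCD
    print(raw_incremental_allowed(source_guid, raw_receive("pool/fs@a", source_guid)))
    print(raw_incremental_allowed(source_guid, non_raw_receive("pool/fs@a")))
```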

This patch requires an on-disk format change to add the

    [10 lines not shown]

ZFS on Linux/src 579ce7c include zfeature_common.h, include/sys dsl_bookmark.h

Add bookmark v2 on-disk feature

This patch adds the bookmark v2 feature to the on-disk format. This
feature will be needed for the upcoming redacted sends and for an
upcoming fix for raw receives. The feature is not currently
used by any code and thus this change is a no-op, aside from the
fact that the user can now enable the feature.

Reviewed-by: Paul Dagnelie <pcd at delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Matt Ahrens <mahrens at delphix.com>
Signed-off-by: Tom Caputi <tcaputi at datto.com>
Issue #8308

ZFS on Linux/src 369aa50 include/sys dnode.h, module/zfs dmu_recv.c dnode.c

Fix handling of maxblkid for raw sends

Currently, the receive code can create an unreadable dataset from
a correct raw send stream. This is because it is currently
impossible to set maxblkid to a lower value without freeing the
associated object. This means truncating files on the send side
to a non-0 size could result in corruption. This patch solves this
issue by adding a new 'force' flag to dnode_new_blkid() which will
allow the raw receive code to force the DMU to accept the provided
maxblkid even if it is a lower value than the existing one.

For testing purposes the send_encrypted_files.ksh test has been
extended to include a variety of truncated files and multiple
snapshots. It also now leverages the xattrtest command to help
ensure raw receives correctly handle xattrs.

Reviewed-by: Paul Dagnelie <pcd at delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Matt Ahrens <mahrens at delphix.com>
Signed-off-by: Tom Caputi <tcaputi at datto.com>
Closes #8168 
Closes #8487 

ZFS on Linux/src 146bdc4 cmd/arc_summary arc_summary3

Fix typo in arc_summary3

This is a simple fix for a typo ("perfetch" rather than "prefetch")
in arc_summary3.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Richard Laager <rlaager at wiktel.com>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Signed-off-by: Jason Cohen <jwittlincohen at gmail.com>
Closes #8499 

ZFS on Linux/src db2af93 include/sys mmp.h, man/man5 zfs-module-parameters.5

Increase default zfs_multihost_fail_intervals and import_intervals

By default, when multihost is enabled for a pool, the pool is
suspended if (zfs_multihost_fail_intervals*zfs_multihost_interval) ms
pass without a successful MMP write.  This is the recommended
configuration.

The default value for zfs_multihost_fail_intervals has been 5, and the
default value for zfs_multihost_interval has been 1000, so pool
suspension occurred at 5 seconds.

There have been multiple cases where a single misbehaving device in a
pool triggered a SCSI reset, and all I/O paused for 5-6 seconds.  This
in turn caused MMP to suspend the pool.

In the cases observed, the rest of the devices were healthy and the
pool was otherwise correctly performing I/O.  The reset was handled
correctly by ZFS, and by suspending the pool MMP made replacing the
device more difficult as well as forcing the host to be rebooted.

Increase the default value of zfs_multihost_fail_intervals to 10, so
that MMP tolerates up to 10 seconds of failed MMP writes before
suspending the pool.
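
To make the arithmetic concrete, the following Python sketch compares
the old and new defaults against a 6 second I/O stall like the ones
described above.  The function names are hypothetical and the model
ignores everything except the product of the two tunables.

```python
def suspend_after_ms(fail_intervals: int, interval_ms: int) -> int:
    """Time without a successful MMP write after which the pool suspends."""
    return fail_intervals * interval_ms


def stall_suspends_pool(stall_ms: int, fail_intervals: int,
                        interval_ms: int = 1000) -> bool:
    """True if an I/O stall of stall_ms is long enough to suspend the pool."""
    return stall_ms >= suspend_after_ms(fail_intervals, interval_ms)


OLD_FAIL_INTERVALS = 5   # previous default
NEW_FAIL_INTERVALS = 10  # default after this change

if __name__ == "__main__":
    scsi_reset_stall_ms = 6000
    print(stall_suspends_pool(scsi_reset_stall_ms, OLD_FAIL_INTERVALS))
    print(stall_suspends_pool(scsi_reset_stall_ms, NEW_FAIL_INTERVALS))
```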

Increase the default value of zfs_multihost_import_intervals to 20, to

    [9 lines not shown]

ZFS on Linux/src cffa837 module/zfs arc.c

Fix most zfs_arc_* mod params not actually being modifiable at runtime

Most of the zfs_arc_* module parameters do not have their values used by
the ARC code directly. Instead, there is a function, arc_tuning_update,
which is called during module initialization and periodically
thereafter, whose job is to fetch the module parameter values, clamp/
limit them appropriately, and then assign those values to a separate set
of internal variables that are actually referenced by the ARC code.

Commit 3ec34e55 featured an overhaul of arc_reclaim_thread, which is the
former location where the post-init-time calls to arc_tuning_update
would occur. The rework split the work previously done by the
arc_reclaim_thread into a pair of replacement threads; and
unfortunately, the call to arc_tuning_update fell through the cracks and
was lost in the reorganization.

This meant that changing almost any ARC-related zfs module parameter via
/sys/module/zfs/parameters/ would result in the module parameter value
itself appearing to change; however the modification would not actually
propagate to the ARC code and have any real effect.

This commit reinstates the post-init-time call to arc_tuning_update. It
is now called during arc_adjust_cb_check; this should be equivalent to
its former call location in arc_reclaim_thread.
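
The failure mode is easy to reproduce in a toy model.  The Python sketch
below is illustrative only: the dictionaries stand in for the module
parameters and the ARC's internal variables, the clamp limits are
made-up values, and tuning_update() is a hypothetical analogue of
arc_tuning_update, not its actual logic.

```python
ARC_MIN_BYTES = 64 << 20  # hypothetical lower clamp
ARC_MAX_BYTES = 16 << 30  # hypothetical upper clamp


def tuning_update(params: dict, state: dict) -> None:
    """Copy tunables into the internal state, clamping them on the way."""
    requested = params["zfs_arc_max"]
    if requested:  # 0 is treated as "leave the current value alone"
        state["arc_c_max"] = max(ARC_MIN_BYTES, min(requested, ARC_MAX_BYTES))


if __name__ == "__main__":
    module_params = {"zfs_arc_max": 0}   # what the admin writes to
    arc_state = {"arc_c_max": 8 << 30}   # what the ARC code reads

    # The admin changes the parameter; the visible value changes...
    module_params["zfs_arc_max"] = 1 << 30
    # ...but nothing propagates until the periodic update runs.
    print(arc_state["arc_c_max"] == (8 << 30))

    tuning_update(module_params, arc_state)
    print(arc_state["arc_c_max"] == (1 << 30))
```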


    [5 lines not shown]
Delta    File
+8 -0    module/zfs/arc.c
+8 -0    1 files

ZFS on Linux/src 4c0883f cmd/zfs zfs_main.c, include/sys/fs zfs.h

Avoid retrieving unused snapshot props

This patch modifies the zfs_ioc_snapshot_list_next() ioctl to enable it
to take input parameters that alter the way looping through the list of
snapshots is performed. The idea here is to restrict functions that
throw away some of the snapshots returned by the ioctl to a range of
snapshots that these functions actually use. This improves efficiency
and execution speed for some rollback and send operations.

Reviewed-by: Tom Caputi <tcaputi at datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed by: Matt Ahrens <mahrens at delphix.com>
Signed-off-by: Alek Pinchuk <apinchuk at datto.com>
Closes #8077 

ZFS on Linux/src dd785b5 module/zfs vdev_initialize.c

Fix vdev_initialize_restart / removal race

Resolve a vdev_initialize crash uncovered by ztest.  Similar
to when starting a new initialization verify that a removal
is not in progress.  Additionally, do not restart when the
thread already exists.  This check is now congruent with the
POOL_INITIALIZE_DO handling in spa_vdev_initialize_impl().

Reviewed-by: Tom Caputi <tcaputi at datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Closes #8477 

ZFS on Linux/src 3d31aad include/sys spa_impl.h, module/zfs mmp.c vdev.c

MMP writes rotate over leaves

Instead of choosing a leaf vdev quasi-randomly, by starting at the root
vdev and randomly choosing children, rotate over leaves to issue MMP
writes.  This fixes an issue in a pool whose top-level vdevs have
different numbers of leaves.

The issue is that the frequency at which individual leaves are chosen
for MMP writes is based not on the total number of leaves but based on
how many siblings the leaves have.

For example, in a pool like this:

       root-vdev
   +------+---------------+
vdev1                   vdev2
  |                       |
  |                +------+-----+-----+----+
disk1             disk2 disk3 disk4 disk5 disk6

vdev1 and vdev2 will each be chosen 50% of the time.  Every time vdev1
is chosen, disk1 will be chosen.  However, every time vdev2 is chosen,
disk2 is chosen 20% of the time.  As a result, disk1 will be sent 5x as
many MMP writes as disk2.
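
The 5x figure can be checked with a short calculation.  The Python
sketch below models the example pool as a dictionary and compares the
per-leaf share of MMP writes under random descent from the root with
the share under rotation over all leaves.  The function names are
hypothetical and the model assumes uniform random choices at each level.

```python
from fractions import Fraction

# Topology from the example: vdev1 has one leaf, vdev2 has five.
POOL = {
    "vdev1": ["disk1"],
    "vdev2": ["disk2", "disk3", "disk4", "disk5", "disk6"],
}


def random_descent_share(pool):
    """Share of writes per leaf when a top-level vdev is picked
    uniformly, then one of its leaves is picked uniformly."""
    shares = {}
    for leaves in pool.values():
        for leaf in leaves:
            shares[leaf] = Fraction(1, len(pool)) * Fraction(1, len(leaves))
    return shares


def rotation_share(pool):
    """Share of writes per leaf when rotating over all leaves."""
    leaves = [leaf for group in pool.values() for leaf in group]
    return {leaf: Fraction(1, len(leaves)) for leaf in leaves}


if __name__ == "__main__":
    old = random_descent_share(POOL)
    new = rotation_share(POOL)
    print(old["disk1"], old["disk2"], old["disk1"] / old["disk2"])
    print(new["disk1"], new["disk2"])
```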


    [18 lines not shown]

ZFS on Linux/src b1b94e9 module/zfs zpl_file.c

zfs does not honor NFS sync write semantics

The Linux kernel's nfsd implementation uses RWF_SYNC to determine if the
write is synchronous or not. This flag is used to set the kernel's I/O
control block flags. Unfortunately, ZFS was not updated to inspect these
flags so NFS sync writes were not being honored.

This change maps the IOCB_* flags to the ZFS equivalent.

Reviewed-by: Don Brady <don.brady at delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: George Wilson <george.wilson at delphix.com>
Closes #8474 
Closes #8452 
Closes #8486 
Delta    File
+30 -2   module/zfs/zpl_file.c
+30 -2   1 files

ZFS on Linux/src 1118f99 module/zfs dsl_dataset.c

Fix lockdep between ds_lock and dd_lock in dsl_dataset_namelen()

Booting a debug kernel revealed an inconsistent lock dependency between
dataset's ds_lock and its directory's dd_lock.

[ 32.215336] ======================================================
[ 32.221859] WARNING: possible circular locking dependency detected
[ 32.221861] 4.14.90+ #8 Tainted: G           O
[ 32.221862] ------------------------------------------------------
[ 32.221863] dynamic_kernel_/4667 is trying to acquire lock:
[ 32.221864]  (&ds->ds_lock){+.+.}, at: [<ffffffffc10a4bde>] dsl_dataset_check_quota+0x9e/0x8a0 [zfs]
[ 32.221941] but task is already holding lock:
[ 32.221941]  (&dd->dd_lock){+.+.}, at: [<ffffffffc10cd8e9>] dsl_dir_tempreserve_space+0x3b9/0x1290 [zfs]
[ 32.221983] which lock already depends on the new lock.
[ 32.221983] the existing dependency chain (in reverse order) is:
[ 32.221984] -> #1 (&dd->dd_lock){+.+.}:
[ 32.221992]    __mutex_lock+0xef/0x14c0
[ 32.222049]    dsl_dir_namelen+0xd4/0x2d0 [zfs]
[ 32.222093]    dsl_dataset_namelen+0x2f1/0x430 [zfs]
[ 32.222142]    verify_dataset_name_len+0xd/0x40 [zfs]
[ 32.222184]    dmu_objset_find_dp_impl+0x5f5/0xef0 [zfs]
[ 32.222226]    dmu_objset_find_dp_cb+0x40/0x60 [zfs]
[ 32.222235]    taskq_thread+0x969/0x1460 [spl]

    [47 lines not shown]

ZFS on Linux/src bf90948 include/sys/fs zfs.h, tests/zfs-tests/cmd/libzfs_input_check libzfs_input_check.c

Reorder ZFS ioctls to fix cross-version compatibility

Reorder ZFS ioctls to fix cross-version compatibility.

Reviewed-by: Don Brady <don.brady at delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Reviewed by: Matt Ahrens <mahrens at delphix.com>
Signed-off-by: Lorenz Brun <lorenz at dolansoft.org>
Closes #8484 

ZFS on Linux/src b46fd24 module/spl spl-vnode.c

Linux 5.1 compat: get_ds() removed

Commit torvalds/linux@736706bee has removed the get_ds() function
as a bit of cleanup.  It has been defined as KERNEL_DS on all
architectures for all supported kernels.  Replace get_ds() with
KERNEL_DS as was done in the kernel.

Reviewed-by: Tom Caputi <tcaputi at datto.com>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Closes #8479 

ZFS on Linux/src becdcec config kernel-fpu.m4, include/linux simd_x86.h

kernel_fpu fixes

This patch fixes a few issues when detecting which kernel_fpu functions
are available.

- Use kernel_fpu_begin() if it's exported on newer kernels.

- Use ZFS_LINUX_TRY_COMPILE_SYMBOL() to choose the right kernel_fpu
  function when using --enable-linux-builtin.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Tony Hutter <hutter2 at llnl.gov>
Closes #8259
Closes #8363 

ZFS on Linux/src a73e8fd include/sys dmu.h, module/zfs bpobj.c dbuf.c

Stack overflow in recursive bpobj_iterate_impl

The function bpobj_iterate_impl overflows the stack when bpobjs
are deeply nested. Rewrite the function to eliminate the recursion.
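
The general technique, replacing recursion with an explicit work stack,
can be sketched as follows.  This Python model is not the rewritten
bpobj_iterate_impl: the dictionary layout ("bps" and "subobjs"), the
function names and the visiting order are all assumptions chosen to keep
the example small.

```python
def make_nested(depth):
    """Build a chain of `depth` objects, each nested inside the previous."""
    root = {"bps": [0], "subobjs": []}
    node = root
    for i in range(1, depth):
        child = {"bps": [i], "subobjs": []}
        node["subobjs"].append(child)
        node = child
    return root


def iterate_bps(root):
    """Visit every block pointer without recursing.

    Nesting depth only grows the heap-allocated `stack` list, not the
    call stack, so arbitrarily deep nesting cannot overflow it.
    """
    visited = []
    stack = [root]
    while stack:
        obj = stack.pop()
        visited.extend(obj["bps"])
        # Push children in reverse so they are popped in original order.
        stack.extend(reversed(obj["subobjs"]))
    return visited


if __name__ == "__main__":
    # Far deeper than Python's default recursion limit.
    deep = make_nested(10_000)
    print(len(iterate_bps(deep)))
```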

Reviewed-by: Serapheim Dimitropoulos <serapheim at delphix.com>
Reviewed-by: Matt Ahrens <mahrens at delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Paul Zuchowski <pzuchowski at datto.com>
Closes #7674
Closes #7675 
Closes #7908 

ZFS on Linux/src 96ebc5a module/zfs vdev_initialize.c

Fix race in vdev_initialize_thread

Before allowing new allocations to the metaslab we need to ensure
that any issued initializing writes have been synced.  Otherwise,
it's possible for metaslab_block_alloc() to allocate a range which
is about to be overwritten by an initializing IO.

Reviewed-by: Serapheim Dimitropoulos <serapheim at delphix.com>
Reviewed-by: Richard Elling <Richard.Elling at RichardElling.com>
Reviewed-by: Tim Chase <tim at chase2k.com>
Reviewed-by: George Wilson <george.wilson at delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Closes #8461 

ZFS on Linux/src 762f9ef config pkg.m4 find_system_library.m4, lib/libspl/include/rpc xdr.h

config: better libtirpc detection

Improve the autoconf code for finding libtirpc and do not assume the
headers are in /usr/include/tirpc.

Also remove this assumption from the `rpc/xdr.h` header in libspl and
use the same `#include_next` mechanism that is used for other libspl
headers.

Include pkg.m4 from pkg-config in config/ for PKG_CHECK_MODULES(), the
file license allows this.

Include ax_save_flags.m4 and ax_restore_flags.m4 from autoconf-archive,
the file licenses are compatible. Use the 2012 versions so as not to rely
on a more recent autoconf feature AS_VAR_COPY(), which breaks some build
slaves.

Add new macro library `config/find_system_library.m4` which defines the
FIND_SYSTEM_LIBRARY() macro which is a convenience wrapper over using
PKG_CHECK_MODULES() with a fallback to standard library locations and
some sanity checks.

The parameters are:

```

    [24 lines not shown]
```

ZFS on Linux/src 0409679 module/spl spl-kmem-cache.c

Fix style of spl_kmem_cache_create()

Fix indentation of code in ifdef's.
Remove obsolete comment.
Make if/else statements more readable by adding braces.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens at delphix.com>
Closes #8459 

ZFS on Linux/src 8133679 cmd/ztest ztest.c, man/man8 zpool.8

Do not resume a pool if multihost is enabled

When multihost is enabled, and a pool is suspended, return
EINVAL in response to "zpool clear <pool>".  The pool
may have been imported on another host while I/O was suspended.

Reviewed-by: loli10K <ezomori.nozomu at gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Olaf Faaland <faaland1 at llnl.gov>
Closes #6933 
Closes #8460 

ZFS on Linux/src 4f3218a man/man8 zpool.8

Warn user about accidentally sharing devices

Improve the man page text to warn the user about the risk of adding
the same device to multiple pools via simultaneous "zpool create",
"zpool add", "zpool replace", etc.

State that MMP/multihost does not protect against these scenarios.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Signed-off-by: Olaf Faaland <faaland1 at llnl.gov>
Closes #6473 
Closes #8457 
Delta    File
+30 -5   man/man8/zpool.8
+30 -5   1 files

ZFS on Linux/src 87c25d5 man/man5 zfs-module-parameters.5, module/zfs abd.c

abd_alloc should use scatter for >1K allocations

abd_alloc() normally does scatter allocations, thus solving the problem
that ABD originally set out to: the bulk of ZFS's allocations are single
pages, which are faster to allocate and free, and don't suffer from
internal fragmentation (and the inability to reclaim memory because some
buffers in the slab are still allocated).

However, the current code does linear allocations for 4KB and smaller
allocations, defeating the purpose of ABD.

Scatter ABD's use at least one page each, so sub-page allocations waste
some space when allocated as scatter (e.g. 2KB scatter allocation wastes
half of each page).  Using linear ABD's for small allocations means that
they will be put on slabs which contain many allocations.  This can
improve memory efficiency, but it also makes it much harder for ARC
evictions to actually free pages, because all the buffers on one slab
need to be freed in order for the slab (and underlying pages) to be
freed.  Typically, 512B and 1KB kmem caches have 16 buffers per slab, so
it's possible for them to actually waste more memory than scatter (one
page per buf = wasting 3/4 or 7/8th; one buf per slab = wasting
15/16th).
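
The fractions quoted above follow from simple arithmetic, reproduced in
the Python sketch below.  It assumes a 4KB page and 16 buffers per slab
as in the text; the function names are hypothetical, and the linear case
models only the worst case where a single live buffer pins a whole slab.

```python
from fractions import Fraction

PAGE_SIZE = 4096    # assumed page size in bytes
BUFS_PER_SLAB = 16  # typical for 512B and 1KB kmem caches, per the text


def scatter_waste(size, page=PAGE_SIZE):
    """Fraction of memory wasted when `size` bytes are allocated as a
    scatter ABD, which uses whole pages."""
    pages = -(-size // page)  # ceiling division
    return 1 - Fraction(size, pages * page)


def linear_worst_case_waste(bufs_per_slab=BUFS_PER_SLAB):
    """Fraction of a slab wasted when only one buffer on it is still
    allocated, so the slab cannot be freed."""
    return 1 - Fraction(1, bufs_per_slab)


if __name__ == "__main__":
    for size in (512, 1024, 2048):
        print(size, scatter_waste(size))
    print("linear worst case", linear_worst_case_waste())
```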

Spill blocks are typically 512B and are heavily used on systems running
selinux with the default dnode size and the `xattr=sa` property set.

    [11 lines not shown]

ZFS on Linux/src 3a1f2d5 etc/init.d zfs-share.in

Remove zfs-zed hard dep from zfs-share init script

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: bunder2015 <omfgbunder at gmail.com>
Signed-off-by: Chris Zubrzycki <github at mid-earth.net>
Closes #8447 

ZFS on Linux/src 4616412 contrib/initramfs/scripts zfs.in

initramfs/debian: use panic() instead of directly calling /bin/sh

Debian has a panic() function which makes it possible to disable shell
access in initramfs by setting the panic kernel parameter. Use it.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Kash Pande <kash at tripleback.net>
Signed-off-by: Michael Niewöhner <foss at mniewoehner.de>
Closes #8448 

ZFS on Linux/src 4ad1aac .github CONTRIBUTING.md

mailing list link update

Update link to reference IRC as well as mailing list.

Signed-off-by: Kash Pande <kash at tripleback.net>

ZFS on Linux/src d6838ae cmd/zstreamdump zstreamdump.c

zstreamdump: include embedded writes when dumping raw data (-d)

When feeding a replication stream to `zstreamdump -d` (raw dump mode),
it does not print the raw data for DRR_WRITE_EMBEDDED records.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed by: Matt Ahrens <mahrens at delphix.com>
Signed-off-by: Allan Jude <allanjude at freebsd.org>
Closes #8430 

ZFS on Linux/src 6af7ba4 module/zfs spa_stats.c vdev.c

Fix overly broad spa config lock

The spa_txg_history_init_io() and spa_txg_history_fini_io() were
mistakenly taking SCL_ALL when only SCL_CONFIG is required to
access the vdev stats.  This could result in a deadlock which
was observed when running ztest.

Reviewed-by: Olaf Faaland <faaland1 at llnl.gov>
Reviewed-by: Tim Chase <tim at chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Closes #8445 

ZFS on Linux/src c568ab8 cmd/zfs zfs_main.c, man/man8 zfs.8 zfs-program.8

zfs.8 has wrong description of "zfs program -t"

The "-t" argument to "zfs program" specifies a limit on the number of
Lua instructions that can be executed.  The zfs.8 manpage has the wrong
description.  It should be updated to match what's in zfs-program.8.

Also fix the formatting of the zfs help message.

Reviewed by: Allan Jude <allanjude at freebsd.org>
Reviewed-by: loli10K <ezomori.nozomu at gmail.com>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens at delphix.com>
Closes #8410 

ZFS on Linux/src 47d7ef5 lib/libzutil zutil_import.c

Sort by full path name instead of by GUID when importing

Preferentially sort by the full path name instead of GUID when determining
which device links to use.  This helps ensure that the pool vdevs are named
consistently when multiple links for a device appear in the same directory.
For example, the /dev/disk/by-id/scsi* and /dev/disk/by-id/wwn* links.

Reviewed-by: Alek Pinchuk <apinchuk at datto.com>
Reviewed-by: Richard Elling <Richard.Elling at RichardElling.com>
Authored-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Kash Pande <kash at tripleback.net>
Closes #8108 
Closes #8440 

ZFS on Linux/src e065034 lib/libzfs libzfs_dataset.c

Improve error message for zfs create with @ or # in name

Reorder the `zfs create` error messages in order to return the most
specific one first.  If none of them apply then an expanded version of
the invalid name message is used.

Reviewed by: Tom Caputi <tcaputi at datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed by: Matt Ahrens <mahrens at delphix.com>
Signed-off-by: Damian Wojsław <damian at wojslaw.pl>
Closes #8155 
Closes #8352 

ZFS on Linux/src ba7b05c man/man8 zfs.8

zfs(8): improve document of compression behaviours

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Reviewed by: Allan Jude <allanjude at freebsd.org>
Reviewed-by: bunder2015 <omfgbunder at gmail.com>
Signed-off-by: DHE <git at dehacked.net>
Closes #4660 
Closes #8423 
Delta    File
+13 -0   man/man8/zfs.8
+13 -0   1 files

ZFS on Linux/src 8eef997 module/zfs metaslab.c

Error path in metaslab_load_impl() forgets to drop ms_sync_lock

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Giuseppe Di Natale <guss80 at gmail.com>
Reviewed-by: George Melikov <mail at gmelikov.ru>
Signed-off-by: Serapheim Dimitropoulos <serapheim at delphix.com>
Closes #8444