Commit Graph

691522 Commits

Author SHA1 Message Date
Mateusz Jurczyk
9380fa60b1 kernel/sysctl_binary.c: check name array length in deprecated_sysctl_warning()
Prevent use of uninitialized memory (originating from the stack frame of
do_sysctl()) by verifying that the name array is filled with sufficient
input data before comparing its specific entries with integer constants.

Through timing measurement or analyzing the kernel debug logs, a
user-mode program could potentially infer the results of comparisons
against the uninitialized memory, and acquire some (very limited)
information about the state of the kernel stack.  The change also
eliminates possible future warnings by tools such as KMSAN and other
code checkers / instrumentations.

Link: http://lkml.kernel.org/r/20170524122139.21333-1-mjurczyk@google.com
Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Matthew Whitehead <tedheadster@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:00 -07:00
Luis R. Rodriguez
7c43a657a4 test_sysctl: test against int proc_dointvec() array support
Add a few initial respective tests for an array:

  o Echoing values separated by spaces works
  o Echoing only first elements will set first elements
  o Confirm PAGE_SIZE limit still applies even if an array is used

Link: http://lkml.kernel.org/r/20170630224431.17374-7-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:00 -07:00
Luis R. Rodriguez
2920fad3a5 test_sysctl: add simple proc_douintvec() case
Test against a simple proc_douintvec() case.  While at it, add a test
against UINT_MAX.  Make sure UINT_MAX works, and UINT_MAX+1 will fail
and that negative values are not accepted.

Link: http://lkml.kernel.org/r/20170630224431.17374-6-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:00 -07:00
Luis R. Rodriguez
eb965eda1c test_sysctl: add simple proc_dointvec() case
Test against a simple proc_dointvec() case.  While at it, add a test
against INT_MAX.  Make sure INT_MAX works, and INT_MAX+1 will fail.
Also test negative values work.

Link: http://lkml.kernel.org/r/20170630224431.17374-5-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:00 -07:00
Luis R. Rodriguez
1c0357c846 test_sysctl: test against PAGE_SIZE for int
Add the following tests to ensure we do not regress:

  o Test using a buffer full of space (PAGE_SIZE-1) followed by a
    single digit works

  o Test using a buffer full of spaces (PAGE_SIZE or over) will fail

As tests increase instead of unloading the module and reloading it we
can just do a shell reset_vals() with a reset to values we know are set
at init on the driver.

Link: http://lkml.kernel.org/r/20170630224431.17374-4-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:00 -07:00
Luis R. Rodriguez
64b671204a test_sysctl: add generic script to expand on tests
This adds a generic script to let us more easily add more tests cases.
Since we really have only two types of tests cases just fold them into
the one file.  Each test unit is now identified into its separate
function:

    # ./sysctl.sh -l
  Test ID list:

  TEST_ID x NUM_TEST
  TEST_ID:   Test ID
  NUM_TESTS: Number of recommended times to run the test

  0001 x 1 - tests proc_dointvec_minmax()
  0002 x 1 - tests proc_dostring()

For now we start off with what we had before, and run only each test
once.  We can now watch a test case until it fails:

  ./sysctl.sh -w 0002

We can also run a test case x number of times, say we want to run a test
case 100 times:

  ./sysctl.sh -c 0001 100

To run a test case only once, for example:

  ./sysctl.sh -s 0002

The default settings are specified at the top of sysctl.sh.

Link: http://lkml.kernel.org/r/20170630224431.17374-3-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:00 -07:00
Luis R. Rodriguez
9308f2f9e7 test_sysctl: add dedicated proc sysctl test driver
The existing tools/testing/selftests/sysctl/ tests include two test
cases, but these use existing production kernel sysctl interfaces.  We
want to expand test coverage but we can't just be looking for random
safe production values to poke at, that's just insane!

Instead just dedicate a test driver for debugging purposes and port the
existing scripts to use it.  This will make it easier for further tests
to be added.

Subsequent patches will extend our test coverage for sysctl.

The stress test driver uses a new license (GPL on Linux, copyleft-next
outside of Linux).  Linus was fine with this [0] and later due to Ted's
and Alans's request ironed out an "or" language clause to use [1] which
is already present upstream.

[0] https://lkml.kernel.org/r/CA+55aFyhxcvD+q7tp+-yrSFDKfR0mOHgyEAe=f_94aKLsOu0Og@mail.gmail.com
[1] https://lkml.kernel.org/r/1495234558.7848.122.camel@linux.intel.com

Link: http://lkml.kernel.org/r/20170630224431.17374-2-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:00 -07:00
Luis R. Rodriguez
61d9b56a89 sysctl: add unsigned int range support
To keep parity with regular int interfaces provide the an unsigned int
proc_douintvec_minmax() which allows you to specify a range of allowed
valid numbers.

Adding proc_douintvec_minmax_sysadmin() is easy but we can wait for an
actual user for that.

Link: http://lkml.kernel.org/r/20170519033554.18592-6-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:00 -07:00
Luis R. Rodriguez
4f2fec00af sysctl: simplify unsigned int support
Commit e7d316a02f ("sysctl: handle error writing UINT_MAX to u32
fields") added proc_douintvec() to start help adding support for
unsigned int, this however was only half the work needed.  Two fixes
have come in since then for the following issues:

  o Printing the values shows a negative value, this happens since
    do_proc_dointvec() and this uses proc_put_long()

This was fixed by commit 5380e5644a ("sysctl: don't print negative
flag for proc_douintvec").

  o We can easily wrap around the int values: UINT_MAX is 4294967295, if
    we echo in 4294967295 + 1 we end up with 0, using 4294967295 + 2 we
    end up with 1.
  o We echo negative values in and they are accepted

This was fixed by commit 425fffd886 ("sysctl: report EINVAL if value
is larger than UINT_MAX for proc_douintvec").

It still also failed to be added to sysctl_check_table()...  instead of
adding it with the current implementation just provide a proper and
simplified unsigned int support without any array unsigned int support
with no negative support at all.

Historically sysctl proc helpers have supported arrays, due to the
complexity this adds though we've taken a step back to evaluate array
users to determine if its worth upkeeping for unsigned int.  An
evaluation using Coccinelle has been done to perform a grammatical
search to ask ourselves:

  o How many sysctl proc_dointvec() (int) users exist which likely
    should be moved over to proc_douintvec() (unsigned int) ?
	Answer: about 8
	- Of these how many are array users ?
		Answer: Probably only 1
  o How many sysctl array users exist ?
	Answer: about 12

This last question gives us an idea just how popular arrays: they are not.
Array support should probably just be kept for strings.

The identified uint ports are:

  drivers/infiniband/core/ucma.c - max_backlog
  drivers/infiniband/core/iwcm.c - default_backlog
  net/core/sysctl_net_core.c - rps_sock_flow_sysctl()
  net/netfilter/nf_conntrack_timestamp.c - nf_conntrack_timestamp -- bool
  net/netfilter/nf_conntrack_acct.c nf_conntrack_acct -- bool
  net/netfilter/nf_conntrack_ecache.c - nf_conntrack_events -- bool
  net/netfilter/nf_conntrack_helper.c - nf_conntrack_helper -- bool
  net/phonet/sysctl.c proc_local_port_range()

The only possible array users is proc_local_port_range() but it does not
seem worth it to add array support just for this given the range support
works just as well.  Unsigned int support should be desirable more for
when you *need* more than INT_MAX or using int min/max support then does
not suffice for your ranges.

If you forget and by mistake happen to register an unsigned int proc
entry with an array, the driver will fail and you will get something as
follows:

sysctl table check failed: debug/test_sysctl//uint_0002 array now allowed
CPU: 2 PID: 1342 Comm: modprobe Tainted: G        W   E <etc>
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS <etc>
Call Trace:
 dump_stack+0x63/0x81
 __register_sysctl_table+0x350/0x650
 ? kmem_cache_alloc_trace+0x107/0x240
 __register_sysctl_paths+0x1b3/0x1e0
 ? 0xffffffffc005f000
 register_sysctl_table+0x1f/0x30
 test_sysctl_init+0x10/0x1000 [test_sysctl]
 do_one_initcall+0x52/0x1a0
 ? kmem_cache_alloc_trace+0x107/0x240
 do_init_module+0x5f/0x200
 load_module+0x1867/0x1bd0
 ? __symbol_put+0x60/0x60
 SYSC_finit_module+0xdf/0x110
 SyS_finit_module+0xe/0x10
 entry_SYSCALL_64_fastpath+0x1e/0xad
RIP: 0033:0x7f042b22d119
<etc>

Fixes: e7d316a02f ("sysctl: handle error writing UINT_MAX to u32 fields")
Link: http://lkml.kernel.org/r/20170519033554.18592-5-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Suggested-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Cc: Liping Zhang <zlpnobody@gmail.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:00 -07:00
Luis R. Rodriguez
d383d48470 sysctl: fold sysctl_writes_strict checks into helper
The mode sysctl_writes_strict positional checks keep being copy and pasted
as we add new proc handlers.  Just add a helper to avoid code duplication.

Link: http://lkml.kernel.org/r/20170519033554.18592-4-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Suggested-by: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:00 -07:00
Luis R. Rodriguez
a19ac33749 sysctl: kdoc'ify sysctl_writes_strict
Document the different sysctl_writes_strict modes in code.

Link: http://lkml.kernel.org/r/20170519033554.18592-3-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:00 -07:00
Luis R. Rodriguez
89c5b53b16 sysctl: fix lax sysctl_check_table() sanity check
Patch series "sysctl: few fixes", v5.

I've been working on making kmod more deterministic, and as I did that I
couldn't help but notice a few issues with sysctl.  My end goal was just
to fix unsigned int support, which back then was completely broken.
Liping Zhang has sent up small atomic fixes, however it still missed yet
one more fix and Alexey Dobriyan had also suggested to just drop array
support given its complexity.

I have inspected array support using Coccinelle and indeed its not that
popular, so if in fact we can avoid it for new interfaces, I agree its
best.

I did develop a sysctl stress driver but will hold that off for another
series.

This patch (of 5):

Commit 7c60c48f58 ("sysctl: Improve the sysctl sanity checks")
improved sanity checks considerbly, however the enhancements on
sysctl_check_table() meant adding a functional change so that only the
last table entry's sanity error is propagated.  It also changed the way
errors were propagated so that each new check reset the err value, this
means only last sanity check computed is used for an error.  This has
been in the kernel since v3.4 days.

Fix this by carrying on errors from previous checks and iterations as we
traverse the table and ensuring we keep any error from previous checks.
We keep iterating on the table even if an error is found so we can
complain for all errors found in one shot.  This works as -EINVAL is
always returned on error anyway, and the check for error is any non-zero
value.

Fixes: 7c60c48f58 ("sysctl: Improve the sysctl sanity checks")
Link: http://lkml.kernel.org/r/20170519033554.18592-2-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:00 -07:00
Bharat Bhushan
a711bdc095 kexec/kdump: minor Documentation updates for arm64 and Image
Minor updates in Documentation for arm64 as relocatable kernel.  Also
this patch updates documentation for using uncompressed image "Image"
which is used for ARM64.

Link: http://lkml.kernel.org/r/1495104793-6563-1-git-send-email-Bharat.Bhushan@nxp.com
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Pratyush Anand <panand@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:00 -07:00
Xunlei Pang
1229384f5b kdump: protect vmcoreinfo data under the crash memory
Currently vmcoreinfo data is updated at boot time subsys_initcall(), it
has the risk of being modified by some wrong code during system is
running.

As a result, vmcore dumped may contain the wrong vmcoreinfo.  Later on,
when using "crash", "makedumpfile", etc utility to parse this vmcore, we
probably will get "Segmentation fault" or other unexpected errors.

E.g.  1) wrong code overwrites vmcoreinfo_data; 2) further crashes the
system; 3) trigger kdump, then we obviously will fail to recognize the
crash context correctly due to the corrupted vmcoreinfo.

Now except for vmcoreinfo, all the crash data is well
protected(including the cpu note which is fully updated in the crash
path, thus its correctness is guaranteed).  Given that vmcoreinfo data
is a large chunk prepared for kdump, we better protect it as well.

To solve this, we relocate and copy vmcoreinfo_data to the crash memory
when kdump is loading via kexec syscalls.  Because the whole crash
memory will be protected by existing arch_kexec_protect_crashkres()
mechanism, we naturally protect vmcoreinfo_data from write(even read)
access under kernel direct mapping after kdump is loaded.

Since kdump is usually loaded at the very early stage after boot, we can
trust the correctness of the vmcoreinfo data copied.

On the other hand, we still need to operate the vmcoreinfo safe copy
when crash happens to generate vmcoreinfo_note again, we rely on vmap()
to map out a new kernel virtual address and update to use this new one
instead in the following crash_save_vmcoreinfo().

BTW, we do not touch vmcoreinfo_note, because it will be fully updated
using the protected vmcoreinfo_data after crash which is surely correct
just like the cpu crash note.

Link: http://lkml.kernel.org/r/1493281021-20737-3-git-send-email-xlpang@redhat.com
Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dave Young <dyoung@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:00 -07:00
Xunlei Pang
5203f4995d powerpc/fadump: use the correct VMCOREINFO_NOTE_SIZE for phdr
vmcoreinfo_max_size stands for the vmcoreinfo_data, the correct one we
should use is vmcoreinfo_note whose total size is VMCOREINFO_NOTE_SIZE.

Like explained in commit 77019967f0 ("kdump: fix exported size of
vmcoreinfo note"), it should not affect the actual function, but we
better fix it, also this change should be safe and backward compatible.

After this, we can get rid of variable vmcoreinfo_max_size, let's use
the corresponding macros directly, fewer variables means more safety for
vmcoreinfo operation.

[xlpang@redhat.com: fix build warning]
  Link: http://lkml.kernel.org/r/1494830606-27736-1-git-send-email-xlpang@redhat.com
Link: http://lkml.kernel.org/r/1493281021-20737-2-git-send-email-xlpang@redhat.com
Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Reviewed-by: Dave Young <dyoung@redhat.com>
Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:25:59 -07:00
Xunlei Pang
203e9e4121 kexec: move vmcoreinfo out of the kernel's .bss section
As Eric said,
 "what we need to do is move the variable vmcoreinfo_note out of the
  kernel's .bss section. And modify the code to regenerate and keep this
  information in something like the control page.

  Definitely something like this needs a page all to itself, and ideally
  far away from any other kernel data structures. I clearly was not
  watching closely the data someone decided to keep this silly thing in
  the kernel's .bss section."

This patch allocates extra pages for these vmcoreinfo_XXX variables, one
advantage is that it enhances some safety of vmcoreinfo, because
vmcoreinfo now is kept far away from other kernel data structures.

Link: http://lkml.kernel.org/r/1493281021-20737-1-git-send-email-xlpang@redhat.com
Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Suggested-by: Eric Biederman <ebiederm@xmission.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dave Young <dyoung@redhat.com>
Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:25:59 -07:00
Christoph Lameter
112166f88c kernel/fork.c: virtually mapped stacks: do not disable interrupts
The reason to disable interrupts seems to be to avoid switching to a
different processor while handling per cpu data using individual loads and
stores.  If we use per cpu RMV primitives we will not have to disable
interrupts.

Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1705171055130.5898@east.gentwo.org
Signed-off-by: Christoph Lameter <cl@linux.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:25:59 -07:00
Geert Uytterhoeven
91a90140f9 mm/memory.c: mark create_huge_pmd() inline to prevent build failure
With gcc 4.1.2:

    mm/memory.o: In function `create_huge_pmd':
    memory.c:(.text+0x93e): undefined reference to `do_huge_pmd_anonymous_page'

Interestingly, create_huge_pmd() is emitted in the assembler output, but
never called.

Converting transparent_hugepage_enabled() from a macro to a static
inline function reduced the ability of the compiler to remove unused
code.

Fix this by marking create_huge_pmd() inline.

Fixes: 16981d7635 ("mm: improve readability of transparent_hugepage_enabled()")
Link: http://lkml.kernel.org/r/1499842660-10665-1-git-send-email-geert@linux-m68k.org
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:25:59 -07:00
Ian Abbott
c7acec713d kernel.h: handle pointers to arrays better in container_of()
If the first parameter of container_of() is a pointer to a
non-const-qualified array type (and the third parameter names a
non-const-qualified array member), the local variable __mptr will be
defined with a const-qualified array type.  In ISO C, these types are
incompatible.  They work as expected in GNU C, but some versions will
issue warnings.  For example, GCC 4.9 produces the warning
"initialization from incompatible pointer type".

Here is an example of where the problem occurs:

-------------------------------------------------------
   #include <linux/kernel.h>
   #include <linux/module.h>

  MODULE_LICENSE("GPL");

  struct st {
  	int a;
  	char b[16];
  };

  static int __init example_init(void) {
  	struct st t = { .a = 101, .b = "hello" };
  	char (*p)[16] = &t.b;
  	struct st *x = container_of(p, struct st, b);
  	printk(KERN_DEBUG "%p %p\n", (void *)&t, (void *)x);
  	return 0;
  }

  static void __exit example_exit(void) {
  }

  module_init(example_init);
  module_exit(example_exit);
-------------------------------------------------------

Building the module with gcc-4.9 results in these warnings (where '{m}'
is the module source and '{k}' is the kernel source):

-------------------------------------------------------
  In file included from {m}/example.c:1:0:
  {m}/example.c: In function `example_init':
  {k}/include/linux/kernel.h:854:48: warning: initialization from incompatible pointer type
    const typeof( ((type *)0)->member ) *__mptr = (ptr); \
                                                  ^
  {m}/example.c:14:17: note: in expansion of macro `container_of'
    struct st *x = container_of(p, struct st, b);
                   ^
  {k}/include/linux/kernel.h:854:48: warning: (near initialization for `x')
    const typeof( ((type *)0)->member ) *__mptr = (ptr); \
                                                  ^
  {m}/example.c:14:17: note: in expansion of macro `container_of'
    struct st *x = container_of(p, struct st, b);
                   ^
-------------------------------------------------------

Replace the type checking performed by the macro to avoid these
warnings.  Make sure `*(ptr)` either has type compatible with the
member, or has type compatible with `void`, ignoring qualifiers.  Raise
compiler errors if this is not true.  This is stronger than the previous
behaviour, which only resulted in compiler warnings for a type mismatch.

[arnd@arndb.de: fix new warnings for container_of()]
  Link: http://lkml.kernel.org/r/20170620200940.90557-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/20170525120316.24473-7-abbotti@mev.co.uk
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:25:59 -07:00
Stephen Rothwell
0a2c13d9cd include/linux/dcache.h: use unsigned chars in struct name_snapshot
"kernel.h: handle pointers to arrays better in container_of()" triggers:

In file included from include/uapi/linux/stddef.h:1:0,
                 from include/linux/stddef.h:4,
                 from include/uapi/linux/posix_types.h:4,
                 from include/uapi/linux/types.h:13,
                 from include/linux/types.h:5,
                 from include/linux/syscalls.h:71,
                 from fs/dcache.c:17:
fs/dcache.c: In function 'release_dentry_name_snapshot':
include/linux/compiler.h:542:38: error: call to '__compiletime_assert_305' declared with attribute error: pointer type mismatch in container_of()
  _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
                                      ^
include/linux/compiler.h:525:4: note: in definition of macro '__compiletime_assert'
    prefix ## suffix();    \
    ^
include/linux/compiler.h:542:2: note: in expansion of macro '_compiletime_assert'
  _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
  ^
include/linux/build_bug.h:46:37: note: in expansion of macro 'compiletime_assert'
 #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
                                     ^
include/linux/kernel.h:860:2: note: in expansion of macro 'BUILD_BUG_ON_MSG'
  BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \
  ^
fs/dcache.c:305:7: note: in expansion of macro 'container_of'
   p = container_of(name->name, struct external_name, name[0]);

Switch name_snapshot to use unsigned chars, matching struct qstr and
struct external_name.

Link: http://lkml.kernel.org/r/20170710152134.0f78c1e6@canb.auug.org.au
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:25:59 -07:00
SeongJae Park
51e988f409 kokr/memory-barriers.txt: Fix obsolete link to atomic_ops.txt
Obsolete links to atomic_ops.txt exist in ko_KR/memory-barriers.txt
though the file has moved to core-api/atomic_ops.rst.  This commit fixes
the obsolete links.

Signed-off-by: SeongJae Park <sj38.park@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2017-07-12 16:56:40 -06:00
SeongJae Park
3afadfd902 memory-barriers.txt: Fix broken link to atomic_ops.txt
Few obsolete links to atomic_ops.txt exist in memory-barriers.txt though
the file has moved to core-api/atomic_ops.rst.  This commit fixes the
obsolete links.

Signed-off-by: SeongJae Park <sj38.park@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2017-07-12 16:56:30 -06:00
Jonathan Corbet
6d16e9143e docs: Turn off section numbering for the input docs
The input docs enable section numbering at multiple levels, leading to a
lot of bright-red "nested numbered toctree" warnings in newer Sphinx
versions.  Just take that directive out for now to help alleviate the
global red-pixel shortage.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2017-07-12 16:51:44 -06:00
Jonathan Corbet
7d2b39ab9b docs: Include uaccess docs from the right file
Documentation/core-api/kernel-api.rst was including kerneldoc comments from
arch/x86/include/asm/uaccess_32.h, but the relevant comments moved to
.../uaccess.h some time ago.  Correct the include to pick up the comments
and eliminate a warning.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2017-07-12 16:51:38 -06:00
LABBE Corentin
d93b07f8a6 net: stmmac: revert "support future possible different internal phy mode"
Since internal phy-mode is reserved for non-xMII protocol we cannot use
it with dwmac-sun8i.
Furthermore, all DT patchs which comes with this patch were cleaned, so
the current state is broken.
This reverts commit 1c2fa5f846 ("net: stmmac: support future possible different internal phy mode")

Fixes: 1c2fa5f846 ("net: stmmac: support future possible different internal phy mode")
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-12 14:41:56 -07:00
Bert Kenward
c70d68150f sfc: don't read beyond unicast address list
If we have more than 32 unicast MAC addresses assigned to an interface
we will read beyond the end of the address table in the driver when
adding filters. The next 256 entries store multicast addresses, so we
will end up attempting to insert duplicate filters, which is mostly
harmless. If we add more than 288 unicast addresses we will then read
past the multicast address table, which is likely to be more exciting.

Fixes: 12fb0da45c ("sfc: clean fallbacks between promisc/normal in efx_ef10_filter_sync_rx_mode")
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-12 14:41:06 -07:00
David S. Miller
07b8a7cf9a Merge branch 'net-doc-fixes'
Stephen Hemminger says:

====================
minor net kernel-doc fixes

Fix a couple of small errors in kernel-doc for networking
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-12 14:39:44 -07:00
stephen hemminger
d3f6cd9e60 datagram: fix kernel-doc comments
An underscore in the kernel-doc comment section has special meaning
and mis-use generates an errors.

./net/core/datagram.c:207: ERROR: Unknown target name: "msg".
./net/core/datagram.c:379: ERROR: Unknown target name: "msg".
./net/core/datagram.c:816: ERROR: Unknown target name: "t".

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-12 14:39:43 -07:00
stephen hemminger
771edcafa7 socket: add documentation for missing elements
Fill in missing kernel-doc for missing elements in struct sock.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-12 14:39:43 -07:00
Alexey Khoroshilov
57fe14790b smsc911x: Add check for ioremap_nocache() return code
There is no check for return code of smsc911x_drv_probe()
in smsc911x_drv_probe(). The patch adds one.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-12 14:35:43 -07:00
Rafael J. Wysocki
20dd95ef0e Merge branch 'for-rc' of https://git.kernel.org/pub/scm/linux/kernel/git/mzx/devfreq
Pull devfreq changes for v4.13 from MyungJoo Ham.

* 'for-rc' of https://git.kernel.org/pub/scm/linux/kernel/git/mzx/devfreq:
  PM / devfreq: constify attribute_group structures.
  PM / devfreq: tegra: fix error return code in tegra_devfreq_probe()
  PM / devfreq: rk3399_dmc: fix error return code in rk3399_dmcfreq_probe()
2017-07-12 23:34:39 +02:00
Dmitry Torokhov
dda5202b00 Merge branch 'next' into for-linus
Prepare second round of input updates for 4.13 merge window.
2017-07-12 14:17:17 -07:00
Alexandre Belloni
40bf6a3548 rtc: Remove wrong deprecation comment
rtc_time_to_tm and rtc_tm_to_time are not deprecated and make perfect sense
for RTCs that are simple 32bit counters.

Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2017-07-12 23:11:23 +02:00
Rafael J. Wysocki
0ce3fcaff9 PCI / PM: Restore PME Enable after config space restoration
Commit dc15e71eef (PCI / PM: Restore PME Enable if skipping wakeup
setup) introduced a mechanism by which the PME Enable bit can be
restored by pci_enable_wake() if dev->wakeup_prepared is set in
case it has been overwritten by PCI config space restoration.

However, that commit overlooked the fact that on some systems (Dell
XPS13 9360 in particular) the AML handling wakeup events checks PME
Status and PME Enable and it won't trigger a Notify() for devices
where those bits are not set while it is running.

That happens during resume from suspend-to-idle when pci_restore_state()
invoked by pci_pm_default_resume_early() clears PME Enable before the
wakeup events are processed by AML, effectively causing those wakeup
events to be ignored.

Fix this issue by restoring the PME Enable configuration right after
pci_restore_state() has been called instead of doing that in
pci_enable_wake().

Fixes: dc15e71eef (PCI / PM: Restore PME Enable if skipping wakeup setup)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
2017-07-12 23:09:21 +02:00
Hans de Goede
c3a73ed8a8 platform/x86: silead_dmi: Add entry for Ployer Momo7w tablet touchscreen
This Ployer Momo7w revision has the same hardware as the Trekstor
ST70416-6, so we re-use the surftab_wintron70_st70416_6_data.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
2017-07-12 13:57:42 -07:00
Claudio Imbrenda
286de8f6ac KVM: trigger uevents when creating or destroying a VM
This patch adds a few lines to the KVM common code to fire a
KOBJ_CHANGE uevent whenever a KVM VM is created or destroyed. The event
carries five environment variables:

CREATED indicates how many times a new VM has been created. It is
	useful for example to trigger specific actions when the first
	VM is started
COUNT indicates how many VMs are currently active. This can be used for
	logging or monitoring purposes
PID has the pid of the KVM process that has been started or stopped.
	This can be used to perform process-specific tuning.
STATS_PATH contains the path in debugfs to the directory with all the
	runtime statistics for this VM. This is useful for performance
	monitoring and profiling.
EVENT described the type of event, its value can be either "create" or
	"destroy"

Specific udev rules can be then set up in userspace to deal with the
creation or destruction of VMs as needed.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-07-12 22:38:31 +02:00
Janakarajan Natarajan
89c8a4984f KVM: SVM: Enable Virtual VMLOAD VMSAVE feature
Enable the Virtual VMLOAD VMSAVE feature. This is done by setting bit 1
at position B8h in the vmcb.

The processor must have nested paging enabled, be in 64-bit mode and
have support for the Virtual VMLOAD VMSAVE feature for the bit to be set
in the vmcb.

Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-07-12 22:38:30 +02:00
Janakarajan Natarajan
76ff359249 KVM: SVM: Add Virtual VMLOAD VMSAVE feature definition
Define a new cpufeature definition for Virtual VMLOAD VMSAVE.

Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-07-12 22:38:29 +02:00
Janakarajan Natarajan
0dc92119b5 KVM: SVM: Rename lbr_ctl field in the vmcb control area
Rename the lbr_ctl variable to better reflect the purpose of the field -
provide support for virtualization extensions.

Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-07-12 22:38:29 +02:00
Janakarajan Natarajan
8a77e90966 KVM: SVM: Prepare for new bit definition in lbr_ctl
The lbr_ctl variable in the vmcb control area is used to enable or
disable Last Branch Record (LBR) virtualization. However, this is to be
done using only bit 0 of the variable. To correct this and to prepare
for a new feature, change the current usage to work only on a particular
bit.

Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-07-12 22:38:28 +02:00
Ladi Prosek
b742c1e6e7 KVM: SVM: handle singlestep exception when skipping emulated instructions
kvm_skip_emulated_instruction handles the singlestep debug exception
which is something we almost always want. This commit (specifically
the change in rdmsr_interception) makes the debug.flat KVM unit test
pass on AMD.

Two call sites still call skip_emulated_instruction directly:

* In svm_queue_exception where it's used only for moving the rip forward

* In task_switch_interception which is analogous to handle_task_switch
  in VMX

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-07-12 22:38:27 +02:00
Radim Krčmář
fb5307298e KVM: x86: take slots_lock in kvm_free_pit
kvm_vm_release() did not have slots_lock when calling
kvm_io_bus_unregister_dev() and this went unnoticed until 4a12f95177
("KVM: mark kvm->busses as rcu protected") added dynamic checks.
Luckily, there should be no race at that point:

  =============================
  WARNING: suspicious RCU usage
  4.12.0.kvm+ #0 Not tainted
  -----------------------------
  ./include/linux/kvm_host.h:479 suspicious rcu_dereference_check() usage!

   lockdep_rcu_suspicious+0xc5/0x100
   kvm_io_bus_unregister_dev+0x173/0x190 [kvm]
   kvm_free_pit+0x28/0x80 [kvm]
   kvm_arch_sync_events+0x2d/0x30 [kvm]
   kvm_put_kvm+0xa7/0x2a0 [kvm]
   kvm_vm_release+0x21/0x30 [kvm]

Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-07-12 22:38:26 +02:00
Gleb Fotengauer-Malinovskiy
949c033694 KVM: s390: Fix KVM_S390_GET_CMMA_BITS ioctl definition
In case of KVM_S390_GET_CMMA_BITS, the kernel does not only read struct
kvm_s390_cmma_log passed from userspace (which constitutes _IOC_WRITE),
it also writes back a return value (which constitutes _IOC_READ) making
this an _IOWR ioctl instead of _IOW.

Fixes: 4036e387 ("KVM: s390: ioctls to get and set guest storage attributes")
Signed-off-by: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-07-12 22:38:26 +02:00
Jim Mattson
48ae0fb49b kvm: vmx: Properly handle machine check during VM-entry
vmx_complete_atomic_exit should call kvm_machine_check for any
VM-entry failure due to a machine-check event. Such an exit should be
recognized solely by its basic exit reason (i.e. the low 16 bits of
the VMCS exit reason field). None of the other VMCS exit information
fields contain valid information when the VM-exit is due to "VM-entry
failure due to machine-check event".

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Xiao Guangrong <xiaoguangrong@tencent.com>
[Changed VM_EXIT_INTR_INFO condition to better describe its reason.]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-07-12 22:38:25 +02:00
Radim Krčmář
0bc48bea36 KVM: x86: update master clock before computing kvmclock_offset
kvm master clock usually has a different frequency than the kernel boot
clock.  This is not a problem until the master clock is updated;
update uses the current kernel boot clock to compute new kvm clock,
which erases any kvm clock cycles that might have built up due to
frequency difference over a long period.

KVM_SET_CLOCK is one of places where we can safely update master clock
as the guest-visible clock is going to be shifted anyway.

The problem with current code is that it updates the kvm master clock
after updating the offset.  If the master clock was enabled before
calling KVM_SET_CLOCK, then it might have built up a significant delta
from kernel boot clock.
In the worst case, the time set by userspace would be shifted by so much
that it couldn't have been set at any point during KVM_SET_CLOCK.

To fix this, move kvm_gen_update_masterclock() before computing
kvmclock_offset, which means that the master clock and kernel boot clock
will be sufficiently close together.
Another solution would be to replace get_kvmclock_ns() with
"ktime_get_boot_ns() + ka->kvmclock_offset", which is marginally more
accurate, but would break symmetry with KVM_GET_CLOCK.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-07-12 22:38:24 +02:00
J. Bruce Fields
630458e730 nfsd4: factor ctime into change attribute
Factoring ctime into the nfsv4 change attribute gives us better
properties than just i_version alone.

Eventually we'll likely also expose this (as opposed to raw i_version)
to userspace, at which point we'll want to move it to a common helper,
called from either userspace or individual filesystems.  For now, nfsd
is the only user.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-07-12 15:55:00 -04:00
Chuck Lever
35a30fc389 svcrdma: Remove svc_rdma_chunk_ctxt::cc_dir field
Clean up: No need to save the I/O direction. The functions that
release svc_rdma_chunk_ctxt already know what direction to use.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-07-12 15:55:00 -04:00
Chuck Lever
91b022ec8b svcrdma: use offset_in_page() macro
Clean up: Use offset_in_page() macro instead of open-coding.

Reported-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-07-12 15:54:59 -04:00
Chuck Lever
9450ca8e2f svcrdma: Clean up after converting svc_rdma_recvfrom to rdma_rw API
Clean up: Registration mode details are now handled by the rdma_rw
API, and thus can be removed from svcrdma.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-07-12 15:54:59 -04:00
Chuck Lever
0d956e694a svcrdma: Clean-up svc_rdma_unmap_dma
There's no longer a need to compare each SGE's lkey with the PD's
local_dma_lkey. Now that FRWR is gone, all DMA mappings are for
pages that were registered with this key.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-07-12 15:54:58 -04:00