Commit Graph

16605 Commits

Author SHA1 Message Date
Richard Guy Briggs
ae887e0bdc audit: make use of remaining sleep time from wait_for_auditd
If wait_for_auditd() times out, go immediately to the error function rather
than retesting the loop conditions.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-13 22:27:53 -05:00
Richard Guy Briggs
e789e561a5 audit: reset audit backlog wait time after error recovery
When the audit queue overflows and times out (audit_backlog_wait_time), the
audit queue overflow timeout is set to zero.  Once the audit queue overflow
timeout condition recovers, the timeout should be reset to the original value.

See also:
	https://lkml.org/lkml/2013/9/2/473

Cc: stable@vger.kernel.org # v3.8-rc4+
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Dan Duval <dan.duval@oracle.com>
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-13 22:27:30 -05:00
Richard Guy Briggs
33faba7fa7 audit: listen in all network namespaces
Convert audit from only listening in init_net to use register_pernet_subsys()
to dynamically manage the netlink socket list.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-13 22:27:24 -05:00
Richard Guy Briggs
2f2ad10133 audit: restore order of tty and ses fields in log output
When being refactored from audit_log_start() to audit_log_task_info(), in
commit e23eb920 the tty and ses fields in the log output got transposed.
Restore to original order to avoid breaking search tools.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-13 22:27:16 -05:00
Richard Guy Briggs
f9441639e6 audit: fix netlink portid naming and types
Normally, netlink ports use the PID of the userspace process as the port ID.
If the PID is already in use by a port, the kernel will allocate another port
ID to avoid conflict.  Re-name all references to netlink ports from pid to
portid to reflect this reality and avoid confusion with actual PIDs.  Ports
use the __u32 type, so re-type all portids accordingly.

(This patch is very similar to ebiederman's 5deadd69)

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-13 22:26:52 -05:00
Eric W. Biederman
ca24a23ebc audit: Simplify and correct audit_log_capset
- Always report the current process as capset now always only works on
  the current process.  This prevents reporting 0 or a random pid in
  a random pid namespace.

- Don't bother to pass the pid as is available.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
(cherry picked from commit bcc85f0af31af123e32858069eb2ad8f39f90e67)
(cherry picked from commit f911cac4556a7a23e0b3ea850233d13b32328692)

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
[eparis: fix build error when audit disabled]
Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-13 22:26:48 -05:00
Eric Paris
fc582aef7d Linux 3.12
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQEcBAABAgAGBQJSdt9HAAoJEHm+PkMAQRiGnzEH/345Keg5dp+oKACnokBfzOtp
 V0p3g5EBsGtzEVnV+1B96trczDUtWdDFFr5GfGSj565NBQpFyc+iZC1mC99RDJCs
 WUquGFqlLMK2aV0SbKwCO4K1rJ5A0TRVj0ZRJOUJUY7jwNf5Qahny0WBVjO/8qAY
 UvJK1rktBClhKdH53YtpDHHgXBeZ2LOrzt1fQ/AMpujGbZauGvnLdNOli5r2kCFK
 jzoOgFLvX+PHU/5/d4/QyJPeQNPva5hjk5Ho9UuSJYhnFtPO3EkD4XZLcpcbNEJb
 LqBvbnZWm6CS435lfU1l93RqQa5xMO9ITk0oe4h69syTSHwWk9aJ+ZTc/4Up+t8=
 =57MC
 -----END PGP SIGNATURE-----

Merge tag 'v3.12'

Linux 3.12

Conflicts:
	fs/exec.c
2013-11-22 18:57:54 -05:00
Eric Paris
9175c9d2ae audit: fix type of sessionid in audit_set_loginuid()
sfr pointed out that with CONFIG_UIDGID_STRICT_TYPE_CHECKS set the audit
tree would not build.  This is because the oldsessionid in
audit_set_loginuid() was accidentally being declared as a kuid_t.  This
patch fixes that declaration mistake.

Example of problem:
kernel/auditsc.c: In function 'audit_set_loginuid':
kernel/auditsc.c:2003:15: error: incompatible types when assigning to
type 'kuid_t' from type 'int'
  oldsessionid = audit_get_sessionid(current);

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-06 11:47:24 -05:00
Richard Guy Briggs
9410d228a4 audit: call audit_bprm() only once to add AUDIT_EXECVE information
Move the audit_bprm() call from search_binary_handler() to exec_binprm().  This
allows us to get rid of the mm member of struct audit_aux_data_execve since
bprm->mm will equal current->mm.

This also mitigates the issue that ->argc could be modified by the
load_binary() call in search_binary_handler().

audit_bprm() was being called to add an AUDIT_EXECVE record to the audit
context every time search_binary_handler() was recursively called.  Only one
reference is necessary.

Reported-by: Oleg Nesterov <onestero@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
---
This patch is against 3.11, but was developed on Oleg's post-3.11 patches that
introduce exec_binprm().
2013-11-05 11:15:03 -05:00
Richard Guy Briggs
d9cfea91e9 audit: move audit_aux_data_execve contents into audit_context union
audit_bprm() was being called to add an AUDIT_EXECVE record to the audit
context every time search_binary_handler() was recursively called.  Only one
reference is necessary, so just update it.  Move the the contents of
audit_aux_data_execve into the union in audit_context, removing dependence on a
kmalloc along the way.

Reported-by: Oleg Nesterov <onestero@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:09:36 -05:00
Richard Guy Briggs
9462dc5981 audit: remove unused envc member of audit_aux_data_execve
Get rid of write-only audit_aux_data_exeve structure member envc.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:09:31 -05:00
Eric W. Biederman
bd131fb1aa audit: Kill the unused struct audit_aux_data_capset
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
(cherry picked from ebiederman commit 6904431d6b41190e42d6b94430b67cb4e7e6a4b7)
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:09:20 -05:00
Eric Paris
78122037b7 audit: do not reject all AUDIT_INODE filter types
commit ab61d38ed8 tried to merge the
invalid filter checking into a single function.  However AUDIT_INODE
filters were not verified in the new generic checker.  Thus such rules
were being denied even though they were perfectly valid.

Ex:
$ auditctl -a exit,always -F arch=b64 -S open -F key=/foo -F inode=6955 -F devmajor=9 -F devminor=1
Error sending add rule data request (Invalid argument)

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:09:16 -05:00
Jeff Layton
d3aea84a4a audit: log the audit_names record type
...to make it clear what the intent behind each record's operation was.

In many cases you can infer this, based on the context of the syscall
and the result. In other cases it's not so obvious. For instance, in
the case where you have a file being renamed over another, you'll have
two different records with the same filename but different inode info.
By logging this information we can clearly tell which one was created
and which was deleted.

This fixes what was broken in commit bfcec708.
Commit 79f6530c should also be backported to stable v3.7+.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:09:04 -05:00
Richard Guy Briggs
b95d77fe34 audit: use given values in tty_audit enable api
In send/GET, we don't want the kernel to lie about what value is set.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:08:42 -05:00
Mathias Krause
4d8fe7376a audit: use nlmsg_len() to get message payload length
Using the nlmsg_len member of the netlink header to test if the message
is valid is wrong as it includes the size of the netlink header itself.
Thereby allowing to send short netlink messages that pass those checks.

Use nlmsg_len() instead to test for the right message length. The result
of nlmsg_len() is guaranteed to be non-negative as the netlink message
already passed the checks of nlmsg_ok().

Also switch to min_t() to please checkpatch.pl.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: stable@vger.kernel.org  # v2.6.6+ for the 1st hunk, v2.6.23+ for the 2nd
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:08:37 -05:00
Eric Paris
e13f91e3c5 audit: use memset instead of trying to initialize field by field
We currently are setting fields to 0 to initialize the structure
declared on the stack.  This is a bad idea as if the structure has holes
or unpacked space these will not be initialized.  Just use memset.  This
is not a performance critical section of code.

Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:08:35 -05:00
Mathias Krause
64fbff9ae0 audit: fix info leak in AUDIT_GET requests
We leak 4 bytes of kernel stack in response to an AUDIT_GET request as
we miss to initialize the mask member of status_set. Fix that.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: stable@vger.kernel.org  # v2.6.6+
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:08:30 -05:00
Richard Guy Briggs
db510fc5cd audit: update AUDIT_INODE filter rule to comparator function
It appears this one comparison function got missed in f368c07d (and 9c937dcc).

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:08:24 -05:00
Eric Paris
21b85c31d2 audit: audit feature to set loginuid immutable
This adds a new 'audit_feature' bit which allows userspace to set it
such that the loginuid is absolutely immutable, even if you have
CAP_AUDIT_CONTROL.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:08:17 -05:00
Eric Paris
d040e5af38 audit: audit feature to only allow unsetting the loginuid
This is a new audit feature which only grants processes with
CAP_AUDIT_CONTROL the ability to unset their loginuid.  They cannot
directly set it from a valid uid to another valid uid.  The ability to
unset the loginuid is nice because a priviledged task, like that of
container creation, can unset the loginuid and then priv is not needed
inside the container when a login daemon needs to set the loginuid.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:08:13 -05:00
Eric Paris
81407c84ac audit: allow unsetting the loginuid (with priv)
If a task has CAP_AUDIT_CONTROL allow that task to unset their loginuid.
This would allow a child of that task to set their loginuid without
CAP_AUDIT_CONTROL.  Thus when launching a new login daemon, a
priviledged helper would be able to unset the loginuid and then the
daemon, which may be malicious user facing, do not need priv to function
correctly.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:08:09 -05:00
Eric Paris
83fa6bbe4c audit: remove CONFIG_AUDIT_LOGINUID_IMMUTABLE
After trying to use this feature in Fedora we found the hard coding
policy like this into the kernel was a bad idea.  Surprise surprise.
We ran into these problems because it was impossible to launch a
container as a logged in user and run a login daemon inside that container.
This reverts back to the old behavior before this option was added.  The
option will be re-added in a userspace selectable manor such that
userspace can choose when it is and when it is not appropriate.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:08:01 -05:00
Eric Paris
da0a610497 audit: loginuid functions coding style
This is just a code rework.  It makes things more readable.  It does not
make any functional changes.

It does change the log messages to include both the old session id as
well the new and it includes a new res field, which means we get
messages even when the user did not have permission to change the
loginuid.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:07:56 -05:00
Eric Paris
b0fed40214 audit: implement generic feature setting and retrieving
The audit_status structure was not designed with extensibility in mind.
Define a new AUDIT_SET_FEATURE message type which takes a new structure
of bits where things can be enabled/disabled/locked one at a time.  This
structure should be able to grow in the future while maintaining forward
and backward compatibility (based loosly on the ideas from capabilities
and prctl)

This does not actually add any features, but is just infrastructure to
allow new on/off types of audit system features.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:07:30 -05:00
Richard Guy Briggs
42f74461a5 audit: change decimal constant to macro for invalid uid
SFR reported this 2013-05-15:

> After merging the final tree, today's linux-next build (i386 defconfig)
> produced this warning:
>
> kernel/auditfilter.c: In function 'audit_data_to_entry':
> kernel/auditfilter.c:426:3: warning: this decimal constant is unsigned only
> in ISO C90 [enabled by default]
>
> Introduced by commit 780a7654ce ("audit: Make testing for a valid
> loginuid explicit") from Linus' tree.

Replace this decimal constant in the code with a macro to make it more readable
(add to the unsigned cast to quiet the warning).

Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:07:27 -05:00
Tyler Hicks
0868a5e150 audit: printk USER_AVC messages when audit isn't enabled
When the audit=1 kernel parameter is absent and auditd is not running,
AUDIT_USER_AVC messages are being silently discarded.

AUDIT_USER_AVC messages should be sent to userspace using printk(), as
mentioned in the commit message of 4a4cd633 ("AUDIT: Optimise the
audit-disabled case for discarding user messages").

When audit_enabled is 0, audit_receive_msg() discards all user messages
except for AUDIT_USER_AVC messages. However, audit_log_common_recv_msg()
refuses to allocate an audit_buffer if audit_enabled is 0. The fix is to
special case AUDIT_USER_AVC messages in both functions.

It looks like commit 50397bd1 ("[AUDIT] clean up audit_receive_msg()")
introduced this bug.

Cc: <stable@kernel.org> # v2.6.25+
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: linux-audit@redhat.com
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:07:23 -05:00
Oleg Nesterov
d48d805122 audit_alloc: clear TIF_SYSCALL_AUDIT if !audit_context
If audit_filter_task() nacks the new thread it makes sense
to clear TIF_SYSCALL_AUDIT which can be copied from parent
by dup_task_struct().

A wrong TIF_SYSCALL_AUDIT is not really bad but it triggers
the "slow" audit paths in entry.S to ensure the task can not
miss audit_syscall_*() calls, this is pointless if the task
has no ->audit_context.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Steve Grubb <sgrubb@redhat.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:07:18 -05:00
Gao feng
af0e493d30 Audit: remove duplicate comments
Remove it.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:07:14 -05:00
Richard Guy Briggs
b8f89caafe audit: remove newline accidentally added during session id helper refactor
A newline was accidentally added during session ID helper refactorization in
commit 4d3fb709.  This needlessly uses up buffer space, messes up syslog
formatting and makes userspace processing less efficient.  Remove it.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:07:09 -05:00
Ilya V. Matveychikov
47145705e3 audit: remove duplicate inclusion of the netlink header
Signed-off-by: Ilya V. Matveychikov <matvejchikov@gmail.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:06:53 -05:00
Richard Guy Briggs
b50eba7e2d audit: format user messages to size of MAX_AUDIT_MESSAGE_LENGTH
Messages of type AUDIT_USER_TTY were being formatted to 1024 octets,
truncating messages approaching MAX_AUDIT_MESSAGE_LENGTH (8970 octets).

Set the formatting to 8560 characters, given maximum estimates for prefix and
suffix budgets.

See the problem discussion:
https://www.redhat.com/archives/linux-audit/2009-January/msg00030.html

And the new size rationale:
https://www.redhat.com/archives/linux-audit/2013-September/msg00016.html

Test ~8k messages with:
auditctl -m "$(for i in $(seq -w 001 820);do echo -n "${i}0______";done)"

Reported-by: LC Bruzenak <lenny@magitekltd.com>
Reported-by: Justin Stephenson <jstephen@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:06:49 -05:00
Peter Zijlstra
bf378d341e perf: Fix perf ring buffer memory ordering
The PPC64 people noticed a missing memory barrier and crufty old
comments in the perf ring buffer code. So update all the comments and
add the missing barrier.

When the architecture implements local_t using atomic_long_t there
will be double barriers issued; but short of introducing more
conditional barrier primitives this is the best we can do.

Reported-by: Victor Kaplansky <victork@il.ibm.com>
Tested-by: Victor Kaplansky <victork@il.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: michael@ellerman.id.au
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: anton@samba.org
Cc: benh@kernel.crashing.org
Link: http://lkml.kernel.org/r/20131025173749.GG19466@laptop.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 12:01:19 +01:00
Linus Torvalds
aff22d3f1a Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
 "This tree contains a clockevents regression fix for certain ARM
  subarchitectures"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clockevents: Sanitize ticks to nsec conversion
2013-10-27 10:29:25 -07:00
Linus Torvalds
e2756f5e0f Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "The tree contains three fixes:

   - Two tooling fixes

   - Reversal of the new 'MMAP2' extended mmap record ABI, introduced in
     this merge window.  (Patches were proposed to fix it but it was all
     a bit late and we felt it's safer to just delay the ABI one more
     kernel release and do it right)"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Disable PERF_RECORD_MMAP2 support
  perf scripting perl: Fix build error on Fedora 12
  perf probe: Fix to initialize fname always before use it
2013-10-27 10:28:35 -07:00
Linus Torvalds
1c99ca43a4 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Ingo Molnar:
 "This tree fixes a boot crash in CONFIG_DEBUG_MUTEXES=y kernels, on
  kernels built with GCC 3.x (there are still such distros)"

Side note: it's not just a fix for old gcc versions, it's also removing
an incredibly broken/subtle check that LLVM had issues with, and that
made no sense.

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  mutex: Avoid gcc version dependent __builtin_constant_p() usage
2013-10-27 10:18:15 -07:00
Linus Torvalds
20582e34c8 ACPI and power management fixes for 3.12-rc7
- Fix for rounding errors in intel_pstate causing CPU utilization to
    be underestimated from Brennan Shacklett.
 
  - intel_pstate fix to always use the correct max pstate value when
    computing the min pstate from Dirk Brandewie.
 
  - Hibernation fix for deadlocking resume in cases when the probing
    of the device containing the image is deferred from Russ Dill.
 
  - acpi-cpufreq fix to prevent the module from staying in memory
    when the driver cannot be registered and then attempting to
    unregister things that have never been registered on exit.
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABCAAGBQJSatHPAAoJEILEb/54YlRxGrYP/0VplYzQX/CqDwO/folY1wiJ
 nASYG0ZhMR7wKTr75PF/9bSoTvftk+KGbx8i48DGFXsc1a8e4lCA878zu1debH0x
 +oi+qNlrRvNgwr0Eg7ALuM68zxG+b/D8vqHRqlXcnDa+rMlS/8q/l8d/D8rou49O
 pERPWLF0LI2kdRxmeOzvqo8oA1KnhUvN/nPvtFKB7LwcQU+9iY3RMAJa2i6K/zsn
 Epq76aePKz98KB6U9LZIqzshe6AQQanFr0+kz/6dHXArygVfh7+ZpByRIYcH67KK
 iR8N6O02Fq06dVLIu3Ta5Lqq5/EWBVfauLcMBTwZ2LxmZsPGUqPac02X3HpGcB3+
 1R0zWGmKpsTcs9hUGaCp9Qs2IzQvopcbnjZ6YxMkDtfZip+2FI1aaHapRT1egl8B
 xV3Yd0ZqqyAUs50tneQE9AMQ8ndae6t8gK8c3Z+EZTF0cSBuz9puQkRBGc+RdGDs
 TTaKBbgI+TY5uws6VVWEijKkQjzCsVWM+n6yW8GlRmqrpBO/yAkuw0cBB5PXnrw0
 uoSqw3+XLFrsVzeJ7V/Zi0+jpCq9qOBubO88j9EMX7tmL1xDPLECWeztAAwiFy9C
 Vwapn6qygPV5nfnMu+zDyMe/gcKcGbXmYactaLKDap0FMsRVFDY/YikN5PeJFwJF
 gLbn26BQxHcKNNUFAYr+
 =PsHF
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-3.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI and power management fixes from
 "These fix two bugs in the intel_pstate driver, a hibernate bug leading
  to nasty resume failures sometimes and acpi-cpufreq initialization bug
  that causes problems to happen during module unload when intel_pstate
  is in use.

  Specifics:

   - Fix for rounding errors in intel_pstate causing CPU utilization to
     be underestimated from Brennan Shacklett.

   - intel_pstate fix to always use the correct max pstate value when
     computing the min pstate from Dirk Brandewie.

   - Hibernation fix for deadlocking resume in cases when the probing of
     the device containing the image is deferred from Russ Dill.

   - acpi-cpufreq fix to prevent the module from staying in memory when
     the driver cannot be registered and then attempting to unregister
     things that have never been registered on exit"

* tag 'pm+acpi-3.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  acpi-cpufreq: Fail initialization if driver cannot be registered
  PM / hibernate: Move software_resume to late_initcall_sync
  intel_pstate: Correct calculation of min pstate value
  intel_pstate: Improve accuracy by not truncating until final result
2013-10-26 04:38:47 +01:00
Russ Dill
d3c345dbc7 PM / hibernate: Move software_resume to late_initcall_sync
software_resume is being called after deferred_probe_initcall in
drivers base. If the probing of the device that contains the resume
image is deferred, and the system has been instructed to wait for
it to show up, this wait will occur in software_resume. This causes
a deadlock.

Move software_resume into late_initcall_sync so that it happens
after all the other late_initcalls.

Signed-off-by: Russ Dill <Russ.Dill@ti.com>
Acked-by: Pavel Machek <Pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-25 01:58:49 +02:00
Thomas Gleixner
97b9410643 clockevents: Sanitize ticks to nsec conversion
Marc Kleine-Budde pointed out, that commit 77cc982 "clocksource: use
clockevents_config_and_register() where possible" caused a regression
for some of the converted subarchs.

The reason is, that the clockevents core code converts the minimal
hardware tick delta to a nanosecond value for core internal
usage. This conversion is affected by integer math rounding loss, so
the backwards conversion to hardware ticks will likely result in a
value which is less than the configured hardware limitation. The
affected subarchs used their own workaround (SIGH!) which got lost in
the conversion.

The solution for the issue at hand is simple: adding evt->mult - 1 to
the shifted value before the integer divison in the core conversion
function takes care of it. But this only works for the case where for
the scaled math mult/shift pair "mult <= 1 << shift" is true. For the
case where "mult > 1 << shift" we can apply the rounding add only for
the minimum delta value to make sure that the backward conversion is
not less than the given hardware limit. For the upper bound we need to
omit the rounding add, because the backwards conversion is always
larger than the original latch value. That would violate the upper
bound of the hardware device.

Though looking closer at the details of that function reveals another
bogosity: The upper bounds check is broken as well. Checking for a
resulting "clc" value greater than KTIME_MAX after the conversion is
pointless. The conversion does:

      u64 clc = (latch << evt->shift) / evt->mult;

So there is no sanity check for (latch << evt->shift) exceeding the
64bit boundary. The latch argument is "unsigned long", so on a 64bit
arch the handed in argument could easily lead to an unnoticed shift
overflow. With the above rounding fix applied the calculation before
the divison is:

       u64 clc = (latch << evt->shift) + evt->mult - 1;

So we need to make sure, that neither the shift nor the rounding add
is overflowing the u64 boundary.

[ukl: move assignment to rnd after eventually changing mult, fix build
 issue and correct comment with the right math]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: nicolas.ferre@atmel.com
Cc: Marc Pignat <marc.pignat@hevs.ch>
Cc: john.stultz@linaro.org
Cc: kernel@pengutronix.de
Cc: Ronald Wahl <ronald.wahl@raritan.com>
Cc: LAK <linux-arm-kernel@lists.infradead.org>
Cc: Ludovic Desroches <ludovic.desroches@atmel.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1380052223-24139-1-git-send-email-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
2013-10-23 12:51:21 +02:00
Linus Torvalds
ee7eafc907 Merge branch 'for-3.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
 "Two late fixes for cgroup.

  One fixes descendant walk introduced during this rc1 cycle.  The other
  fixes a post 3.9 bug during task attach which can lead to hang.  Both
  fixes are critical and the fixes are relatively straight-forward"

* 'for-3.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: fix to break the while loop in cgroup_attach_task() correctly
  cgroup: fix cgroup post-order descendant walk of empty subtree
2013-10-22 08:20:34 +01:00
Ingo Molnar
e4f8eaad70 perf/urgent fixes:
. Fix build error on Fedora 12.
 
 . Fix to initialize fname always before use it, bug introduced
   during this merge window, from Masami Hiramatsu.
 
 . Disable PERF_RECORD_MMAP2 support, from Stephane Eranian.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJSY5N6AAoJENZQFvNTUqpALrMP/ArKFqrfhWBncu1oezl3xb8S
 ZsXSLscro47XcdIRUY/vBcqLnv8l0fO8lyk1aaUaI5BclGCRrZ9/sLmJmRxSbvSy
 tr0wet7Lzv2FGPQWJEmNK5uRZ7tvFVQbBs+KFj5hfD5TRKUmj71XJYpJ9xWfxvup
 XVOKqVZJl+ey/1KUy53tccLtrKi2hrE9uPEpHMri5hoquknGNjQe3MokH9AYtKo8
 6Lcm+cHsqKvbbkQwThnmd84qLZ6xwr/cuiteZaQkEe7QZxMSU1ik8ajo4N2v3tw6
 f/mkLoyf0K7wDDhEQpHL4BvkhVa7gPFpCj/hgKflXlb+HtCUJJEq6Q6hKlpsv7T9
 39Eoq2E8jTvDXyTOj8yXjTuzku1SdU55uZsltX56gAVHqhae8Phf3/RBFAtC1C1u
 c6nt8/BIjjnluhOfLB/irrHmstS+t3gBAyxC0foveEIyS1AelGgkYx4c36tabp4k
 lEww3PckiqHozWS/dURYKMvO1vEVLJ8w++9pbp28IIsNNKaWIXpFdhWvI30FIpIH
 nKk/5V1wx95bbBusIq7yU8WYBujciiPphqXY2+2QGAy1qf2pSdacOTroAZ85zdad
 4BU9HJwW9xHuWnVPoaOAohtlkXPPaWeMvGfRnOPArHwCfjqiRs2gjtXaAGRtlJLs
 CDqtqCm0H8P4o9+1yu6R
 =QV+5
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

" * Fix build error on Fedora 12.

  * Fix to initialize fname always before use it, bug introduced
    during this merge window, from Masami Hiramatsu.

  * Disable PERF_RECORD_MMAP2 support, from Stephane Eranian. "

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-20 10:51:35 +02:00
Tetsuo Handa
b0267507df mutex: Avoid gcc version dependent __builtin_constant_p() usage
Commit 040a0a37 ("mutex: Add support for wound/wait style locks")
used "!__builtin_constant_p(p == NULL)" but gcc 3.x cannot
handle such expression correctly, leading to boot failure when
built with CONFIG_DEBUG_MUTEXES=y.

Fix it by explicitly passing a bool which tells whether p != NULL
or not.

[ PeterZ: This is a sad patch, but provided it actually generates
          similar code I suppose its the best we can do bar whole
	  sale deprecating gcc-3. ]

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: peterz@infradead.org
Cc: imirkin@alum.mit.edu
Cc: daniel.vetter@ffwll.ch
Cc: robdclark@gmail.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/201310171945.AGB17114.FSQVtHOJFOOFML@I-love.SAKURA.ne.jp
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-18 21:58:54 +02:00
Stephane Eranian
3090ffb5a2 perf: Disable PERF_RECORD_MMAP2 support
For now, we disable the extended MMAP record support (MMAP2).

We have identified cases where it would not report the correct mapping
information, clone(VM_CLONE) but with separate pids.  We will revisit
the support once we find a solution for this case.

The patch changes the kernel to return EINVAL if attr->mmap2 is set. The
patch also modifies the perf tool to use regular PERF_RECORD_MMAP for
synthetic events and it also prevents the tool from requesting
attr->mmap2 mode because the kernel would reject it.

The support will be revisited once the kenrel interface is updated.

In V2, we reduce the patch to the strict minimum.

In V3, we avoid calling perf_event_open() with mmap2 set because we know
it will fail and require fallback retry.

Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131017173215.GA8820@quad
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-10-17 16:27:14 -03:00
Anjana V Kumar
ea84753c98 cgroup: fix to break the while loop in cgroup_attach_task() correctly
Both Anjana and Eunki reported a stall in the while_each_thread loop
in cgroup_attach_task().

It's because, when we attach a single thread to a cgroup, if the cgroup
is exiting or is already in that cgroup, we won't break the loop.

If the task is already in the cgroup, the bug can lead to another thread
being attached to the cgroup unexpectedly:

  # echo 5207 > tasks
  # cat tasks
  5207
  # echo 5207 > tasks
  # cat tasks
  5207
  5215

What's worse, if the task to be attached isn't the leader of the thread
group, we might never exit the loop, hence cpu stall. Thanks for Oleg's
analysis.

This bug was introduced by commit 081aa458c3
("cgroup: consolidate cgroup_attach_task() and cgroup_attach_proc()")

[ lizf: - fixed the first continue, pointed out by Oleg,
        - rewrote changelog. ]

Cc: <stable@vger.kernel.org> # 3.9+
Reported-by: Eunki Kim <eunki_kim@samsung.com>
Reported-by: Anjana V Kumar <anjanavk12@gmail.com>
Signed-off-by: Anjana V Kumar <anjanavk12@gmail.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-10-13 16:07:10 -04:00
Linus Torvalds
0e7a3ed04f Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Various fixlets:

  On the kernel side:

   - fix a race
   - fix a bug in the handling of the perf ring-buffer data page

  On the tooling side:

   - fix the handling of certain corrupted perf.data files
   - fix a bug in 'perf probe'
   - fix a bug in 'perf record + perf sched'
   - fix a bug in 'make install'
   - fix a bug in libaudit feature-detection on certain distros"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf session: Fix infinite loop on invalid perf.data file
  perf tools: Fix installation of libexec components
  perf probe: Fix to find line information for probe list
  perf tools: Fix libaudit test
  perf stat: Set child_pid after perf_evlist__prepare_workload()
  perf tools: Add default handler for mmap2 events
  perf/x86: Clean up cap_user_time* setting
  perf: Fix perf_pmu_migrate_context
2013-10-08 09:23:12 -07:00
Linus Torvalds
7dee8dff47 ACPI and power management fixes for 3.12-rc4
1) The resume part of user space driven hibernation (s2disk) is now
     broken after the change that moved the creation of memory bitmaps
     to after the freezing of tasks, because I forgot that the resume
     utility loaded the image before freezing tasks and needed the
     bitmaps for that.  The fix adds special handling for that case.
 
  2) One of recent commits changed the export of acpi_bus_get_device()
     to EXPORT_SYMBOL_GPL(), which was technically correct but broke
     existing binary modules using that function including one in
     particularly widespread use.  Change it back to EXPORT_SYMBOL().
 
  3) The intel_pstate driver sometimes fails to disable turbo if its
     no_turbo sysfs attribute is set.  Fix from Srinivas Pandruvada.
 
  4) One of recent cpufreq fixes forgot to update a check in cpufreq-cpu0
     which still (incorrectly) treats non-NULL as non-error.  Fix from
     Philipp Zabel.
 
  5) The SPEAr cpufreq driver uses a wrong variable type in one place
     preventing it from catching errors returned by one of the functions
     called by it.  Fix from Sachin Kamat.
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABCAAGBQJSTvXfAAoJEKhOf7ml8uNslkkP+QGoghnGR9hScYq/0Mcnzr4b
 kwkiRx54NggjzzN8Q+ejZmxNZ7UZt3q05PmHPtJk3A8gzqIMsb83jnXsZNiDiQs6
 m+KBYrV5dhPZkp08X2tHJp5ijZNRULpp9QA49ulnLfVT/A+rkr5xBCK0W3ln/zL3
 tJSlGJ3N7yYUXe3nMRCCNnnnAzWA+Tk8yRaMx5MnFqlQWWnyx1SGKjD/kVv0/3RA
 6rlDPQEIuoCTqLKotnGIqVN2hTFPFJKc9yTrRGZ15pMjdUGHMwnHy6KMAdXy4Rdh
 R1DOdf+bvPkkFiGE1D1vKOt7pdOG/cTtNkppvWZRuoGg2AMJGm5KWlrdLhlvunyt
 IQXmdt/eWecNr+WzN8FiDp4LEQcI6VjEDaJ3qbjXHLH/FOupBKXYoNWpejj4bGSE
 PtPmJYjNpD2vF3cdtt80ZAYSxhLutwPQksoAwyJ40++l53Ygi81BO31LWZQnDk/8
 HPWOXFThmWJtT03b0sG25GpboiCpYtHEmbwQe+y+pRx7L12HBfE4StT3hmv5Z9J4
 RXXB3yNq4ApXtFq1mitpiPmSVfYe+zu590m7ZUr457BpXi7MH17tzDn9nUJ2eTZl
 kXwUNWiRKGjPmKYxV/ml/apClozsGMFP+XoZkYotFd0W5+SVLuhdXdtClIt4NAbD
 dUkYVMm/BBBALpmH+yKw
 =P4mh
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-3.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI and power management fixes from Rafael Wysocki:

 - The resume part of user space driven hibernation (s2disk) is now
   broken after the change that moved the creation of memory bitmaps to
   after the freezing of tasks, because I forgot that the resume utility
   loaded the image before freezing tasks and needed the bitmaps for
   that.  The fix adds special handling for that case.

 - One of recent commits changed the export of acpi_bus_get_device() to
   EXPORT_SYMBOL_GPL(), which was technically correct but broke existing
   binary modules using that function including one in particularly
   widespread use.  Change it back to EXPORT_SYMBOL().

 - The intel_pstate driver sometimes fails to disable turbo if its
   no_turbo sysfs attribute is set.  Fix from Srinivas Pandruvada.

 - One of recent cpufreq fixes forgot to update a check in cpufreq-cpu0
   which still (incorrectly) treats non-NULL as non-error.  Fix from
   Philipp Zabel.

 - The SPEAr cpufreq driver uses a wrong variable type in one place
   preventing it from catching errors returned by one of the functions
   called by it.  Fix from Sachin Kamat.

* tag 'pm+acpi-3.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: Use EXPORT_SYMBOL() for acpi_bus_get_device()
  intel_pstate: fix no_turbo
  cpufreq: cpufreq-cpu0: NULL is a valid regulator, part 2
  cpufreq: SPEAr: Fix incorrect variable type
  PM / hibernate: Fix user space driven resume regression
2013-10-04 15:03:42 -07:00
Peter Zijlstra
9886167d20 perf: Fix perf_pmu_migrate_context
While auditing the list_entry usage due to a trinity bug I found that
perf_pmu_migrate_context violates the rules for
perf_event::event_entry.

The problem is that perf_event::event_entry is a RCU list element, and
hence we must wait for a full RCU grace period before re-using the
element after deletion.

Therefore the usage in perf_pmu_migrate_context() which re-uses the
entry immediately is broken. For now introduce another list_head into
perf_event for this specific usage.

This doesn't actually fix the trinity report because that never goes
through this code.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-mkj72lxagw1z8fvjm648iznw@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-04 09:58:53 +02:00
Ingo Molnar
0d119fb576 Merge branch 'irq/urgent-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into irq/urgent
Pull a hardirq-nesting fix from Frederic Weisbecker.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-02 07:53:01 +02:00
Frederic Weisbecker
ded7975475 irq: Force hardirq exit's softirq processing on its own stack
The commit facd8b80c6
("irq: Sanitize invoke_softirq") converted irq exit
calls of do_softirq() to __do_softirq() on all architectures,
assuming it was only used there for its irq disablement
properties.

But as a side effect, the softirqs processed in the end
of the hardirq are always called on the inline current
stack that is used by irq_exit() instead of the softirq
stack provided by the archs that override do_softirq().

The result is mostly safe if the architecture runs irq_exit()
on a separate irq stack because then softirqs are processed
on that same stack that is near empty at this stage (assuming
hardirq aren't nesting).

Otherwise irq_exit() runs in the task stack and so does the softirq
too. The interrupted call stack can be randomly deep already and
the softirq can dig through it even further. To add insult to the
injury, this softirq can be interrupted by a new hardirq, maximizing
the chances for a stack overrun as reported in powerpc for example:

	do_IRQ: stack overflow: 1920
	CPU: 0 PID: 1602 Comm: qemu-system-ppc Not tainted 3.10.4-300.1.fc19.ppc64p7 #1
	Call Trace:
	[c0000000050a8740] .show_stack+0x130/0x200 (unreliable)
	[c0000000050a8810] .dump_stack+0x28/0x3c
	[c0000000050a8880] .do_IRQ+0x2b8/0x2c0
	[c0000000050a8930] hardware_interrupt_common+0x154/0x180
	--- Exception: 501 at .cp_start_xmit+0x3a4/0x820 [8139cp]
		LR = .cp_start_xmit+0x390/0x820 [8139cp]
	[c0000000050a8d40] .dev_hard_start_xmit+0x394/0x640
	[c0000000050a8e00] .sch_direct_xmit+0x110/0x260
	[c0000000050a8ea0] .dev_queue_xmit+0x260/0x630
	[c0000000050a8f40] .br_dev_queue_push_xmit+0xc4/0x130 [bridge]
	[c0000000050a8fc0] .br_dev_xmit+0x198/0x270 [bridge]
	[c0000000050a9070] .dev_hard_start_xmit+0x394/0x640
	[c0000000050a9130] .dev_queue_xmit+0x428/0x630
	[c0000000050a91d0] .ip_finish_output+0x2a4/0x550
	[c0000000050a9290] .ip_local_out+0x50/0x70
	[c0000000050a9310] .ip_queue_xmit+0x148/0x420
	[c0000000050a93b0] .tcp_transmit_skb+0x4e4/0xaf0
	[c0000000050a94a0] .__tcp_ack_snd_check+0x7c/0xf0
	[c0000000050a9520] .tcp_rcv_established+0x1e8/0x930
	[c0000000050a95f0] .tcp_v4_do_rcv+0x21c/0x570
	[c0000000050a96c0] .tcp_v4_rcv+0x734/0x930
	[c0000000050a97a0] .ip_local_deliver_finish+0x184/0x360
	[c0000000050a9840] .ip_rcv_finish+0x148/0x400
	[c0000000050a98d0] .__netif_receive_skb_core+0x4f8/0xb00
	[c0000000050a99d0] .netif_receive_skb+0x44/0x110
	[c0000000050a9a70] .br_handle_frame_finish+0x2bc/0x3f0 [bridge]
	[c0000000050a9b20] .br_nf_pre_routing_finish+0x2ac/0x420 [bridge]
	[c0000000050a9bd0] .br_nf_pre_routing+0x4dc/0x7d0 [bridge]
	[c0000000050a9c70] .nf_iterate+0x114/0x130
	[c0000000050a9d30] .nf_hook_slow+0xb4/0x1e0
	[c0000000050a9e00] .br_handle_frame+0x290/0x330 [bridge]
	[c0000000050a9ea0] .__netif_receive_skb_core+0x34c/0xb00
	[c0000000050a9fa0] .netif_receive_skb+0x44/0x110
	[c0000000050aa040] .napi_gro_receive+0xe8/0x120
	[c0000000050aa0c0] .cp_rx_poll+0x31c/0x590 [8139cp]
	[c0000000050aa1d0] .net_rx_action+0x1dc/0x310
	[c0000000050aa2b0] .__do_softirq+0x158/0x330
	[c0000000050aa3b0] .irq_exit+0xc8/0x110
	[c0000000050aa430] .do_IRQ+0xdc/0x2c0
	[c0000000050aa4e0] hardware_interrupt_common+0x154/0x180
	 --- Exception: 501 at .bad_range+0x1c/0x110
		 LR = .get_page_from_freelist+0x908/0xbb0
	[c0000000050aa7d0] .list_del+0x18/0x50 (unreliable)
	[c0000000050aa850] .get_page_from_freelist+0x908/0xbb0
	[c0000000050aa9e0] .__alloc_pages_nodemask+0x21c/0xae0
	[c0000000050aaba0] .alloc_pages_vma+0xd0/0x210
	[c0000000050aac60] .handle_pte_fault+0x814/0xb70
	[c0000000050aad50] .__get_user_pages+0x1a4/0x640
	[c0000000050aae60] .get_user_pages_fast+0xec/0x160
	[c0000000050aaf10] .__gfn_to_pfn_memslot+0x3b0/0x430 [kvm]
	[c0000000050aafd0] .kvmppc_gfn_to_pfn+0x64/0x130 [kvm]
	[c0000000050ab070] .kvmppc_mmu_map_page+0x94/0x530 [kvm]
	[c0000000050ab190] .kvmppc_handle_pagefault+0x174/0x610 [kvm]
	[c0000000050ab270] .kvmppc_handle_exit_pr+0x464/0x9b0 [kvm]
	[c0000000050ab320]  kvm_start_lightweight+0x1ec/0x1fc [kvm]
	[c0000000050ab4f0] .kvmppc_vcpu_run_pr+0x168/0x3b0 [kvm]
	[c0000000050ab9c0] .kvmppc_vcpu_run+0xc8/0xf0 [kvm]
	[c0000000050aba50] .kvm_arch_vcpu_ioctl_run+0x5c/0x1a0 [kvm]
	[c0000000050abae0] .kvm_vcpu_ioctl+0x478/0x730 [kvm]
	[c0000000050abc90] .do_vfs_ioctl+0x4ec/0x7c0
	[c0000000050abd80] .SyS_ioctl+0xd4/0xf0
	[c0000000050abe30] syscall_exit+0x0/0x98

Since this is a regression, this patch proposes a minimalistic
and low-risk solution by blindly forcing the hardirq exit processing of
softirqs on the softirq stack. This way we should reduce significantly
the opportunities for task stack overflow dug by softirqs.

Longer term solutions may involve extending the hardirq stack coverage to
irq_exit(), etc...

Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: #3.9.. <stable@vger.kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@au1.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@au1.ibm.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
2013-10-01 12:39:08 +02:00
Oleg Nesterov
314a8ad0f1 pidns: fix free_pid() to handle the first fork failure
"case 0" in free_pid() assumes that disable_pid_allocation() should
clear PIDNS_HASH_ADDING before the last pid goes away.

However this doesn't happen if the first fork() fails to create the
child reaper which should call disable_pid_allocation().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-30 14:31:03 -07:00