Commit Graph

14716 Commits

Author SHA1 Message Date
Eric Paris
2a0b4be6dd audit: fix message spacing printing auid
The helper function didn't include a leading space, so it was jammed
against the previous text in the audit record.

Signed-off-by: Eric Paris <eparis@redhat.com>
2013-05-08 00:02:19 -04:00
Eric Paris
82d8da0d46 Revert "audit: move kaudit thread start from auditd registration to kaudit init"
This reverts commit 6ff5e45985.

Conflicts:
	kernel/audit.c

This patch was starting a kthread for all the time.  Since the follow on
patches that required it didn't get finished in 3.10 time, we shouldn't
ship this change in 3.10.

Signed-off-by: Eric Paris <eparis@redhat.com>
2013-05-07 22:27:21 -04:00
Eric W. Biederman
780a7654ce audit: Make testing for a valid loginuid explicit.
audit rule additions containing "-F auid!=4294967295" were failing
with EINVAL because of a regression caused by e1760bd.

Apparently some userland audit rule sets want to know if loginuid uid
has been set and are using a test for auid != 4294967295 to determine
that.

In practice that is a horrible way to ask if a value has been set,
because it relies on subtle implementation details and will break
every time the uid implementation in the kernel changes.

So add a clean way to test if the audit loginuid has been set, and
silently convert the old idiom to the cleaner and more comprehensible
new idiom.

Cc: <stable@vger.kernel.org> # 3.7
Reported-By: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Tested-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-05-07 22:27:15 -04:00
Eric Paris
b24a30a730 audit: fix event coverage of AUDIT_ANOM_LINK
The userspace audit tools didn't like the existing formatting of the
AUDIT_ANOM_LINK event. It needed to be expanded to emit an AUDIT_PATH
event as well, so this implements the change. The bulk of the patch is
moving code out of auditsc.c into audit.c and audit.h for general use.
It expands audit_log_name to include an optional "struct path" argument
for the simple case of just needing to report a pathname. This also
makes
audit_log_task_info available when syscall auditing is not enabled,
since
it is needed in either case for process details.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Steve Grubb <sgrubb@redhat.com>
2013-04-30 15:31:28 -04:00
Eric Paris
7173c54e3a audit: use spin_lock in audit_receive_msg to process tty logging
This function is called when we receive a netlink message from
userspace.  We don't need to worry about it coming from irq context or
irqs making it re-entrant.

Signed-off-by: Eric Paris <eparis@redhat.com>
2013-04-30 15:31:28 -04:00
Richard Guy Briggs
46e959ea29 audit: add an option to control logging of passwords with pam_tty_audit
Most commands are entered one line at a time and processed as complete lines
in non-canonical mode.  Commands that interactively require a password, enter
canonical mode to do this while shutting off echo.  This pair of features
(icanon and !echo) can be used to avoid logging passwords by audit while still
logging the rest of the command.

Adding a member (log_passwd) to the struct audit_tty_status passed in by
pam_tty_audit allows control of canonical mode without echo per task.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-04-30 15:31:28 -04:00
Eric Paris
bde02ca858 audit: use spin_lock_irqsave/restore in audit tty code
Some of the callers of the audit tty function use spin_lock_irqsave/restore.
We were using the forced always enable version, which seems really bad.
Since I don't know every one of these code paths well enough, it makes
sense to just switch everything to the safe version.  Maybe it's a
little overzealous, but it's a lot better than an unlucky deadlock when
we return to a caller with irq enabled and they expect it to be
disabled.

Signed-off-by: Eric Paris <eparis@redhat.com>
2013-04-30 15:31:28 -04:00
Eric Paris
4d3fb709b2 helper for some session id stuff 2013-04-30 15:31:28 -04:00
Eric Paris
b122c3767c audit: use a consistent audit helper to log lsm information
We have a number of places we were reimplementing the same code to write
out lsm labels.  Just do it one darn place.

Signed-off-by: Eric Paris <eparis@redhat.com>
2013-04-30 15:31:28 -04:00
Eric Paris
152f497b9b audit: push loginuid and sessionid processing down
Since we are always current, we can push a lot of this stuff to the
bottom and get rid of useless interfaces and arguments.

Signed-off-by: Eric Paris <eparis@redhat.com>
2013-04-30 15:31:28 -04:00
Eric Paris
dc9eb698f4 audit: stop pushing loginid, uid, sessionid as arguments
We always use current.  Stop pulling this when the skb comes in and
pushing it around as arguments.  Just get it at the end when you need
it.

Signed-off-by: Eric Paris <eparis@redhat.com>
2013-04-30 15:31:28 -04:00
Eric Paris
1890090916 audit: remove the old depricated kernel interface
We used to have an inflexible mechanism to add audit rules to the
kernel.  It hasn't been used in a long time.  Get rid of that stuff.

Signed-off-by: Eric Paris <eparis@redhat.com>
2013-04-30 15:31:28 -04:00
Eric Paris
ab61d38ed8 audit: make validity checking generic
We have 2 interfaces to send audit rules.  Rather than check validity of
things in 2 places make a helper function.

Signed-off-by: Eric Paris <eparis@redhat.com>
2013-04-30 15:31:28 -04:00
Eric Paris
62062cf8a3 audit: allow checking the type of audit message in the user filter
When userspace sends messages to the audit system it includes a type.
We want to be able to filter messages based on that type without have to
do the all or nothing option currently available on the
AUDIT_FILTER_TYPE filter list.  Instead we should be able to use the
AUDIT_FILTER_USER filter list and just use the message type as one part
of the matching decision.

Signed-off-by: Eric Paris <eparis@redhat.com>
2013-04-16 17:28:49 -04:00
Eric Paris
34c474de7b audit: fix build break when AUDIT_DEBUG == 2
Looks like this one has been around since 5195d8e21:

	kernel/auditsc.c: In function ‘audit_free_names’:
	kernel/auditsc.c:998: error: ‘i’ undeclared (first use in this function)

...and this warning:

	kernel/auditsc.c: In function ‘audit_putname’:
	kernel/auditsc.c:2045: warning: ‘i’ may be used uninitialized in this function

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-04-16 10:17:02 -04:00
Gao feng
72199caa8d audit: remove duplicate export of audit_enabled
audit_enabled has already been exported in
include/linux/audit.h. and kernel/audit.h
includes include/linux/audit.h, no need to
export aduit_enabled again in kernel/audit.h

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-04-12 09:42:08 -04:00
Eric Paris
ad395abece Audit: do not print error when LSMs disabled
RHBZ: 785936

If the audit system collects a record about one process sending a signal
to another process it includes in that collection the 'secid' or 'an int
used to represet an LSM label.'  If there is no LSM enabled it will
collect a 0.  The problem is that when we attempt to print that record
we ask the LSM to convert the secid back to a string.  Since there is no
LSM it returns EOPNOTSUPP.

Most code in the audit system checks if the secid is 0 and does not
print LSM info in that case.  The signal information code however forgot
that check.  Thus users will see a message in syslog indicating that
converting the sid to string failed.  Add the right check.

Signed-off-by: Eric Paris <eparis@redhat.com>
2013-04-11 15:39:10 -04:00
Eric Paris
f7616102d6 audit: use data= not msg= for AUDIT_USER_TTY messages
Userspace parsing libraries assume that msg= is only for userspace audit
records, not for user tty records.  Make this consistent with the other
tty records.

Reported-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-04-11 11:26:03 -04:00
Andrew Morton
e2c5adc88a auditsc: remove audit_set_context() altogether - fold it into its caller
>   In function audit_alloc_context(), use kzalloc, instead of kmalloc+memset. Patch also renames audit_zero_context() to
> audit_set_context(), to represent it's inner workings properly.

Fair enough.  I'd go futher...

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: Rakib Mullick <rakib.mullick@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-04-10 15:18:50 -04:00
Rakib Mullick
17c6ee707a auditsc: Use kzalloc instead of kmalloc+memset.
In function audit_alloc_context(), use kzalloc, instead of kmalloc+memset. Patch also renames audit_zero_context() to
audit_set_context(), to represent it's inner workings properly.

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-04-10 15:18:24 -04:00
Chen Gang
2950fa9d32 kernel: audit: beautify code, for extern function, better to check its parameters by itself
__audit_socketcall is an extern function.
  better to check its parameters by itself.

    also can return error code, when fail (find invalid parameters).
    also use macro instead of real hard code number
    also give related comments for it.

Signed-off-by: Chen Gang <gang.chen@asianux.com>
[eparis: fix the return value when !CONFIG_AUDIT]
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-04-10 13:31:12 -04:00
Dmitry Monakhov
65ada7bc02 audit: destroy long filenames correctly
filename should be destroyed via final_putname() instead of __putname()
Otherwise this result in following BUGON() in case of long names:
  kernel BUG at mm/slab.c:3006!
  Call Trace:
  kmem_cache_free+0x1c1/0x850
  audit_putname+0x88/0x90
  putname+0x73/0x80
  sys_symlinkat+0x120/0x150
  sys_symlink+0x16/0x20
  system_call_fastpath+0x16/0x1b

Introduced-in: 7950e3852

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-04-10 13:21:52 -04:00
Richard Guy Briggs
6ff5e45985 audit: move kaudit thread start from auditd registration to kaudit init
The kauditd_thread() task was started only after the auditd userspace daemon
registers itself with kaudit.  This was fine when only auditd consumed messages
from the kaudit netlink unicast socket.  With the addition of a multicast group
to that socket it is more convenient to have the thread start on init of the
kaudit kernel subsystem.

Signed-off-by: Richard Guy Briggs <rbriggs@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-04-08 16:19:18 -04:00
Richard Guy Briggs
3320c5133d audit: flatten kauditd_thread wait queue code
The wait queue control code in kauditd_thread() was nested deeper than
necessary.  The function has been flattened for better legibility.

Signed-off-by: Richard Guy Briggs <rbriggs@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-04-08 16:19:17 -04:00
Richard Guy Briggs
b551d1d981 audit: refactor hold queue flush
The hold queue flush code is an autonomous chunk of code that can be
refactored, removed from kauditd_thread() into flush_hold_queue() and
flattenned for better legibility.

Signed-off-by: Richard Guy Briggs <rbriggs@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-04-08 16:19:16 -04:00
Matvejchikov Ilya
37eebe39c9 audit: improve GID/EGID comparation logic
It is useful to extend GID/EGID comparation logic to be able to
match not only the exact EID/EGID values but the group/egroup also.

Signed-off-by: Matvejchikov Ilya <matvejchikov@gmail.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-04-08 16:19:15 -04:00
Eric W. Biederman
6e6668845f kernel/pid.c: reenable interrupts when alloc_pid() fails because init has exited
We're forgetting to reenable local interrupts on an error path.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Reported-by: Josh Boyer <jwboyer@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-12 14:34:00 -08:00
Linus Torvalds
2a6f79e8c1 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "Three small fixlets"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/debug: Fix format string for 32-bit platforms
  sched: Fix warning in kernel/sched/fair.c
  sched/rt: Use root_domain of rt_rq not current processor
2013-02-05 07:58:24 +11:00
Linus Torvalds
51c1abb95f Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Three fixlets and two small (and low risk) hw-enablement changes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Fix event group context move
  x86/perf: Add IvyBridge EP support
  perf/x86: Fix P6 driver section warning
  arch/x86/tools/insn_sanity.c: Identify source of messages
  perf/x86: Enable Intel Lincroft/Penwell/Cloverview Atom support
2013-02-05 07:57:09 +11:00
Linus Torvalds
5dc31b5767 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull two small RCU fixlets from Ingo Molnar.

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rcu: Make rcu_nocb_poll an early_param instead of module_param
  rcu: Prevent soft-lockup complaints about no-CBs CPUs
2013-02-05 07:56:07 +11:00
Jiri Olsa
0231bb5336 perf: Fix event group context move
When we have group with mixed events (hw/sw) we want to end up
with group leader being in hw context. So if group leader is
initialy sw event, we move all the events under hw context.

The move is done for each event by removing it from its context
and adding it back into proper one. As a part of the removal the
event is automatically disabled, which is not what we want at
this stage of creating groups.

The fix is to initialize event state after removal from sw
context.

This fix resulted from the following discussion:

  http://thread.gmane.org/gmane.linux.kernel.perf.user/1144

Reported-by: Andreas Hollmann <hollmann@in.tum.de>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vince@deater.net>
Link: http://lkml.kernel.org/r/1359714225-4231-1-git-send-email-jolsa@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-03 12:01:29 +01:00
Linus Torvalds
bdb0ae6a76 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "This is a collection of miscellaneous fixes, the most important one is
  the fix for the Samsung laptop bricking issue (auto-blacklisting the
  samsung-laptop driver); the efi_enabled() changes you see below are
  prerequisites for that fix.

  The other issues fixed are booting on OLPC XO-1.5, an UV fix, NMI
  debugging, and requiring CAP_SYS_RAWIO for MSR references, just as
  with I/O port references."

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  samsung-laptop: Disable on EFI hardware
  efi: Make 'efi_enabled' a function to query EFI facilities
  smp: Fix SMP function call empty cpu mask race
  x86/msr: Add capabilities check
  x86/dma-debug: Bump PREALLOC_DMA_DEBUG_ENTRIES
  x86/olpc: Fix olpc-xo1-sci.c build errors
  arch/x86/platform/uv: Fix incorrect tlb flush all issue
  x86-64: Fix unwind annotations in recent NMI changes
  x86-32: Start out cr0 clean, disable paging before modifying cr3/4
2013-01-31 17:08:43 +11:00
Dave Airlie
ff0d05bf73 Revert "console: implement lockdep support for console_lock"
This reverts commit daee779718.

I'll requeue this after the console locking fixes, so lockdep
is useful again for people until fbcon is fixed.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-01-31 15:46:56 +11:00
Wang YanQing
f44310b98d smp: Fix SMP function call empty cpu mask race
I get the following warning every day with v3.7, once or
twice a day:

  [ 2235.186027] WARNING: at /mnt/sda7/kernel/linux/arch/x86/kernel/apic/ipi.c:109 default_send_IPI_mask_logical+0x2f/0xb8()

As explained by Linus as well:

 |
 | Once we've done the "list_add_rcu()" to add it to the
 | queue, we can have (another) IPI to the target CPU that can
 | now see it and clear the mask.
 |
 | So by the time we get to actually send the IPI, the mask might
 | have been cleared by another IPI.
 |

This patch also fixes a system hang problem, if the data->cpumask
gets cleared after passing this point:

        if (WARN_ONCE(!mask, "empty IPI mask"))
                return;

then the problem in commit 83d349f35e ("x86: don't send an IPI to
the empty set of CPU's") will happen again.

Signed-off-by: Wang YanQing <udknight@gmail.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: peterz@infradead.org
Cc: mina86@mina86.org
Cc: srivatsa.bhat@linux.vnet.ibm.com
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/20130126075357.GA3205@udknight
[ Tidied up the changelog and the comment in the code. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-28 11:21:57 +01:00
Arnd Bergmann
cff3c124a7 sched/debug: Fix format string for 32-bit platforms
The type returned from atomic64_t can be either unsigned
long or unsigned long long, depending on the architecture.
Using a cast to unsigned long long lets us use the same
format string for all architectures.

Without this patch, building with scheduler debugging
enabled results in:

  kernel/sched/debug.c: In function 'print_cfs_rq':
  kernel/sched/debug.c:225:2: warning: format '%ld' expects argument of type 'long int', but argument 4 has type 'long long int' [-Wformat]
  kernel/sched/debug.c:225:2: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'long long int' [-Wformat]

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Turner <pjt@google.com>
Cc: linux-arm-kernel@list.infradead.org
Link: http://lkml.kernel.org/r/1359123276-15833-7-git-send-email-arnd@arndb.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-25 15:23:15 +01:00
Arnd Bergmann
38dc3348e3 sched: Fix warning in kernel/sched/fair.c
a4c96ae319 "sched: Unthrottle rt runqueues in
__disable_runtime()" turned the unthrottle_offline_cfs_rqs
function into a static symbol, which now triggers a warning
about it being potentially unused:

  kernel/sched/fair.c:2055:13: warning: 'unthrottle_offline_cfs_rqs' defined but not used [-Wunused-function]

Marking it __maybe_unused shuts up the gcc warning and lets the
compiler safely drop the function body when it's not being used.

To reproduce, build the ARM bcm2835_defconfig.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Peter Boonstoppel <pboonstoppel@nvidia.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Turner <pjt@google.com>
Cc: linux-arm-kernel@list.infradead.org
Link: http://lkml.kernel.org/r/1359123276-15833-6-git-send-email-arnd@arndb.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-25 15:23:14 +01:00
Shawn Bohrer
aa7f67304d sched/rt: Use root_domain of rt_rq not current processor
When the system has multiple domains do_sched_rt_period_timer()
can run on any CPU and may iterate over all rt_rq in
cpu_online_mask.  This means when balance_runtime() is run for a
given rt_rq that rt_rq may be in a different rd than the current
processor.  Thus if we use smp_processor_id() to get rd in
do_balance_runtime() we may borrow runtime from a rt_rq that is
not part of our rd.

This changes do_balance_runtime to get the rd from the passed in
rt_rq ensuring that we borrow runtime only from the correct rd
for the given rt_rq.

This fixes a BUG at kernel/sched/rt.c:687! in __disable_runtime
when we try reclaim runtime lent to other rt_rq but runtime has
been lent to a rt_rq in another rd.

Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Mike Galbraith <bitbucket@online.de>
Cc: peterz@infradead.org
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/1358186131-29494-1-git-send-email-sbohrer@rgmadvisors.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-25 08:20:47 +01:00
Ingo Molnar
d36b7b9643 Merge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent
Pull RCU fixes from Paul E. McKenney.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 13:28:24 +01:00
Tejun Heo
f56c3196f2 async: fix __lowest_in_progress()
Commit 083b804c4d ("async: use workqueue for worker pool") made it
possible that async jobs are moved from pending to running out-of-order.
While pending async jobs will be queued and dispatched for execution in
the same order, nothing guarantees they'll enter "1) move self to the
running queue" of async_run_entry_fn() in the same order.

Before the conversion, async implemented its own worker pool.  An async
worker, upon being woken up, fetches the first item from the pending
list, which kept the executing lists sorted.  The conversion to
workqueue was done by adding work_struct to each async_entry and async
just schedules the work item.  The queueing and dispatching of such work
items are still in order but now each worker thread is associated with a
specific async_entry and moves that specific async_entry to the
executing list.  So, depending on which worker reaches that point
earlier, which is non-deterministic, we may end up moving an async_entry
with larger cookie before one with smaller one.

This broke __lowest_in_progress().  running->domain may not be properly
sorted and is not guaranteed to contain lower cookies than pending list
when not empty.  Fix it by ensuring sort-inserting to the running list
and always looking at both pending and running when trying to determine
the lowest cookie.

Over time, the async synchronization implementation became quite messy.
We better restructure it such that each async_entry is linked to two
lists - one global and one per domain - and not move it when execution
starts.  There's no reason to distinguish pending and running.  They
behave the same for synchronization purposes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-22 16:21:24 -08:00
Linus Torvalds
d26d45253b Kprobes now uses the function tracer if it can. That is, if a probe
is placed on a function mcount/nop location, and the arch supports it,
 instead of adding a breakpoint, kprobes will register a function callback
 as that is much more efficient.
 
 The function tracer requires to update modules before they run, and
 uses the module notifier to do so. But if something else in the module
 notifiers registers a kprobe at one of these locations, before ftrace
 can get to it, then the system could fail.
 
 The function tracer must be initialized early, otherwise module notifiers
 that probe will only work by chance.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJQ/ZaaAAoJEOdOSU1xswtMQsoH/02S5ngidMgIXwYCGzwxR6Ym
 f2krSz76rBkFn9ivFPJFfq2Wi05UhhVfFr4gpKzoK+/rBqwVkV5tRIQ4AhJ1/Q3d
 wCXB2mcRI6ky/kB/ts8q6gjj+rlJ18NgZf79TA5y5q7FEYc6DvB2dZ+vE+rvCnXk
 q/Bx6q4phEdLbVqet+Ga36qzi9u+pW0P+ntZmi0EQLVnz9p+mtdVaRz32qKp0FYi
 XhHtPMHQDZoOu8utPNtPcfSOZp1sNN8tXqjEAJ4/Ba54badUfg4WJ+RXGPsYH+4T
 lOwpVv7Xp0Sp2b1HivEVfCs1bx2RNzluZisPahBEwdBv+XssvqMbAfG+X3EVk3w=
 =6EYI
 -----END PGP SIGNATURE-----

Merge tag 'trace-3.8-rc4-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull ftrace fix from Steven Rostedt:
 "Kprobes now uses the function tracer if it can.  That is, if a probe
  is placed on a function mcount/nop location, and the arch supports it,
  instead of adding a breakpoint, kprobes will register a function
  callback as that is much more efficient.

  The function tracer requires to update modules before they run, and
  uses the module notifier to do so.  But if something else in the
  module notifiers registers a kprobe at one of these locations, before
  ftrace can get to it, then the system could fail.

  The function tracer must be initialized early, otherwise module
  notifiers that probe will only work by chance."

* tag 'trace-3.8-rc4-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace: Be first to run code modification on modules
2013-01-22 10:30:49 -08:00
Oleg Nesterov
9067ac85d5 wake_up_process() should be never used to wakeup a TASK_STOPPED/TRACED task
wake_up_process() should never wakeup a TASK_STOPPED/TRACED task.
Change it to use TASK_NORMAL and add the WARN_ON().

TASK_ALL has no other users, probably can be killed.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-22 10:08:17 -08:00
Oleg Nesterov
9899d11f65 ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL
putreg() assumes that the tracee is not running and pt_regs_access() can
safely play with its stack.  However a killed tracee can return from
ptrace_stop() to the low-level asm code and do RESTORE_REST, this means
that debugger can actually read/modify the kernel stack until the tracee
does SAVE_REST again.

set_task_blockstep() can race with SIGKILL too and in some sense this
race is even worse, the very fact the tracee can be woken up breaks the
logic.

As Linus suggested we can clear TASK_WAKEKILL around the arch_ptrace()
call, this ensures that nobody can ever wakeup the tracee while the
debugger looks at it.  Not only this fixes the mentioned problems, we
can do some cleanups/simplifications in arch_ptrace() paths.

Probably ptrace_unfreeze_traced() needs more callers, for example it
makes sense to make the tracee killable for oom-killer before
access_process_vm().

While at it, add the comment into may_ptrace_stop() to explain why
ptrace_stop() still can't rely on SIGKILL and signal_pending_state().

Reported-by: Salman Qazi <sqazi@google.com>
Reported-by: Suleiman Souhlal <suleiman@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-22 10:08:00 -08:00
Oleg Nesterov
910ffdb18a ptrace: introduce signal_wake_up_state() and ptrace_signal_wake_up()
Cleanup and preparation for the next change.

signal_wake_up(resume => true) is overused. None of ptrace/jctl callers
actually want to wakeup a TASK_WAKEKILL task, but they can't specify the
necessary mask.

Turn signal_wake_up() into signal_wake_up_state(state), reintroduce
signal_wake_up() as a trivial helper, and add ptrace_signal_wake_up()
which adds __TASK_TRACED.

This way ptrace_signal_wake_up() can work "inside" ptrace_request()
even if the tracee doesn't have the TASK_WAKEKILL bit set.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-22 08:50:08 -08:00
Steven Rostedt
c1bf08ac26 ftrace: Be first to run code modification on modules
If some other kernel subsystem has a module notifier, and adds a kprobe
to a ftrace mcount point (now that kprobes work on ftrace points),
when the ftrace notifier runs it will fail and disable ftrace, as well
as kprobes that are attached to ftrace points.

Here's the error:

 WARNING: at kernel/trace/ftrace.c:1618 ftrace_bug+0x239/0x280()
 Hardware name: Bochs
 Modules linked in: fat(+) stap_56d28a51b3fe546293ca0700b10bcb29__8059(F) nfsv4 auth_rpcgss nfs dns_resolver fscache xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack lockd sunrpc ppdev parport_pc parport microcode virtio_net i2c_piix4 drm_kms_helper ttm drm i2c_core [last unloaded: bid_shared]
 Pid: 8068, comm: modprobe Tainted: GF            3.7.0-0.rc8.git0.1.fc19.x86_64 #1
 Call Trace:
  [<ffffffff8105e70f>] warn_slowpath_common+0x7f/0xc0
  [<ffffffff81134106>] ? __probe_kernel_read+0x46/0x70
  [<ffffffffa0180000>] ? 0xffffffffa017ffff
  [<ffffffffa0180000>] ? 0xffffffffa017ffff
  [<ffffffff8105e76a>] warn_slowpath_null+0x1a/0x20
  [<ffffffff810fd189>] ftrace_bug+0x239/0x280
  [<ffffffff810fd626>] ftrace_process_locs+0x376/0x520
  [<ffffffff810fefb7>] ftrace_module_notify+0x47/0x50
  [<ffffffff8163912d>] notifier_call_chain+0x4d/0x70
  [<ffffffff810882f8>] __blocking_notifier_call_chain+0x58/0x80
  [<ffffffff81088336>] blocking_notifier_call_chain+0x16/0x20
  [<ffffffff810c2a23>] sys_init_module+0x73/0x220
  [<ffffffff8163d719>] system_call_fastpath+0x16/0x1b
 ---[ end trace 9ef46351e53bbf80 ]---
 ftrace failed to modify [<ffffffffa0180000>] init_once+0x0/0x20 [fat]
  actual: cc:bb:d2:4b:e1

A kprobe was added to the init_once() function in the fat module on load.
But this happened before ftrace could have touched the code. As ftrace
didn't run yet, the kprobe system had no idea it was a ftrace point and
simply added a breakpoint to the code (0xcc in the cc:bb:d2:4b:e1).

Then when ftrace went to modify the location from a call to mcount/fentry
into a nop, it didn't see a call op, but instead it saw the breakpoint op
and not knowing what to do with it, ftrace shut itself down.

The solution is to simply give the ftrace module notifier the max priority.
This should have been done regardless, as the core code ftrace modification
also happens very early on in boot up. This makes the module modification
closer to core modification.

Link: http://lkml.kernel.org/r/20130107140333.593683061@goodmis.org

Cc: stable@vger.kernel.org
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reported-by: Frank Ch. Eigler <fche@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-21 13:21:50 -05:00
Linus Torvalds
ee61abb322 module: fix missing module_mutex unlock
Commit 1fb9341ac3 ("module: put modules in list much earlier") moved
some of the module initialization code around, and in the process
changed the exit paths too.  But for the duplicate export symbol error
case the change made the ddebug_cleanup path jump to after the module
mutex unlock, even though it happens with the mutex held.

Rusty has some patches to split this function up into some helper
functions, hopefully the mess of complex goto targets will go away
eventually.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-20 20:22:58 -08:00
Linus Torvalds
226364766f Various minor fixes, but a slightly more complex one to fix the per-cpu overload
problem introduced recently by kvm id changes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJQ/IaJAAoJENkgDmzRrbjxOjAQAIrI9+Jo3Lsxk1v9gXeo9xn2
 ST4LNv7/oW2+3NFBOkKsGVpcXe1JtGySIXyx9k+dELPa5xe4Rs4HE3pHQj/VoEx8
 FKz3oUXSHkuh+paKuFXvZ2u/z0/FI99GmqHPObvGQ4iS3hTXAibzO83yYYPxwApq
 Zq4kof/dAcVVPLm8fGVAMPA2Rbh/WmjDfrIv8gv71QkDjtRLzcr40VIgky5cvu7V
 FWcBl4/DVoKkGnDPsLDhLK9QGqgBGhFIlNIcVX4Jv50DiCibOyzdjeUXYxMftoGr
 Rw56hHwGpPdqbRIjBkR071vIl/mlXTmxIv+d77vZNBin2MIBwAzCQXo8I1/HojCK
 /wKhI+RFj0J5DaDo/BTB80cmI3X2oah5sRUebW6vd9HjunhFFndg4mVeDNPa0E0+
 F72xWlj79BjdIOuD06TLg6Tg2klL49nC8bUc0wrsh6onEjhd9v7Cp/X/rxi5cKYW
 eEv3oLkKwUHoheF9gBlpnT0Yyl/HpFe+nemblzj/ybRKnk4A5vtJqV9eZnqoOS16
 lgIkKOpgXT9dzSom2EL/f4sMCeLLYC44DQwOvxNKt/BdMY0r5y8OLaJORXQGfEDF
 Ztvu2G8PmELxV0B3JZcGR/zOcKxpOBsrGoVn0/EQIul3A/0C0ID7i5zwJAyX6LP7
 V+6vyF2eHMf10tB0rbfB
 =SpOo
 -----END PGP SIGNATURE-----

Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull module fixes and a virtio block fix from Rusty Russell:
 "Various minor fixes, but a slightly more complex one to fix the
  per-cpu overload problem introduced recently by kvm id changes."

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  module: put modules in list much earlier.
  module: add new state MODULE_STATE_UNFORMED.
  module: prevent warning when finit_module a 0 sized file
  virtio-blk: Don't free ida when disk is in use
2013-01-20 16:44:28 -08:00
Linus Torvalds
3a142ed962 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull misc syscall fixes from Al Viro:

 - compat syscall fixes (discussed back in December)

 - a couple of "make life easier for sigaltstack stuff by reducing
   inter-tree dependencies"

 - fix up compiler/asmlinkage calling convention disagreement of
   sys_clone()

 - misc

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
  sys_clone() needs asmlinkage_protect
  make sure that /linuxrc has std{in,out,err}
  x32: fix sigtimedwait
  x32: fix waitid()
  switch compat_sys_wait4() and compat_sys_waitid() to COMPAT_SYSCALL_DEFINE
  switch compat_sys_sigaltstack() to COMPAT_SYSCALL_DEFINE
  CONFIG_GENERIC_SIGALTSTACK build breakage with asm-generic/syscalls.h
  Ensure that kernel_init_freeable() is not inlined into non __init code
2013-01-20 13:58:48 -08:00
Oleg Nesterov
edea0d03ee ia64: kill thread_matches(), unexport ptrace_check_attach()
The ia64 function "thread_matches()" has no users since commit
e868a55c2a ("[IA64] remove find_thread_for_addr()").  Remove it.

This allows us to make ptrace_check_attach() static to kernel/ptrace.c,
which is good since we'll need to change the semantics of it and fix up
all the callers.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-20 12:26:05 -08:00
Al Viro
b1e0318b8c sys_clone() needs asmlinkage_protect
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-01-19 22:13:34 -05:00
Tejun Heo
774a1221e8 module, async: async_synchronize_full() on module init iff async is used
If the default iosched is built as module, the kernel may deadlock
while trying to load the iosched module on device probe if the probing
was running off async.  This is because async_synchronize_full() at
the end of module init ends up waiting for the async job which
initiated the module loading.

 async A				modprobe

 1. finds a device
 2. registers the block device
 3. request_module(default iosched)
					4. modprobe in userland
					5. load and init module
					6. async_synchronize_full()

Async A waits for modprobe to finish in request_module() and modprobe
waits for async A to finish in async_synchronize_full().

Because there's no easy to track dependency once control goes out to
userland, implementing properly nested flushing is difficult.  For
now, make module init perform async_synchronize_full() iff module init
has queued async jobs as suggested by Linus.

This avoids the described deadlock because iosched module doesn't use
async and thus wouldn't invoke async_synchronize_full().  This is
hacky and incomplete.  It will deadlock if async module loading nests;
however, this works around the known problem case and seems to be the
best of bad options.

For more details, please refer to the following thread.

  http://thread.gmane.org/gmane.linux.kernel/1420814

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Alex Riesen <raa.lkml@gmail.com>
Tested-by: Ming Lei <ming.lei@canonical.com>
Tested-by: Alex Riesen <raa.lkml@gmail.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-16 09:05:33 -08:00