Commit Graph

401248 Commits

Author SHA1 Message Date
Mikulas Patocka
8077c0d983 bdi: test bdi_init failure
There were two places where return value from bdi_init was not tested.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-11-08 08:59:44 -07:00
Mikulas Patocka
a207f59376 block: fix a probe argument to blk_register_region
The probe function is supposed to return NULL on failure (as we can see in
kobj_lookup: kobj = probe(dev, index, data); ... if (kobj) return kobj;

However, in loop and brd, it returns negative error from ERR_PTR.

This causes a crash if we simulate disk allocation failure and run
less -f /dev/loop0 because the negative number is interpreted as a pointer:

BUG: unable to handle kernel NULL pointer dereference at 00000000000002b4
IP: [<ffffffff8118b188>] __blkdev_get+0x28/0x450
PGD 23c677067 PUD 23d6d1067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: loop hpfs nvidia(PO) ip6table_filter ip6_tables uvesafb cfbcopyarea cfbimgblt cfbfillrect fbcon font bitblit fbcon_rotate fbcon_cw fbcon_ud fbcon_ccw softcursor fb fbdev msr ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc tun ipv6 cpufreq_stats cpufreq_ondemand cpufreq_userspace cpufreq_powersave cpufreq_conservative hid_generic spadfs usbhid hid fuse raid0 snd_usb_audio snd_pcm_oss snd_mixer_oss md_mod snd_pcm snd_timer snd_page_alloc snd_hwdep snd_usbmidi_lib dmi_sysfs snd_rawmidi nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack snd soundcore lm85 hwmon_vid ohci_hcd ehci_pci ehci_hcd serverworks sata_svw libata acpi_cpufreq freq_table mperf ide_core usbcore kvm_amd kvm tg3 i2c_piix4 libphy microcode e100 usb_common ptp skge i2c_core pcspkr k10temp evdev floppy hwmon pps_core mii rtc_cmos button processor unix [last unloaded: nvidia]
CPU: 1 PID: 6831 Comm: less Tainted: P        W  O 3.10.15-devel #18
Hardware name: empty empty/S3992-E, BIOS 'V1.06   ' 06/09/2009
task: ffff880203cc6bc0 ti: ffff88023e47c000 task.ti: ffff88023e47c000
RIP: 0010:[<ffffffff8118b188>]  [<ffffffff8118b188>] __blkdev_get+0x28/0x450
RSP: 0018:ffff88023e47dbd8  EFLAGS: 00010286
RAX: ffffffffffffff74 RBX: ffffffffffffff74 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
RBP: ffff88023e47dc18 R08: 0000000000000002 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88023f519658
R13: ffffffff8118c300 R14: 0000000000000000 R15: ffff88023f519640
FS:  00007f2070bf7700(0000) GS:ffff880247400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000002b4 CR3: 000000023da1d000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 0000000000000002 0000001d00000000 000000003e47dc50 ffff88023f519640
 ffff88043d5bb668 ffffffff8118c300 ffff88023d683550 ffff88023e47de60
 ffff88023e47dc98 ffffffff8118c10d 0000001d81605698 0000000000000292
Call Trace:
 [<ffffffff8118c300>] ? blkdev_get_by_dev+0x60/0x60
 [<ffffffff8118c10d>] blkdev_get+0x1dd/0x370
 [<ffffffff8118c300>] ? blkdev_get_by_dev+0x60/0x60
 [<ffffffff813cea6c>] ? _raw_spin_unlock+0x2c/0x50
 [<ffffffff8118c300>] ? blkdev_get_by_dev+0x60/0x60
 [<ffffffff8118c365>] blkdev_open+0x65/0x80
 [<ffffffff8114d12e>] do_dentry_open.isra.18+0x23e/0x2f0
 [<ffffffff8114d214>] finish_open+0x34/0x50
 [<ffffffff8115e122>] do_last.isra.62+0x2d2/0xc50
 [<ffffffff8115eb58>] path_openat.isra.63+0xb8/0x4d0
 [<ffffffff81115a8e>] ? might_fault+0x4e/0xa0
 [<ffffffff8115f4f0>] do_filp_open+0x40/0x90
 [<ffffffff813cea6c>] ? _raw_spin_unlock+0x2c/0x50
 [<ffffffff8116db85>] ? __alloc_fd+0xa5/0x1f0
 [<ffffffff8114e45f>] do_sys_open+0xef/0x1d0
 [<ffffffff8114e559>] SyS_open+0x19/0x20
 [<ffffffff813cff16>] system_call_fastpath+0x1a/0x1f
Code: 44 00 00 55 48 89 e5 41 57 49 89 ff 41 56 41 89 d6 41 55 41 54 4c 8d 67 18 53 48 83 ec 18 89 75 cc e9 f2 00 00 00 0f 1f 44 00 00 <48> 8b 80 40 03 00 00 48 89 df 4c 8b 68 58 e8 d5
a4 07 00 44 89
RIP  [<ffffffff8118b188>] __blkdev_get+0x28/0x450
 RSP <ffff88023e47dbd8>
CR2: 00000000000002b4
---[ end trace bb7f32dbf02398dc ]---

The brd change should be backported to stable kernels starting with 2.6.25.
The loop change should be backported to stable kernels starting with 2.6.22.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: stable@kernel.org	# 2.6.22+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-11-08 08:59:39 -07:00
Mikulas Patocka
3ec981e30f loop: fix crash if blk_alloc_queue fails
loop: fix crash if blk_alloc_queue fails

If blk_alloc_queue fails, loop_add cleans up, but it doesn't clean up the
identifier allocated with idr_alloc. That causes crash on module unload in
idr_for_each(&loop_index_idr, &loop_exit_cb, NULL); where we attempt to
remove non-existed device with that id.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000380
IP: [<ffffffff812057c9>] del_gendisk+0x19/0x2d0
PGD 43d399067 PUD 43d0ad067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: loop(-) dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_loop dm_mod ip6table_filter ip6_tables uvesafb cfbcopyarea cfbimgblt cfbfillrect fbcon font bitblit fbcon_rotate fbcon_cw fbcon_ud fbcon_ccw softcursor fb fbdev msr ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc tun ipv6 cpufreq_userspace cpufreq_stats cpufreq_ondemand cpufreq_conservative cpufreq_powersave spadfs fuse hid_generic usbhid hid raid0 md_mod dmi_sysfs nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack snd_usb_audio snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc lm85 hwmon_vid snd_hwdep snd_usbmidi_lib snd_rawmidi snd soundcore acpi_cpufreq ohci_hcd freq_table tg3 ehci_pci mperf ehci_hcd kvm_amd kvm sata_svw serverworks libphy libata ide_core k10temp usbcore hwmon microcode ptp pcspkr pps_core e100 skge mii usb_common i2c_piix4 floppy evdev rtc_cmos i2c_core processor but!
 ton unix
CPU: 7 PID: 2735 Comm: rmmod Tainted: G        W    3.10.15-devel #15
Hardware name: empty empty/S3992-E, BIOS 'V1.06   ' 06/09/2009
task: ffff88043d38e780 ti: ffff88043d21e000 task.ti: ffff88043d21e000
RIP: 0010:[<ffffffff812057c9>]  [<ffffffff812057c9>] del_gendisk+0x19/0x2d0
RSP: 0018:ffff88043d21fe10  EFLAGS: 00010282
RAX: ffffffffa05102e0 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff88043ea82800 RDI: 0000000000000000
RBP: ffff88043d21fe48 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000000 R12: 00000000000000ff
R13: 0000000000000080 R14: 0000000000000000 R15: ffff88043ea82800
FS:  00007ff646534700(0000) GS:ffff880447000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000380 CR3: 000000043e9bf000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 ffffffff8100aba4 0000000000000092 ffff88043d21fe48 ffff88043ea82800
 00000000000000ff ffff88043d21fe98 0000000000000000 ffff88043d21fe60
 ffffffffa05102b4 0000000000000000 ffff88043d21fe70 ffffffffa05102ec
Call Trace:
 [<ffffffff8100aba4>] ? native_sched_clock+0x24/0x80
 [<ffffffffa05102b4>] loop_remove+0x14/0x40 [loop]
 [<ffffffffa05102ec>] loop_exit_cb+0xc/0x10 [loop]
 [<ffffffff81217b74>] idr_for_each+0x104/0x190
 [<ffffffffa05102e0>] ? loop_remove+0x40/0x40 [loop]
 [<ffffffff8109adc5>] ? trace_hardirqs_on_caller+0x105/0x1d0
 [<ffffffffa05135dc>] loop_exit+0x34/0xa58 [loop]
 [<ffffffff810a98ea>] SyS_delete_module+0x13a/0x260
 [<ffffffff81221d5e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff813cff16>] system_call_fastpath+0x1a/0x1f
Code: f0 4c 8b 6d f8 c9 c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 56 41 55 4c 8d af 80 00 00 00 41 54 53 48 89 fb 48 83 ec 18 <48> 83 bf 80 03 00
00 00 74 4d e8 98 fe ff ff 31 f6 48 c7 c7 20
RIP  [<ffffffff812057c9>] del_gendisk+0x19/0x2d0
 RSP <ffff88043d21fe10>
CR2: 0000000000000380
---[ end trace 64ec069ec70f1309 ]---

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: stable@kernel.org	# 3.1+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-11-08 08:59:26 -07:00
Mikulas Patocka
fff4996b7d blk-core: Fix memory corruption if blkcg_init_queue fails
If blkcg_init_queue fails, blk_alloc_queue_node doesn't call bdi_destroy
to clean up structures allocated by the backing dev.

------------[ cut here ]------------
WARNING: at lib/debugobjects.c:260 debug_print_object+0x85/0xa0()
ODEBUG: free active (active state 0) object type: percpu_counter hint:           (null)
Modules linked in: dm_loop dm_mod ip6table_filter ip6_tables uvesafb cfbcopyarea cfbimgblt cfbfillrect fbcon font bitblit fbcon_rotate fbcon_cw fbcon_ud fbcon_ccw softcursor fb fbdev ipt_MASQUERADE iptable_nat nf_nat_ipv4 msr nf_conntrack_ipv4 nf_defrag_ipv4 xt_state ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc tun ipv6 cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_ondemand cpufreq_conservative spadfs fuse hid_generic usbhid hid raid0 md_mod dmi_sysfs nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack lm85 hwmon_vid snd_usb_audio snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_hwdep snd_usbmidi_lib snd_rawmidi snd soundcore acpi_cpufreq freq_table mperf sata_svw serverworks kvm_amd ide_core ehci_pci ohci_hcd libata ehci_hcd kvm usbcore tg3 usb_common libphy k10temp pcspkr ptp i2c_piix4 i2c_core evdev microcode hwmon rtc_cmos pps_core e100 skge floppy mii processor button unix
CPU: 0 PID: 2739 Comm: lvchange Tainted: G        W
3.10.15-devel #14
Hardware name: empty empty/S3992-E, BIOS 'V1.06   ' 06/09/2009
 0000000000000009 ffff88023c3c1ae8 ffffffff813c8fd4 ffff88023c3c1b20
 ffffffff810399eb ffff88043d35cd58 ffffffff81651940 ffff88023c3c1bf8
 ffffffff82479d90 0000000000000005 ffff88023c3c1b80 ffffffff81039a67
Call Trace:
 [<ffffffff813c8fd4>] dump_stack+0x19/0x1b
 [<ffffffff810399eb>] warn_slowpath_common+0x6b/0xa0
 [<ffffffff81039a67>] warn_slowpath_fmt+0x47/0x50
 [<ffffffff8122aaaf>] ? debug_check_no_obj_freed+0xcf/0x250
 [<ffffffff81229a15>] debug_print_object+0x85/0xa0
 [<ffffffff8122abe3>] debug_check_no_obj_freed+0x203/0x250
 [<ffffffff8113c4ac>] kmem_cache_free+0x20c/0x3a0
 [<ffffffff811f6709>] blk_alloc_queue_node+0x2a9/0x2c0
 [<ffffffff811f672e>] blk_alloc_queue+0xe/0x10
 [<ffffffffa04c0093>] dm_create+0x1a3/0x530 [dm_mod]
 [<ffffffffa04c6bb0>] ? list_version_get_info+0xe0/0xe0 [dm_mod]
 [<ffffffffa04c6c07>] dev_create+0x57/0x2b0 [dm_mod]
 [<ffffffffa04c6bb0>] ? list_version_get_info+0xe0/0xe0 [dm_mod]
 [<ffffffffa04c6bb0>] ? list_version_get_info+0xe0/0xe0 [dm_mod]
 [<ffffffffa04c6528>] ctl_ioctl+0x268/0x500 [dm_mod]
 [<ffffffff81097662>] ? get_lock_stats+0x22/0x70
 [<ffffffffa04c67ce>] dm_ctl_ioctl+0xe/0x20 [dm_mod]
 [<ffffffff81161aad>] do_vfs_ioctl+0x2ed/0x520
 [<ffffffff8116cfc7>] ? fget_light+0x377/0x4e0
 [<ffffffff81161d2b>] SyS_ioctl+0x4b/0x90
 [<ffffffff813cff16>] system_call_fastpath+0x1a/0x1f
---[ end trace 4b5ff0d55673d986 ]---
------------[ cut here ]------------

This fix should be backported to stable kernels starting with 2.6.37. Note
that in the kernels prior to 3.5 the affected code is different, but the
bug is still there - bdi_init is called and bdi_destroy isn't.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: stable@kernel.org	# 2.6.37+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-11-08 08:59:17 -07:00
Jeff Moyer
4912aa6c11 block: fix race between request completion and timeout handling
crocode i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support shpchp ioatdma dca be2net sg ses enclosure ext4 mbcache jbd2 sd_mod crc_t10dif ahci megaraid_sas(U) dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid: 491, comm: scsi_eh_0 Tainted: G        W  ----------------   2.6.32-220.13.1.el6.x86_64 #1 IBM  -[8722PAX]-/00D1461
RIP: 0010:[<ffffffff8124e424>]  [<ffffffff8124e424>] blk_requeue_request+0x94/0xa0
RSP: 0018:ffff881057eefd60  EFLAGS: 00010012
RAX: ffff881d99e3e8a8 RBX: ffff881d99e3e780 RCX: ffff881d99e3e8a8
RDX: ffff881d99e3e8a8 RSI: ffff881d99e3e780 RDI: ffff881d99e3e780
RBP: ffff881057eefd80 R08: ffff881057eefe90 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff881057f92338
R13: 0000000000000000 R14: ffff881057f92338 R15: ffff883058188000
FS:  0000000000000000(0000) GS:ffff880040200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000006d3ec0 CR3: 000000302cd7d000 CR4: 00000000000406b0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process scsi_eh_0 (pid: 491, threadinfo ffff881057eee000, task ffff881057e29540)
Stack:
 0000000000001057 0000000000000286 ffff8810275efdc0 ffff881057f16000
<0> ffff881057eefdd0 ffffffff81362323 ffff881057eefe20 ffffffff8135f393
<0> ffff881057e29af8 ffff8810275efdc0 ffff881057eefe78 ffff881057eefe90
Call Trace:
 [<ffffffff81362323>] __scsi_queue_insert+0xa3/0x150
 [<ffffffff8135f393>] ? scsi_eh_ready_devs+0x5e3/0x850
 [<ffffffff81362a23>] scsi_queue_insert+0x13/0x20
 [<ffffffff8135e4d4>] scsi_eh_flush_done_q+0x104/0x160
 [<ffffffff8135fb6b>] scsi_error_handler+0x35b/0x660
 [<ffffffff8135f810>] ? scsi_error_handler+0x0/0x660
 [<ffffffff810908c6>] kthread+0x96/0xa0
 [<ffffffff8100c14a>] child_rip+0xa/0x20
 [<ffffffff81090830>] ? kthread+0x0/0xa0
 [<ffffffff8100c140>] ? child_rip+0x0/0x20
Code: 00 00 eb d1 4c 8b 2d 3c 8f 97 00 4d 85 ed 74 bf 49 8b 45 00 49 83 c5 08 48 89 de 4c 89 e7 ff d0 49 8b 45 00 48 85 c0 75 eb eb a4 <0f> 0b eb fe 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00
RIP  [<ffffffff8124e424>] blk_requeue_request+0x94/0xa0
 RSP <ffff881057eefd60>

The RIP is this line:
        BUG_ON(blk_queued_rq(rq));

After digging through the code, I think there may be a race between the
request completion and the timer handler running.

A timer is started for each request put on the device's queue (see
blk_start_request->blk_add_timer).  If the request does not complete
before the timer expires, the timer handler (blk_rq_timed_out_timer)
will mark the request complete atomically:

static inline int blk_mark_rq_complete(struct request *rq)
{
        return test_and_set_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags);
}

and then call blk_rq_timed_out.  The latter function will call
scsi_times_out, which will return one of BLK_EH_HANDLED,
BLK_EH_RESET_TIMER or BLK_EH_NOT_HANDLED.  If BLK_EH_RESET_TIMER is
returned, blk_clear_rq_complete is called, and blk_add_timer is again
called to simply wait longer for the request to complete.

Now, if the request happens to complete while this is going on, what
happens?  Given that we know the completion handler will bail if it
finds the REQ_ATOM_COMPLETE bit set, we need to focus on the completion
handler running after that bit is cleared.  So, from the above
paragraph, after the call to blk_clear_rq_complete.  If the completion
sets REQ_ATOM_COMPLETE before the BUG_ON in blk_add_timer, we go boom
there (I haven't seen this in the cores).  Next, if we get the
completion before the call to list_add_tail, then the timer will
eventually fire for an old req, which may either be freed or reallocated
(there is evidence that this might be the case).  Finally, if the
completion comes in *after* the addition to the timeout list, I think
it's harmless.  The request will be removed from the timeout list,
req_atom_complete will be set, and all will be well.

This will only actually explain the coredumps *IF* the request
structure was freed, reallocated *and* queued before the error handler
thread had a chance to process it.  That is possible, but it may make
sense to keep digging for another race.  I think that if this is what
was happening, we would see other instances of this problem showing up
as null pointer or garbage pointer dereferences, for example when the
request structure was not re-used.  It looks like we actually do run
into that situation in other reports.

This patch moves the BUG_ON(test_bit(REQ_ATOM_COMPLETE,
&req->atomic_flags)); from blk_add_timer to the only caller that could
trip over it (blk_start_request).  It then inverts the calls to
blk_clear_rq_complete and blk_add_timer in blk_rq_timed_out to address
the race.  I've boot tested this patch, but nothing more.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Hannes Reinecke <hare@suse.de>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-11-08 08:59:04 -07:00
Jan Kara
a404d5576b blktrace: Send BLK_TN_PROCESS events to all running traces
Currently each task sends BLK_TN_PROCESS event to the first traced
device it interacts with after a new trace is started. When there are
several traced devices and the task accesses more devices, this logic
can result in BLK_TN_PROCESS being sent several times to some devices
while it is never sent to other devices. Thus blkparse doesn't display
command name when parsing some blktrace files.

Fix the problem by sending BLK_TN_PROCESS event to all traced devices
when a task interacts with any of them.

Signed-off-by: Jan Kara <jack@suse.cz>
Review-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-11-08 08:59:00 -07:00
Linus Torvalds
5e01dc7b26 Linux 3.12 2013-11-03 15:41:51 -08:00
Linus Torvalds
17f6ee43c3 Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS fixes from Ralf Baechle:
 "Three fixes across arch/mips with the most complex one being the GIC
  interrupt fix - at nine lines still not monster.  I'm confident this
  are the final MIPS patches even if there should go for an rc8"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: ralink: fix return value check in rt_timer_probe()
  MIPS: malta: Fix GIC interrupt offsets
  MIPS: Perf: Fix 74K cache map
2013-11-03 11:36:41 -08:00
Mathias Krause
9bf76ca325 ipc, msg: forbid negative values for "msg{max,mnb,mni}"
Negative message lengths make no sense -- so don't do negative queue
lenghts or identifier counts. Prevent them from getting negative.

Also change the underlying data types to be unsigned to avoid hairy
surprises with sign extensions in cases where those variables get
evaluated in unsigned expressions with bigger data types, e.g size_t.

In case a user still wants to have "unlimited" sizes she could just use
INT_MAX instead.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-03 10:53:11 -08:00
Linus Torvalds
9dc8c89dfb Last minute perf unbreakage for ARM modules; spent a day in linux-next.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJSdC8JAAoJENkgDmzRrbjx6TEP/jLxlenB07cjc8SnPD1m6htT
 DLFr/t1Jz9qr10EzT9wbNlUtrVhw5HRztd3SDGc8I0OivvAzIxMTVv67CYXHXKMH
 ycrjUOIVnu0Dg/+R1QoP9KFG9aR5UIK0kctTWBhn7ayR0TJfoCN7x7fdWYZP2k6g
 QEXYUvAAaOp9Kg/0oZQ8bXmBVsdoRd3s3PGZUxcOqjD+kK3t+hczJtu6sYgLQvBK
 85Pi+oW1/RZ0ewL89BbU80zwn2vfObtS09sbkxlLKuXFdpp+Veay9Ps6imTzI6PB
 A953j3FwbxpCpwiFOSmF8Ba/7JU4cBW6asM7ZnDTRXkJhlIWxtr0DJxP++vGMo1u
 IZsZRJcwVIQJg1+WvLgw9/aVx1+Yz560jaFJYfr/Lvq6wgNimWtCmyQ8SlFsLxYm
 FO0QuImTGo1iB13hLZ4guMm73WxHGhIXn/29FHq5UmGG/FrCSR1YUq9thP221DrP
 MD3ufvTHvZv9Vb6kzMm2A6oM0Z/HsTATIyirryAPQC1sL3ExwMQibNmGBnV2lPL+
 veaVG4QuPiyHhUDigIxqaltGsCXC/MLv8GbwlqcKWYQENHrQ5qtQxMd2sBmALP7f
 mu4qwcbDBVPqJuCj/+H5QbRLiDfTWmKttUWGRwSIwFJDD/uCvl7yfqzFoqckAJql
 xCtWjFzu3NpM/T3xF39j
 =O8nP
 -----END PGP SIGNATURE-----

Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull ARM kallsyms fix from Rusty Russell:
 "Last minute perf unbreakage for ARM modules; spent a day in
  linux-next"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  scripts/kallsyms: filter symbols not in kernel address space
2013-11-02 10:27:29 -07:00
Vineet Gupta
9c41f4eeb9 ARC: Incorrect mm reference used in vmalloc fault handler
A vmalloc fault needs to sync up PGD/PTE entry from init_mm to current
task's "active_mm".  ARC vmalloc fault handler however was using mm.

A vmalloc fault for non user task context (actually pre-userland, from
init thread's open for /dev/console) caused the handler to deref NULL mm
(for mm->pgd)

The reasons it worked so far is amazing:

1. By default (!SMP), vmalloc fault handler uses a cached value of PGD.
   In SMP that MMU register is repurposed hence need for mm pointer deref.

2. In pre-3.12 SMP kernel, the problem triggering vmalloc didn't exist in
   pre-userland code path - it was introduced with commit 20bafb3d23
   "n_tty: Move buffers into n_tty_data"

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Cc: Gilad Ben-Yossef <gilad@benyossef.com>
Cc: Noam Camus <noamc@ezchip.com>
Cc: stable@vger.kernel.org    #3.10 and 3.11
Cc: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-02 10:27:04 -07:00
Ming Lei
f6537f2f0e scripts/kallsyms: filter symbols not in kernel address space
This patch uses CONFIG_PAGE_OFFSET to filter symbols which
are not in kernel address space because these symbols are
generally for generating code purpose and can't be run at
kernel mode, so we needn't keep them in /proc/kallsyms.

For example, on ARM there are some symbols which may be
linked in relocatable code section, then perf can't parse
symbols any more from /proc/kallsyms, this patch fixes the
problem (introduced b9b32bf70f)

Cc: Russell King <linux@arm.linux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@vger.kernel.org
2013-11-02 09:13:02 +10:30
Linus Torvalds
9581b7d268 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Two fixes:

   - Fix 'NMI handler took too long to run' false positives

     [ Genuine NMI overhead speedups will come for v3.13, this commit
       only fixes a measurement bug ]

   - Fix perf ring-buffer missed barrier causing (rare) ring-buffer data
     corruption on ppc64"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Fix NMI measurements
  perf: Fix perf ring buffer memory ordering
2013-11-01 12:54:51 -07:00
Linus Torvalds
9119e33e50 USB fixes for 3.12-final
Here is a set of patches that revert all of the changes done to the
 pl2303 USB serial driver in the 3.12-rc timeframe, as it turns out they
 break some devices that work just fine on 3.11.  As it's not a good idea
 to break working systems, drop them all and they will be reworked for
 future kernel versions such that there is no breakage.
 
 I've also included a MAINTAINERS update for the USB serial subsystem and
 a new device id for the ftdi_sio driver as well.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlJz3VcACgkQMUfUDdst+ylxEQCgwiApcUC8e01hmRsuQxrbTC+v
 2P8AnA6PIkyK5vO6+8BCy4YDx93dIwyQ
 =Tn7c
 -----END PGP SIGNATURE-----

Merge tag 'usb-3.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here is a set of patches that revert all of the changes done to the
  pl2303 USB serial driver in the 3.12-rc timeframe, as it turns out
  they break some devices that work just fine on 3.11.  As it's not a
  good idea to break working systems, drop them all and they will be
  reworked for future kernel versions such that there is no breakage.

  I've also included a MAINTAINERS update for the USB serial subsystem
  and a new device id for the ftdi_sio driver as well"

* tag 'usb-3.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  USB: serial: ftdi_sio: add id for Z3X Box device
  USB: Maintainers change for usb serial drivers
  Revert "USB: pl2303: restrict the divisor based baud rate encoding method to the "HX" chip type"
  Revert "usb: pl2303: fix+improve the divsor based baud rate encoding method"
  Revert "usb: pl2303: do not round to the next nearest standard baud rate for the divisor based baud rate encoding method"
  Revert "usb: pl2303: remove 500000 baud from the list of standard baud rates"
  Revert "usb: pl2303: move the two baud rate encoding methods to separate functions"
  Revert "usb: pl2303: increase the allowed baud rate range for the divisor based encoding method"
  Revert "usb: pl2303: also use the divisor based baud rate encoding method for baud rates < 115200 with HX chips"
  Revert "usb: pl2303: add two comments concerning the supported baud rates with HX chips"
  Revert "pl2303: simplify the else-if contruct for type_1 chips in pl2303_startup()"
  Revert "pl2303: improve the chip type information output on startup"
  Revert "pl2303: improve the chip type detection/distinction"
  Revert "USB: pl2303: distinguish between original and cloned HX chips"
2013-11-01 12:23:56 -07:00
Linus Torvalds
f9adfbfbf3 sound fixes #2 for 3.12-final
The fixes for random bugs that have been reported lately in the game:
 a few fixes in ASoC dpam and wm_hubs bugs spotted by Coverity, a
 one-liner HD-audio fixup, and a fix for Oops with DPCM.
 
 They are not so critically urgent bugs, but all small and safe.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJSc3xXAAoJEGwxgFQ9KSmkeRIP/R7+E7qukg+WOfB5DWr1QOkW
 7Sly5OsqwT6sEy7uXtYZRA1Y+kelLIew2uoSHkGibSmvlBCLQZkG4PWKBpL0jQRZ
 /y64puytmZTo8RVrxsVIkjJJlFhZ5QCzyahxJO2W0BAnDAxLJ3u6abb1M7qeF+zi
 kn8Ad+PRTUmp1ilJQuyl4A+67DJllk61whPkQFCTeZUVPTkWtiIHyicAkmJiptFz
 +zQSVa4hvoZzM8Pw50N+OhmMlWrvgOfEYg8qhv6zE7owhEeKQCuR5HW8QQdKtN4F
 WQeUgRNZYWYHHYGx+Cn1+tjZV8CmH51JKrsbj+xedmWJ4xyeoX2dURHEzZOltpV3
 FGnmttt/2o6bvZt0kwbYdsrGB+aW8bNV20SC036VUYJxXNdJTx3kLa7O9aAbANcs
 HSsTqvH2fqFqLWtjP7yXi+PdrIpap0WDLkQh1S3FNZXquWq9nc9rT0ZUTnoh+1ib
 hx1BCglTtQzprPW/iQ/Kt0Y3jRMuVcPFzuYurz7NVmefxBtSiNFjnX84QTbO6sjk
 Pq+8efxjlQYL4XVDGhCygaiTZmcEQSwH57M6y5Fwl163v0CBdWlr9oicmSyOh67c
 xiSQPrLXTPAsa5okzV09d47Gv4gS5QPuDyDcpR5oY07wCd+XDRsSTnMJzj8c8wLB
 8jf67Z2SDjC3b2Ai4HOd
 =6MXa
 -----END PGP SIGNATURE-----

Merge tag 'sound-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull more sound fixes from Takashi Iwai:
 "The fixes for random bugs that have been reported lately in the game:
  a few fixes in ASoC dpam and wm_hubs bugs spotted by Coverity, a
  one-liner HD-audio fixup, and a fix for Oops with DPCM.

  They are not so critically urgent bugs, but all small and safe"

* tag 'sound-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: fix oops in snd_pcm_info() caused by ASoC DPCM
  ASoC: wm_hubs: Add missing break in hp_supply_event()
  ALSA: hda - Add a fixup for ASUS N76VZ
  ASoC: dapm: Return -ENOMEM in snd_soc_dapm_new_dai_widgets()
  ASoC: dapm: Fix source list debugfs outputs
2013-11-01 12:23:22 -07:00
Linus Torvalds
68e952d5f9 Merge tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mturquette/linux
Pull clock subsystem fixes from Mike Turquette.

* tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mturquette/linux:
  clk: fixup argument order when setting VCO parameters
  clk: socfpga: Fix incorrect sdmmc clock name
  clk: armada-370: fix tclk frequencies
  clk: nomadik: set all timers to use 2.4 MHz TIMCLK
2013-11-01 12:22:47 -07:00
Greg Thelen
6920a1bd03 memcg: remove incorrect underflow check
When a memcg is deleted mem_cgroup_reparent_charges() moves charged
memory to the parent memcg.  As of v3.11-9444-g3ea67d0 "memcg: add per
cgroup writeback pages accounting" there's bad pointer read.  The goal
was to check for counter underflow.  The counter is a per cpu counter
and there are two problems with the code:

 (1) per cpu access function isn't used, instead a naked pointer is used
     which easily causes oops.
 (2) the check doesn't sum all cpus

Test:
  $ cd /sys/fs/cgroup/memory
  $ mkdir x
  $ echo 3 > /proc/sys/vm/drop_caches
  $ (echo $BASHPID >> x/tasks && exec cat) &
  [1] 7154
  $ grep ^mapped x/memory.stat
  mapped_file 53248
  $ echo 7154 > tasks
  $ rmdir x
  <OOPS>

The fix is to remove the check.  It's currently dangerous and isn't
worth fixing it to use something expensive, such as
percpu_counter_sum(), for each reparented page.  __this_cpu_read() isn't
enough to fix this because there's no guarantees of the current cpus
count.  The only guarantees is that the sum of all per-cpu counter is >=
nr_pages.

Fixes: 3ea67d06e4 ("memcg: add per cgroup writeback pages accounting")
Reported-and-tested-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Greg Thelen <gthelen@google.com>
Reviewed-by: Sha Zhengju <handai.szj@taobao.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-01 12:22:28 -07:00
Алексей Крамаренко
e1466ad5b1 USB: serial: ftdi_sio: add id for Z3X Box device
Custom VID/PID for Z3X Box device, popular tool for cellphone flashing.

Signed-off-by: Alexey E. Kramarenko <alexeyk13@yandex.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-01 09:33:56 -07:00
Greg KH
f896b7968b USB: Maintainers change for usb serial drivers
Johan has been conned^Wgracious in accepting the maintainership of the
USB serial drivers, especially as he's been doing all of the real work
for the past few years.

At the same time, remove a bunch of old entries for USB serial drivers
that don't make sense anymore, given that the developers are no longer
around, and individual driver maintainerships for tiny things like this
is pretty pointless.

Acked-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-01 09:20:37 -07:00
Greg Kroah-Hartman
54dc5792ea Revert "USB: pl2303: restrict the divisor based baud rate encoding method to the "HX" chip type"
This reverts commit b8bdad6082.

Revert all of the pl2303 changes that went into 3.12-rc1 and -rc2 as
they cause regressions on some versions of the chip.  This will all be
revisited for later kernel versions when we can figure out how to handle
this in a way that does not break working devices.

Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Frank Schäfer <fschaefer.oss@googlemail.com>
Acked-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-01 09:19:56 -07:00
Greg Kroah-Hartman
1796a22876 Revert "usb: pl2303: fix+improve the divsor based baud rate encoding method"
This reverts commit 57ce61aad7.

Revert all of the pl2303 changes that went into 3.12-rc1 and -rc2 as
they cause regressions on some versions of the chip.  This will all be
revisited for later kernel versions when we can figure out how to handle
this in a way that does not break working devices.

Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Frank Schäfer <fschaefer.oss@googlemail.com>
Acked-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-01 09:19:45 -07:00
Greg Kroah-Hartman
7e12a6fcbf Revert "usb: pl2303: do not round to the next nearest standard baud rate for the divisor based baud rate encoding method"
This reverts commit 75417d9f99.

Revert all of the pl2303 changes that went into 3.12-rc1 and -rc2 as
they cause regressions on some versions of the chip.  This will all be
revisited for later kernel versions when we can figure out how to handle
this in a way that does not break working devices.

Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Frank Schäfer <fschaefer.oss@googlemail.com>
Acked-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-01 09:19:34 -07:00
Greg Kroah-Hartman
336b9daf90 Revert "usb: pl2303: remove 500000 baud from the list of standard baud rates"
This reverts commit b9208c721c.

Revert all of the pl2303 changes that went into 3.12-rc1 and -rc2 as
they cause regressions on some versions of the chip.  This will all be
revisited for later kernel versions when we can figure out how to handle
this in a way that does not break working devices.

Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Frank Schäfer <fschaefer.oss@googlemail.com>
Acked-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-01 09:19:24 -07:00
Greg Kroah-Hartman
692ed4ddf0 Revert "usb: pl2303: move the two baud rate encoding methods to separate functions"
This reverts commit e917ba01d6.

Revert all of the pl2303 changes that went into 3.12-rc1 and -rc2 as
they cause regressions on some versions of the chip.  This will all be
revisited for later kernel versions when we can figure out how to handle
this in a way that does not break working devices.

Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Frank Schäfer <fschaefer.oss@googlemail.com>
Acked-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-01 09:19:03 -07:00
Greg Kroah-Hartman
92dfe41088 Revert "usb: pl2303: increase the allowed baud rate range for the divisor based encoding method"
This reverts commit b5c16c6a03.

Revert all of the pl2303 changes that went into 3.12-rc1 and -rc2 as
they cause regressions on some versions of the chip.  This will all be
revisited for later kernel versions when we can figure out how to handle
this in a way that does not break working devices.

Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Frank Schäfer <fschaefer.oss@googlemail.com>
Acked-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-01 09:18:47 -07:00
Greg Kroah-Hartman
e2afb1d666 Revert "usb: pl2303: also use the divisor based baud rate encoding method for baud rates < 115200 with HX chips"
This reverts commit 61fa8d694b.

Revert all of the pl2303 changes that went into 3.12-rc1 and -rc2 as
they cause regressions on some versions of the chip.  This will all be
revisited for later kernel versions when we can figure out how to handle
this in a way that does not break working devices.

Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Frank Schäfer <fschaefer.oss@googlemail.com>
Acked-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-01 09:18:38 -07:00
Greg Kroah-Hartman
233c3dda5c Revert "usb: pl2303: add two comments concerning the supported baud rates with HX chips"
This reverts commit c23bda365d.

Revert all of the pl2303 changes that went into 3.12-rc1 and -rc2 as
they cause regressions on some versions of the chip.  This will all be
revisited for later kernel versions when we can figure out how to handle
this in a way that does not break working devices.

Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Frank Schäfer <fschaefer.oss@googlemail.com>
Acked-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-01 09:18:25 -07:00
Greg Kroah-Hartman
281393ad0b Revert "pl2303: simplify the else-if contruct for type_1 chips in pl2303_startup()"
This reverts commit 73b583af59.

Revert all of the pl2303 changes that went into 3.12-rc1 and -rc2 as
they cause regressions on some versions of the chip.  This will all be
revisited for later kernel versions when we can figure out how to handle
this in a way that does not break working devices.

Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Frank Schäfer <fschaefer.oss@googlemail.com>
Acked-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-01 09:18:10 -07:00
Greg Kroah-Hartman
b52e111363 Revert "pl2303: improve the chip type information output on startup"
This reverts commit a77a8c23e4.

Revert all of the pl2303 changes that went into 3.12-rc1 and -rc2 as
they cause regressions on some versions of the chip.  This will all be
revisited for later kernel versions when we can figure out how to handle
this in a way that does not break working devices.

Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Frank Schäfer <fschaefer.oss@googlemail.com>
Acked-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-01 09:17:50 -07:00
Greg Kroah-Hartman
e8bbd5c42b Revert "pl2303: improve the chip type detection/distinction"
This reverts commit 034d1527ad.

Revert all of the pl2303 changes that went into 3.12-rc1 and -rc2 as
they cause regressions on some versions of the chip.  This will all be
revisited for later kernel versions when we can figure out how to handle
this in a way that does not break working devices.

Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Frank Schäfer <fschaefer.oss@googlemail.com>
Acked-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-01 09:16:09 -07:00
Greg Kroah-Hartman
09169197c9 Revert "USB: pl2303: distinguish between original and cloned HX chips"
This reverts commit 7d26a78f62.

Revert all of the pl2303 changes that went into 3.12-rc1 and -rc2 as
they cause regressions on some versions of the chip.  This will all be
revisited for later kernel versions when we can figure out how to handle
this in a way that does not break working devices.

Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Frank Schäfer <fschaefer.oss@googlemail.com>
Acked-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-01 09:12:52 -07:00
Linus Torvalds
4f794ee8c4 Merge branch 'akpm' (fixes from Andrew Morton)
Merge four more fixes from Andrew Morton.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  lib/scatterlist.c: don't flush_kernel_dcache_page on slab page
  mm: memcg: fix test for child groups
  mm: memcg: lockdep annotation for memcg OOM lock
  mm: memcg: use proper memcg in limit bypass
2013-10-31 16:58:23 -07:00
Ming Lei
3d77b50c58 lib/scatterlist.c: don't flush_kernel_dcache_page on slab page
Commit b1adaf65ba ("[SCSI] block: add sg buffer copy helper
functions") introduces two sg buffer copy helpers, and calls
flush_kernel_dcache_page() on pages in SG list after these pages are
written to.

Unfortunately, the commit may introduce a potential bug:

 - Before sending some SCSI commands, kmalloc() buffer may be passed to
   block layper, so flush_kernel_dcache_page() can see a slab page
   finally

 - According to cachetlb.txt, flush_kernel_dcache_page() is only called
   on "a user page", which surely can't be a slab page.

 - ARCH's implementation of flush_kernel_dcache_page() may use page
   mapping information to do optimization so page_mapping() will see the
   slab page, then VM_BUG_ON() is triggered.

Aaro Koskinen reported the bug on ARM/kirkwood when DEBUG_VM is enabled,
and this patch fixes the bug by adding test of '!PageSlab(miter->page)'
before calling flush_kernel_dcache_page().

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Tested-by: Simon Baatz <gmbnomis@gmail.com>
Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Aaro Koskinen <aaro.koskinen@iki.fi>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Tejun Heo <tj@kernel.org>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@vger.kernel.org>	[3.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-31 16:58:13 -07:00
Johannes Weiner
696ac172ff mm: memcg: fix test for child groups
When memcg code needs to know whether any given memcg has children, it
uses the cgroup child iteration primitives and returns true/false
depending on whether the iteration loop is executed at least once or
not.

Because a cgroup's list of children is RCU protected, these primitives
require the RCU read-lock to be held, which is not the case for all
memcg callers.  This results in the following splat when e.g.  enabling
hierarchy mode:

  WARNING: CPU: 3 PID: 1 at kernel/cgroup.c:3043 css_next_child+0xa3/0x160()
  CPU: 3 PID: 1 Comm: systemd Not tainted 3.12.0-rc5-00117-g83f11a9-dirty #18
  Hardware name: LENOVO 3680B56/3680B56, BIOS 6QET69WW (1.39 ) 04/26/2012
  Call Trace:
    dump_stack+0x54/0x74
    warn_slowpath_common+0x78/0xa0
    warn_slowpath_null+0x1a/0x20
    css_next_child+0xa3/0x160
    mem_cgroup_hierarchy_write+0x5b/0xa0
    cgroup_file_write+0x108/0x2a0
    vfs_write+0xbd/0x1e0
    SyS_write+0x4c/0xa0
    system_call_fastpath+0x16/0x1b

In the memcg case, we only care about children when we are attempting to
modify inheritable attributes interactively.  Racing with deletion could
mean a spurious -EBUSY, no problem.  Racing with addition is handled
just fine as well through the memcg_create_mutex: if the child group is
not on the list after the mutex is acquired, it won't be initialized
from the parent's attributes until after the unlock.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-31 16:58:13 -07:00
Johannes Weiner
0056f4e66a mm: memcg: lockdep annotation for memcg OOM lock
The memcg OOM lock is a mutex-type lock that is open-coded due to
memcg's special needs.  Add annotations for lockdep coverage.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-31 16:58:13 -07:00
Johannes Weiner
3168ecbe1c mm: memcg: use proper memcg in limit bypass
Commit 84235de394 ("fs: buffer: move allocation failure loop into the
allocator") allowed __GFP_NOFAIL allocations to bypass the limit if they
fail to reclaim enough memory for the charge.  But because the main test
case was on a 3.2-based system, the patch missed the fact that on newer
kernels the charge function needs to return root_mem_cgroup when
bypassing the limit, and not NULL.  This will corrupt whatever memory is
at NULL + percpu pointer offset.  Fix this quickly before problems are
reported.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-31 16:58:13 -07:00
Linus Torvalds
358eec1824 vfs: decrapify dput(), fix cache behavior under normal load
We do not want to dirty the dentry->d_flags cacheline in dput() just to
set the DCACHE_REFERENCED flag when it is already set in the common case
anyway.  This way the first cacheline of the dentry (which contains the
RCU lookup information etc) can stay shared among multiple CPU's.

This finishes off some of the details of all the scalability patches
merged during the merge window.

Also don't mark dentry_kill() for inlining, since it's the uncommon path
and inlining it just makes the common path slower due to extra function
entry/exit overhead.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-31 15:43:02 -07:00
Linus Torvalds
0baab4fd6d i915: fix compiler warning
The last i915 drm update brought with it this annoying warning

  drivers/gpu/drm/i915/intel_crt.c: In function ‘intel_crt_get_config’:
  drivers/gpu/drm/i915/intel_crt.c:110:21: warning: unused variable ‘dev’ [-Wunused-variable]
    struct drm_device *dev = encoder->base.dev;
                       ^

introduced by commit 7195a50b5c ("drm/i915: Add HSW CRT output readout
support").

Remove the offending pointless variable.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-31 15:28:23 -07:00
Linus Torvalds
52469b4fcd Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull NUMA balancing memory corruption fixes from Ingo Molnar:
 "So these fixes are definitely not something I'd like to sit on, but as
  I said to Mel at the KS the timing is quite tight, with Linus planning
  v3.12-final within a week.

  Fedora-19 is affected:

   comet:~> grep NUMA_BALANCING /boot/config-3.11.3-201.fc19.x86_64

   CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
   CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y
   CONFIG_NUMA_BALANCING=y

  AFAICS Ubuntu will be affected as well, once it updates the kernel:

   hubble:~> grep NUMA_BALANCING /boot/config-3.8.0-32-generic

   CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
   CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y
   CONFIG_NUMA_BALANCING=y

  These 6 commits are a minimalized set of cherry-picks needed to fix
  the memory corruption bugs.  All commits are fixes, except "mm: numa:
  Sanitize task_numa_fault() callsites" which is a cleanup that made two
  followup fixes simpler.

  I've done targeted testing with just this SHA1 to try to make sure
  there are no cherry-picking artifacts.  The original non-cherry-picked
  set of fixes were exposed to linux-next for a couple of weeks"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  mm: Account for a THP NUMA hinting update as one PTE update
  mm: Close races between THP migration and PMD numa clearing
  mm: numa: Sanitize task_numa_fault() callsites
  mm: Prevent parallel splits during THP migration
  mm: Wait for THP migrations to complete during NUMA hinting faults
  mm: numa: Do not account for a hinting fault if we raced
2013-10-31 15:21:26 -07:00
Linus Torvalds
026f8f612a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
 "A bit later than I would want, but the changes are very minor - a few
  new device IDs for new hardware in existing drivers, fix for battery
  in Wacom devices not be considered system battery and cause emergency
  hibernations, and a couple of other bug fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: ALPS - add support for model found on Dell XT2
  Input: wacom - add support for ISDv4 0x10E sensor
  Input: wacom - add support for ISDv4 0x10F sensor
  Input: wacom - export battery scope
  Input: cm109 - convert high volume dev_err() to dev_err_ratelimited()
  Input: move name/timer init to input_alloc_dev()
  Input: i8042 - i8042_flush fix for a full 8042 buffer
  Input: pxa27x_keypad - fix NULL pointer dereference
2013-10-31 10:38:59 -07:00
Linus Torvalds
e7647027bd Last-minute ACPI and power management fixes for 3.12
- Revert epoll and select commits related to the freezer, introduced
    during the 3.11 cycle, that cause mysterious user space breakage
    to occur during resume from suspend to RAM for multiple users of
    32-bit x86 systems.  Material for 3.11.y stable kernels.
 
  - Revert a recent ACPI-based PCI hotplug (ACPIPHP) commit that was
    part of boot problem fixes for one machine, but turns out to cause
    issues with hotplug on Thunderbolt chains with multiple devices.
    It also turns out to be unnecessary after another fix in the
    same area that went in later.  From Mika Westerberg.
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABCAAGBQJSclBPAAoJEILEb/54YlRxB0IQAI9Iq9VxxeGyXyK6xtEsZcoK
 PCEkuGBrZf0Ih6dyuijQ3PEQvTLIIGSzJmnkFdpn5k9YFF3GiUg+sIs1wFnry1lA
 LKxhsaRyXMv8J1yoByJYczRZNwtFSNKAC0U78BSLALZGqlw+kbVrcb7sYg1TodUJ
 nYKogS4SphSv3Jjccbw9PSUsW2FHJ6x0wUJ4WXPFQE+mO7VH5wGTmbv6lK5wOrKZ
 JAMW1RxkVffW3NLEIJHZ0sFFjn4Cp0oo6o/bV+rQx5XEHRfLDUSOMezpQim/+qPf
 NVHN1+8ydFL6bjpm+nX2Mtvy9j8x+E1iRp4EVpQOst2JaxpHXTkGWGKm8qxQ/eRO
 ksA0M09neoQKlN8uKcQ1Ank8BpX02ZKrZ3Jx/VwCvLQRk1UYAy5442WsTmnvOGlO
 /WFkhbqYFqAxngCVdiTT1ptdeT0a4VTtIZBXmA3j7qHrbo3RKSFDFDX8/82VeU8W
 GjMlFooQ6p4HgzBxNsPkBgI9d45U9lpRgdmBZK3T95AbFk+az79hm94/c4LUKnfk
 KsTOhhZ+B29LFHS9Dqz0jGkW1mXj3rXNE5Z01VA1qeWxWT6xdGsnLlgbJdvJwyZS
 lHP91ELAmPKnoxd3E36nocRRLAMZ9zq1b4kPJY155ig866kDpUaNqWjojZYBkHdC
 qGfRTBw0/35CEhdHEvgB
 =8S79
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-3.12-late' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI and power management fixes from Rafael J Wysocki:
 "Last-minute ACPI and power management fixes for 3.12

   - Revert epoll and select commits related to the freezer, introduced
     during the 3.11 cycle, that cause mysterious user space breakage to
     occur during resume from suspend to RAM for multiple users of
     32-bit x86 systems.  Material for 3.11.y stable kernels.

   - Revert a recent ACPI-based PCI hotplug (ACPIPHP) commit that was
     part of boot problem fixes for one machine, but turns out to cause
     issues with hotplug on Thunderbolt chains with multiple devices.
     It also turns out to be unnecessary after another fix in the same
     area that went in later.  From Mika Westerberg"

* tag 'pm+acpi-3.12-late' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  Revert "ACPI / hotplug / PCI: Avoid doing too much for spurious notifies"
  Revert "select: use freezable blocking call"
  Revert "epoll: use freezable blocking call"
2013-10-31 10:13:28 -07:00
Russell King
a4461f41b9 ALSA: fix oops in snd_pcm_info() caused by ASoC DPCM
Unable to handle kernel NULL pointer dereference at virtual address 00000008
pgd = d5300000
[00000008] *pgd=0d265831, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1] PREEMPT ARM
CPU: 0 PID: 2295 Comm: vlc Not tainted 3.11.0+ #755
task: dee74800 ti: e213c000 task.ti: e213c000
PC is at snd_pcm_info+0xc8/0xd8
LR is at 0x30232065
pc : [<c031b52c>]    lr : [<30232065>]    psr: a0070013
sp : e213dea8  ip : d81cb0d0  fp : c05f7678
r10: c05f7770  r9 : fffffdfd  r8 : 00000000
r7 : d8a968a8  r6 : d8a96800  r5 : d8a96200  r4 : d81cb000
r3 : 00000000  r2 : d81cb000  r1 : 00000001  r0 : d8a96200
Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 10c5387d  Table: 15300019  DAC: 00000015
Process vlc (pid: 2295, stack limit = 0xe213c248)
[<c031b52c>] (snd_pcm_info) from [<c031b570>] (snd_pcm_info_user+0x34/0x9c)
[<c031b570>] (snd_pcm_info_user) from [<c03164a4>] (snd_pcm_control_ioctl+0x274/0x280)
[<c03164a4>] (snd_pcm_control_ioctl) from [<c0311458>] (snd_ctl_ioctl+0xc0/0x55c)
[<c0311458>] (snd_ctl_ioctl) from [<c00eca84>] (do_vfs_ioctl+0x80/0x31c)
[<c00eca84>] (do_vfs_ioctl) from [<c00ecd5c>] (SyS_ioctl+0x3c/0x60)
[<c00ecd5c>] (SyS_ioctl) from [<c000e500>] (ret_fast_syscall+0x0/0x48)
Code: e1a00005 e59530dc e3a01001 e1a02004 (e5933008)
---[ end trace cb3d9bdb8dfefb3c ]---

This is provoked when the ASoC front end is open along with its backend,
(which causes the backend to have a runtime assigned to it) and then the
SNDRV_CTL_IOCTL_PCM_INFO is requested for the (visible) backend device.

Resolve this by ensuring that ASoC internal backend devices are not
visible to userspace, just as the commentry for snd_pcm_new_internal()
says it should be.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Mark Brown <broonie@linaro.org>
Cc: <stable@vger.kernel.org> [v3.4+]
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2013-10-31 17:36:47 +01:00
Wei Yongjun
cd5d58108e MIPS: ralink: fix return value check in rt_timer_probe()
In case of error, the function devm_request_and_ioremap() returns NULL
pointer not ERR_PTR(). Fix it by using devm_ioremap_resource() instead
of devm_request_and_ioremap().

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: John Crispin <blogic@openwrt.org>
Cc: grant.likely@linaro.org
Cc: rob.herring@calxeda.com
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/6098/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2013-10-31 12:38:34 +01:00
Yunkang Tang
5beea882e6 Input: ALPS - add support for model found on Dell XT2
This patch adds support for touchpad found on Dell XT2. It's a dual device
with device ID: 73, 00, 14, that comply with "ALPS_PROTO_V2".

Signed-off-by: Yunkang Tang <yunkang.tang@cn.alps.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2013-10-31 00:59:20 -07:00
Dave Airlie
74c85e1357 Merge branch 'drm-fixes-3.12' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
Just a few small fixes for radeon (audio regression fix,
stability fix, and an endian bug noticed by coverity).

* 'drm-fixes-3.12' of git://people.freedesktop.org/~agd5f/linux:
  drm/radeon/dpm: fix incompatible casting on big endian
  drm/radeon: disable bapm on KB
  drm/radeon: use sw CTS/N values for audio on DCE4+
2013-10-31 15:29:10 +10:00
Linus Torvalds
12aee278b5 Merge branch 'akpm' (fixes from Andrew Morton)
Merge three fixes from Andrew Morton.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  memcg: use __this_cpu_sub() to dec stats to avoid incorrect subtrahend casting
  percpu: fix this_cpu_sub() subtrahend casting for unsigneds
  mm/pagewalk.c: fix walk_page_range() access of wrong PTEs
2013-10-30 14:27:10 -07:00
Greg Thelen
5e8cfc3c75 memcg: use __this_cpu_sub() to dec stats to avoid incorrect subtrahend casting
As of commit 3ea67d06e4 ("memcg: add per cgroup writeback pages
accounting") memcg counter errors are possible when moving charged
memory to a different memcg.  Charge movement occurs when processing
writes to memory.force_empty, moving tasks to a memcg with
memcg.move_charge_at_immigrate=1, or memcg deletion.

An example showing error after memory.force_empty:

  $ cd /sys/fs/cgroup/memory
  $ mkdir x
  $ rm /data/tmp/file
  $ (echo $BASHPID >> x/tasks && exec mmap_writer /data/tmp/file 1M) &
  [1] 13600
  $ grep ^mapped x/memory.stat
  mapped_file 1048576
  $ echo 13600 > tasks
  $ echo 1 > x/memory.force_empty
  $ grep ^mapped x/memory.stat
  mapped_file 4503599627370496

mapped_file should end with 0.
  4503599627370496 == 0x10,0000,0000,0000 == 0x100,0000,0000 pages
  1048576          == 0x10,0000           == 0x100 pages

This issue only affects the source memcg on 64 bit machines; the
destination memcg counters are correct.  So the rmdir case is not too
important because such counters are soon disappearing with the entire
memcg.  But the memcg.force_empty and memory.move_charge_at_immigrate=1
cases are larger problems as the bogus counters are visible for the
(possibly long) remaining life of the source memcg.

The problem is due to memcg use of __this_cpu_from(.., -nr_pages), which
is subtly wrong because it subtracts the unsigned int nr_pages (either
-1 or -512 for THP) from a signed long percpu counter.  When
nr_pages=-1, -nr_pages=0xffffffff.  On 64 bit machines stat->count[idx]
is signed 64 bit.  So memcg's attempt to simply decrement a count (e.g.
from 1 to 0) boils down to:

  long count = 1
  unsigned int nr_pages = 1
  count += -nr_pages  /* -nr_pages == 0xffff,ffff */
  count is now 0x1,0000,0000 instead of 0

The fix is to subtract the unsigned page count rather than adding its
negation.  This only works once "percpu: fix this_cpu_sub() subtrahend
casting for unsigneds" is applied to fix this_cpu_sub().

Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-30 14:27:03 -07:00
Greg Thelen
bd09d9a351 percpu: fix this_cpu_sub() subtrahend casting for unsigneds
this_cpu_sub() is implemented as negation and addition.

This patch casts the adjustment to the counter type before negation to
sign extend the adjustment.  This helps in cases where the counter type
is wider than an unsigned adjustment.  An alternative to this patch is
to declare such operations unsupported, but it seemed useful to avoid
surprises.

This patch specifically helps the following example:
  unsigned int delta = 1
  preempt_disable()
  this_cpu_write(long_counter, 0)
  this_cpu_sub(long_counter, delta)
  preempt_enable()

Before this change long_counter on a 64 bit machine ends with value
0xffffffff, rather than 0xffffffffffffffff.  This is because
this_cpu_sub(pcp, delta) boils down to this_cpu_add(pcp, -delta),
which is basically:
  long_counter = 0 + 0xffffffff

Also apply the same cast to:
  __this_cpu_sub()
  __this_cpu_sub_return()
  this_cpu_sub_return()

All percpu_test.ko passes, especially the following cases which
previously failed:

  l -= ui_one;
  __this_cpu_sub(long_counter, ui_one);
  CHECK(l, long_counter, -1);

  l -= ui_one;
  this_cpu_sub(long_counter, ui_one);
  CHECK(l, long_counter, -1);
  CHECK(l, long_counter, 0xffffffffffffffff);

  ul -= ui_one;
  __this_cpu_sub(ulong_counter, ui_one);
  CHECK(ul, ulong_counter, -1);
  CHECK(ul, ulong_counter, 0xffffffffffffffff);

  ul = this_cpu_sub_return(ulong_counter, ui_one);
  CHECK(ul, ulong_counter, 2);

  ul = __this_cpu_sub_return(ulong_counter, ui_one);
  CHECK(ul, ulong_counter, 1);

Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-30 14:27:03 -07:00
Chen LinX
3017f079ef mm/pagewalk.c: fix walk_page_range() access of wrong PTEs
When walk_page_range walk a memory map's page tables, it'll skip
VM_PFNMAP area, then variable 'next' will to assign to vma->vm_end, it
maybe larger than 'end'.  In next loop, 'addr' will be larger than
'next'.  Then in /proc/XXXX/pagemap file reading procedure, the 'addr'
will growing forever in pagemap_pte_range, pte_to_pagemap_entry will
access the wrong pte.

  BUG: Bad page map in process procrank  pte:8437526f pmd:785de067
  addr:9108d000 vm_flags:00200073 anon_vma:f0d99020 mapping:  (null) index:9108d
  CPU: 1 PID: 4974 Comm: procrank Tainted: G    B   W  O 3.10.1+ #1
  Call Trace:
    dump_stack+0x16/0x18
    print_bad_pte+0x114/0x1b0
    vm_normal_page+0x56/0x60
    pagemap_pte_range+0x17a/0x1d0
    walk_page_range+0x19e/0x2c0
    pagemap_read+0x16e/0x200
    vfs_read+0x84/0x150
    SyS_read+0x4a/0x80
    syscall_call+0x7/0xb

Signed-off-by: Liu ShuoX <shuox.liu@intel.com>
Signed-off-by: Chen LinX <linx.z.chen@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org>	[3.10.x+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-30 14:27:03 -07:00
Russell King
c56b097af2 mm: list_lru: fix almost infinite loop causing effective livelock
I've seen a fair number of issues with kswapd and other processes
appearing to get stuck in v3.12-rc.  Using sysrq-p many times seems to
indicate that it gets stuck somewhere in list_lru_walk_node(), called
from prune_icache_sb() and super_cache_scan().

I never seem to be able to trigger a calltrace for functions above that
point.

So I decided to add the following to super_cache_scan():

    @@ -81,10 +81,14 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
            inodes = list_lru_count_node(&sb->s_inode_lru, sc->nid);
            dentries = list_lru_count_node(&sb->s_dentry_lru, sc->nid);
            total_objects = dentries + inodes + fs_objects + 1;
    +printk("%s:%u: %s: dentries %lu inodes %lu total %lu\n", current->comm, current->pid, __func__, dentries, inodes, total_objects);

            /* proportion the scan between the caches */
            dentries = mult_frac(sc->nr_to_scan, dentries, total_objects);
            inodes = mult_frac(sc->nr_to_scan, inodes, total_objects);
    +printk("%s:%u: %s: dentries %lu inodes %lu\n", current->comm, current->pid, __func__, dentries, inodes);
    +BUG_ON(dentries == 0);
    +BUG_ON(inodes == 0);

            /*
             * prune the dcache first as the icache is pinned by it, then
    @@ -99,7 +103,7 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
                    freed += sb->s_op->free_cached_objects(sb, fs_objects,
                                                           sc->nid);
            }
    -
    +printk("%s:%u: %s: dentries %lu inodes %lu freed %lu\n", current->comm, current->pid, __func__, dentries, inodes, freed);
            drop_super(sb);
            return freed;
     }

and shortly thereafter, having applied some pressure, I got this:

    update-apt-xapi:1616: super_cache_scan: dentries 25632 inodes 2 total 25635
    update-apt-xapi:1616: super_cache_scan: dentries 1023 inodes 0
    ------------[ cut here ]------------
    Kernel BUG at c0101994 [verbose debug info unavailable]
    Internal error: Oops - BUG: 0 [#3] SMP ARM
    Modules linked in: fuse rfcomm bnep bluetooth hid_cypress
    CPU: 0 PID: 1616 Comm: update-apt-xapi Tainted: G      D      3.12.0-rc7+ #154
    task: daea1200 ti: c3bf8000 task.ti: c3bf8000
    PC is at super_cache_scan+0x1c0/0x278
    LR is at trace_hardirqs_on+0x14/0x18
    Process update-apt-xapi (pid: 1616, stack limit = 0xc3bf8240)
    ...
    Backtrace:
      (super_cache_scan) from [<c00cd69c>] (shrink_slab+0x254/0x4c8)
      (shrink_slab) from [<c00d09a0>] (try_to_free_pages+0x3a0/0x5e0)
      (try_to_free_pages) from [<c00c59cc>] (__alloc_pages_nodemask+0x5)
      (__alloc_pages_nodemask) from [<c00e07c0>] (__pte_alloc+0x2c/0x13)
      (__pte_alloc) from [<c00e3a70>] (handle_mm_fault+0x84c/0x914)
      (handle_mm_fault) from [<c001a4cc>] (do_page_fault+0x1f0/0x3bc)
      (do_page_fault) from [<c001a7b0>] (do_translation_fault+0xac/0xb8)
      (do_translation_fault) from [<c000840c>] (do_DataAbort+0x38/0xa0)
      (do_DataAbort) from [<c00133f8>] (__dabt_usr+0x38/0x40)

Notice that we had a very low number of inodes, which were reduced to
zero my mult_frac().

Now, prune_icache_sb() calls list_lru_walk_node() passing that number of
inodes (0) into that as the number of objects to scan:

    long prune_icache_sb(struct super_block *sb, unsigned long nr_to_scan,
                         int nid)
    {
            LIST_HEAD(freeable);
            long freed;

            freed = list_lru_walk_node(&sb->s_inode_lru, nid, inode_lru_isolate,
                                           &freeable, &nr_to_scan);

which does:

    unsigned long
    list_lru_walk_node(struct list_lru *lru, int nid, list_lru_walk_cb isolate,
                       void *cb_arg, unsigned long *nr_to_walk)
    {

            struct list_lru_node    *nlru = &lru->node[nid];
            struct list_head *item, *n;
            unsigned long isolated = 0;

            spin_lock(&nlru->lock);
    restart:
            list_for_each_safe(item, n, &nlru->list) {
                    enum lru_status ret;

                    /*
                     * decrement nr_to_walk first so that we don't livelock if we
                     * get stuck on large numbesr of LRU_RETRY items
                     */
                    if (--(*nr_to_walk) == 0)
                            break;

So, if *nr_to_walk was zero when this function was entered, that means
we're wanting to operate on (~0UL)+1 objects - which might as well be
infinite.

Clearly this is not correct behaviour.  If we think about the behaviour
of this function when *nr_to_walk is 1, then clearly it's wrong - we
decrement first and then test for zero - which results in us doing
nothing at all.  A post-decrement would give the desired behaviour -
we'd try to walk one object and one object only if *nr_to_walk were one.

It also gives the correct behaviour for zero - we exit at this point.

Fixes: 5cedf721a7 ("list_lru: fix broken LRU_RETRY behaviour")
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
[ Modified to make sure we never underflow the count: this function gets
  called in a loop, so the 0 -> ~0ul transition is dangerous  - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-30 12:57:46 -07:00