kernel-ark

Author	SHA1	Message	Date
J. Bruce Fields	03a4e1f6dd	nfsd4: move principal name into svc_cred Instead of keeping the principal name associated with a request in a structure that's private to auth_gss and using an accessor function, move it to svc_cred. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-05-31 20:29:55 -04:00
Stanislav Kinsbursky	9793f7c889	SUNRPC: new svc_bind() routine introduced This new routine is responsible for service registration in a specified network context. The idea is to separate service creation from per-net operations. Note also: since registering service with svc_bind() can fail, the service will be destroyed and during destruction it will try to unregister itself from rpcbind. In this case unregistration has to be skipped. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-05-31 20:29:39 -04:00
Marcel Apfelbaum	3fc929e2d6	net/mlx4_core: Fix number of EQs used in ICM initialisation In SRIOV mode, the number of EQs used when computing the total ICM size was incorrect. To fix this, we do the following: 1. We add a new structure to mlx4_dev, mlx4_phys_caps, to contain physical HCA capabilities. The PPF uses the phys capabilities when it computes things like ICM size. The dev_caps structure will then contain the paravirtualized values, making bookkeeping much easier in SRIOV mode. We add a structure rather than a single parameter because there will be other fields in the phys_caps. The first field we add to the mlx4_phys_caps structure is num_phys_eqs. 2. In INIT_HCA, when running in SRIOV mode, the "log_num_eqs" parameter passed to the FW is the number of EQs per VF/PF; each function (PF or VF) has this number of EQs available. However, the total number of EQs which must be allowed for in the ICM is (1 << log_num_eqs) * (#VFs + #PFs). Rather than compute this quantity, we allocate ICM space for 1024 EQs (which is the device maximum number of EQs, and which is the value we place in the mlx4_phys_caps structure). For INIT_HCA, however, we use the per-function number of EQs as described above. Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-31 18:18:16 -04:00
Linus Torvalds	76f901eb46	Merge tag 'for-v3.5' of git://git.infradead.org/battery-2.6 Pull battery updates from Anton Vorontsov: "A bunch of fixes for v3.5, nothing extraordinary." * tag 'for-v3.5' of git://git.infradead.org/battery-2.6: (27 commits) smb347-charger: Include missing <linux/err.h> smb347-charger: Clean up battery attributes max17042_battery: Add support for max17047/50 chip sbs-battery.c: Capacity attr = remaining relative capacity isp1704_charger: Use after free on probe error ds2781_battery: Use DS2781_PARAM_EEPROM_SIZE and DS2781_USER_EEPROM_SIZE power_supply: Fix a typo in BATTERY_DS2781 Kconfig entry charger-manager: Provide cm_notify_event function for in-kernel use charger-manager: Poll battery health in normal state smb347-charger: Convert to regmap API smb347-charger: Move IRQ enabling to the end of probe smb347-charger: Rename few functions to match better what they are doing smb347-charger: Convert to use module_i2c_driver() smb347_charger: Cleanup power supply registration code in probe ab8500: Clean up probe routines ab8500_fg: Harden platform data check ab8500_btemp: Harden platform data check ab8500_charger: Harden platform data check MAINTAINERS: Fix 'F' entry for the power supply class max17042_battery: Handle irq request failure case ...	2012-05-31 12:10:15 -07:00
Linus Torvalds	bd0e162d03	Merge git://git.kernel.org/pub/scm/virt/kvm/kvm Pull two small kvm fixes from Avi Kivity: "A build fix for non-kvm archs and a transparent hugepage refcount bugfix on hosts with 4M pages." * git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: Export asm-generic/kvm_para.h KVM: MMU: fix huge page adapted on non-PAE host	2012-05-31 12:09:07 -07:00
Linus Torvalds	054552272e	SCSI misc on 20120531 Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull final round of SCSI updates from James Bottomley: "This is primarily another round of driver updates (bnx2fc, qla2xxx, qla4xxx) including the target mode driver for qla2xxx. We've also got a couple of regression fixes (async scanning, broken this merge window and a fix to a long standing break in the scsi_wait_scan module)." * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (45 commits) [SCSI] fix scsi_wait_scan [SCSI] fix async probe regression [SCSI] be2iscsi: fix dma free size mismatch regression [SCSI] qla4xxx: Update driver version to 5.02.00-k17 [SCSI] qla4xxx: Capture minidump for ISP82XX on firmware failure [SCSI] qla4xxx: Add change_queue_depth API support [SCSI] qla4xxx: Fix clear ddb mbx command failure issue. [SCSI] qla4xxx: Fix kernel panic during discovery logout. [SCSI] qla4xxx: Correct early completion of pending mbox. [SCSI] fcoe, bnx2fc, libfcoe: SW FCoE and bnx2fc use FCoE Syfs [SCSI] libfcoe: Add fcoe_sysfs [SCSI] bnx2fc: Allocate fcoe_ctlr with bnx2fc_interface, not as a member [SCSI] fcoe: Allocate fcoe_ctlr with fcoe_interface, not as a member [SCSI] Fix dm-multipath starvation when scsi host is busy [SCSI] ufs: fix potential NULL pointer dereferencing error in ufshcd_prove. [SCSI] qla2xxx: don't free pool that wasn't allocated [SCSI] mptfusion: unlock on error in mpt_config() [SCSI] tcm_qla2xxx: Add >= 24xx series fabric module for target-core [SCSI] qla2xxx: Add LLD target-mode infrastructure for >= 24xx series [SCSI] Revert "qla2xxx: During loopdown perform Diagnostic loopback." ...	2012-05-31 12:02:41 -07:00
Trond Myklebust	1d59d61f60	NFS: Ensure that setattr and getattr wait for O_DIRECT write completion Use the same mechanism as the block devices are using, but move the helper functions from fs/direct-io.c into fs/inode.c to remove the dependency on CONFIG_BLOCK. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Fred Isaman <iisaman@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-31 11:41:36 -07:00
Linus Torvalds	13199a0845	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking changes from David S. Miller: 1) Fix IPSEC header length calculation for transport mode in ESP. The issue is whether to do the calculation before or after alignment. Fix from Benjamin Poirier. 2) Fix regression in IPV6 IPSEC fragment length calculations, from Gao Feng. This is another transport vs tunnel mode issue. 3) Handle AF_UNSPEC connect()s properly in L2TP to avoid OOPSes. Fix from James Chapman. 4) Fix USB ASIX driver's reception of full sized VLAN packets, from Eric Dumazet. 5) Allow drop monitor (and, more generically, all generic netlink protocols) to be automatically loaded as a module. From Neil Horman. Fix up trivial conflict in Documentation/feature-removal-schedule.txt due to new entries added next to each other at the end. As usual. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (38 commits) net/smsc911x: Repair broken failure paths virtio-net: remove useless disable on freeze netdevice: Update netif_dbg for CONFIG_DYNAMIC_DEBUG drop_monitor: Add module alias to enable automatic module loading genetlink: Build a generic netlink family module alias net: add MODULE_ALIAS_NET_PF_PROTO_NAME r6040: Do a Proper deinit at errorpath and also when driver unloads (calling r6040_remove_one) r6040: disable pci device if the subsequent calls (after pci_enable_device) fails skb: avoid unnecessary reallocations in __skb_cow net: sh_eth: fix the rxdesc pointer when rx descriptor empty happens asix: allow full size 8021Q frames to be received rds_rdma: don't assume infiniband device is PCI l2tp: fix oops in L2TP IP sockets for connect() AF_UNSPEC case mac80211: fix ADDBA declined after suspend with wowlan wlcore: fix undefined symbols when CONFIG_PM is not defined mac80211: fix flag check for QoS NOACK frames ath9k_hw: apply internal regulator settings on AR933x ath9k_hw: update AR933x initvals to fix issues with high power devices ath9k: fix a use-after-free-bug when ath_tx_setup_buffer() fails ath9k: stop rx dma before stopping tx ...	2012-05-31 10:32:36 -07:00
Al Viro	e5467859f7	split ->file_mmap() into ->mmap_addr()/->mmap_file() ... i.e. file-dependent and address-dependent checks. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-31 13:11:54 -04:00
Al Viro	d007794a18	split cap_mmap_addr() out of cap_file_mmap() ... switch callers. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-31 13:10:54 -04:00
Naohiro Aota	a4f9a9a635	fsnotify: handle subfiles' perm events Recently I have been working on fanotify and found the following strange behavior. I wrote a program to set fanotify_mark on "/tmp/block" and FAN_DENY all events notified. fanotify_mask = FAN_ALL_EVENTS | FAN_ALL_PERM_EVENTS | FAN_EVENT_ON_CHILD: $ cd /tmp/block; cat foo cat: foo: Operation not permitted Operation on the file is blocked as expected. But, fanotify_mask = FAN_ALL_PERM_EVENTS | FAN_EVENT_ON_CHILD: $ cd /tmp/block; cat foo aaa It's not blocked anymore. This is confusing behavior. Also reading commit "fsnotify: call fsnotify_parent in perm events", it seems like fsnotify should handle subfiles' perm events as well as the other notify events. With this patch, regardless of whether FAN_ALL_EVENTS is set or not: $ cd /tmp/block; cat foo cat: foo: Operation not permitted Operation on the file is now blocked properly. FS_OPEN_PERM and FS_ACCESS_PERM are not listed in FS_EVENTS_POSS_ON_CHILD. Due to the fsnotify_inode_watches_children() check, if you specify only these events in fsnotify_mask, you don't get subfiles' perm events notified. This patch adds the events to FS_EVENTS_POSS_ON_CHILD so that they are notified even if only these events are specified in fsnotify_mask. Signed-off-by: Naohiro Aota <naota@elisp.net> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-30 21:04:53 -04:00
Al Viro	bb8ac181a5	bury __kernel_nlink_t, make internal nlink_t consistent Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-30 21:04:50 -04:00
Joe Perches	0053ea9c34	netdevice: Update netif_dbg for CONFIG_DYNAMIC_DEBUG Make netif_dbg use dynamic debugging whenever CONFIG_DYNAMIC_DEBUG is enabled. commit `b558c96ffa` ("dynamic_debug: make dynamic-debug supersede DEBUG ccflag") missed updating the netif_dbg variant. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-30 16:34:27 -04:00
Linus Torvalds	af56e0aa35	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull ceph updates from Sage Weil: "There are some updates and cleanups to the CRUSH placement code, a bug fix with incremental maps, several cleanups and fixes from Josh Durgin in the RBD block device code, a series of cleanups and bug fixes from Alex Elder in the messenger code, and some miscellaneous bounds checking and gfp cleanups/fixes." Fix up trivial conflicts in net/ceph/{messenger.c,osdmap.c} due to the networking people preferring "unsigned int" over just "unsigned". * git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (45 commits) libceph: fix pg_temp updates libceph: avoid unregistering osd request when not registered ceph: add auth buf in prepare_write_connect() ceph: rename prepare_connect_authorizer() ceph: return pointer from prepare_connect_authorizer() ceph: use info returned by get_authorizer ceph: have get_authorizer methods return pointers ceph: ensure auth ops are defined before use ceph: messenger: reduce args to create_authorizer ceph: define ceph_auth_handshake type ceph: messenger: check return from get_authorizer ceph: messenger: rework prepare_connect_authorizer() ceph: messenger: check prepare_write_connect() result ceph: don't set WRITE_PENDING too early ceph: drop msgr argument from prepare_write_connect() ceph: messenger: send banner in process_connect() ceph: messenger: reset connection kvec caller libceph: don't reset kvec in prepare_write_banner() ceph: ignore preferred_osd field ceph: fully initialize new layout ...	2012-05-30 11:17:19 -07:00
Linus Torvalds	42fe55ce90	Merge branch 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging Pull i2c updates from Jean Delvare. * 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging: i2c: Split I2C_M_NOSTART support out of I2C_FUNC_PROTOCOL_MANGLING i2c-dev: Add support for I2C_M_RECV_LEN	2012-05-30 10:03:46 -07:00
Linus Torvalds	19ce0a995f	Merge git://www.linux-watchdog.org/linux-watchdog Pull second set of watchdog updates from Wim Van Sebroeck: "This changeset contains following changes: * Add support for multiple watchdog devices. We use dynamically allocated device id's for this. * Add locking into the generic watchdog infrastructure. * Add support for dynamically allocated watchdog_device structs so that we can deal with devices that get unbound. * convert following drivers to the generic watchdog framework: sch5627, sch5636 and sp805_wdt. * Add DA9052/53 PMIC watchdog support * Fix printk format warnings for iTCO_wdt.c" * git://www.linux-watchdog.org/linux-watchdog: watchdog: iTCO_wdt.c: fix printk format warnings watchdog: sp805_wdt: Add clk_{un}prepare support watchdog: sp805_wdt: convert to watchdog core hwmon/sch56xx: Depend on watchdog for watchdog core functions watchdog: sch56xx-common: set correct bits in register() Watchdog: DA9052/53 PMIC watchdog support watchdog: sch56xx-common: Add proper ref-counting of watchdog data watchdog: sch56xx: Remove unnecessary checks for register changes watchdog: sch56xx: Use watchdog core watchdog: Add support for dynamically allocated watchdog_device structs watchdog: Add Locking support watchdog: watchdog_dev: Rewrite wrapper code watchdog: use dev_ functions watchdog: create all the proper device files watchdog: Add a flag to indicate the watchdog doesn't reboot things watchdog: Add multiple device support watchdog: watchdog_core.h: make functions extern watchdog: correct the name of the watchdog_core inlude file watchdog: Add watchdog_active() routine watchdog: watchdog_dev: include private header to pickup global symbol prototypes	2012-05-30 09:59:13 -07:00
Linus Torvalds	a70f35af4e	Merge branch 'for-3.5/drivers' of git://git.kernel.dk/linux-block Pull block driver updates from Jens Axboe: "Here are the driver related changes for 3.5. It contains: - The floppy changes from Jiri. Jiri is now also marked as the maintainer of floppy.c, I shall be publically branding his forehead with red hot iron at the next opportune moment. - A batch of drbd updates and fixes from the linbit crew, as well as fixes from others. - Two small fixes for xen-blkfront courtesy of Jan." * 'for-3.5/drivers' of git://git.kernel.dk/linux-block: (70 commits) floppy: take over maintainership floppy: remove floppy-specific O_EXCL handling floppy: convert to delayed work and single-thread wq xen-blkfront: module exit handling adjustments xen-blkfront: properly name all devices drbd: grammar fix in log message drbd: check MODULE for THIS_MODULE drbd: Restore the request restart logic drbd: introduce a bio_set to allocate housekeeping bios from drbd: remove unused define drbd: bm_page_async_io: properly initialize page->private drbd: use the newly introduced page pool for bitmap IO drbd: add page pool to be used for meta data IO drbd: allow bitmap to change during writeout from resync_finished drbd: fix race between drbdadm invalidate/verify and finishing resync drbd: fix resend/resubmit of frozen IO drbd: Ensure that data_size is not 0 before using data_size-1 as index drbd: Delay/reject other state changes while establishing a connection drbd: move put_ldev from __req_mod() to the endio callback drbd: fix WRITE_ACKED_BY_PEER_AND_SIS to not set RQ_NET_DONE ...	2012-05-30 09:05:47 -07:00
Linus Torvalds	0d167518e0	Merge branch 'for-3.5/core' of git://git.kernel.dk/linux-block Merge block/IO core bits from Jens Axboe: "This is a bit bigger on the core side than usual, but that is purely because we decided to hold off on parts of Tejun's submission on 3.4 to give it a bit more time to simmer. As a consequence, it's seen a long cycle in for-next. It contains: - Bug fix from Dan, wrong locking type. - Relax splice gifting restriction from Eric. - A ton of updates from Tejun, primarily for blkcg. This improves the code a lot, making the API nicer and cleaner, and also includes fixes for how we handle and tie policies and re-activate on switches. The changes also include generic bug fixes. - A simple fix from Vivek, along with a fix for doing proper delayed allocation of the blkcg stats." Fix up annoying conflict just due to different merge resolution in Documentation/feature-removal-schedule.txt * 'for-3.5/core' of git://git.kernel.dk/linux-block: (92 commits) blkcg: tg_stats_alloc_lock is an irq lock vmsplice: relax alignement requirements for SPLICE_F_GIFT blkcg: use radix tree to index blkgs from blkcg blkcg: fix blkcg->css ref leak in __blkg_lookup_create() block: fix elvpriv allocation failure handling block: collapse blk_alloc_request() into get_request() blkcg: collapse blkcg_policy_ops into blkcg_policy blkcg: embed struct blkg_policy_data in policy specific data blkcg: mass rename of blkcg API blkcg: style cleanups for blk-cgroup.h blkcg: remove blkio_group->path[] blkcg: blkg_rwstat_read() was missing inline blkcg: shoot down blkgs if all policies are deactivated blkcg: drop stuff unused after per-queue policy activation update blkcg: implement per-queue policy activation blkcg: add request_queue->root_blkg blkcg: make request_queue bypassing on allocation blkcg: make sure blkg_lookup() returns %NULL if @q is bypassing blkcg: make blkg_conf_prep() take @pol and return with queue lock held blkcg: remove static policy ID enums ...	2012-05-30 08:52:42 -07:00
Linus Torvalds	2f83766d4b	IOMMU Updates for Linux 3.5 Merge tag 'iommu-updates-v3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU updates from Joerg Roedel: "Not much stuff this time. The only change to the IOMMU core code is the addition of a handle to the fault handling code. A few updates to the AMD IOMMU driver to work around new errata. The other patches are mostly fixes and enhancements to the existing ARM IOMMU drivers and documentation updates. A new IOMMU driver for the Exynos platform was also underway but got merged via the Samsung tree and is not part of this tree." * tag 'iommu-updates-v3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: Documentation: kernel-parameters.txt Add amd_iommu_dump iommu/core: pass a user-provided token to fault handlers iommu/tegra: gart: Fix register offset correctly iommu: OMAP: device detach on domain destroy iommu: tegra/gart: Add device tree support iommu: tegra/gart: use correct gart_device iommu/tegra: smmu: Print device name correctly iommu/amd: Add workaround for event log erratum iommu/amd: Check for the right TLP prefix bit dma-debug: release free_entries_lock before saving stack trace	2012-05-30 08:49:28 -07:00
Mark Brown	14674e7011	i2c: Split I2C_M_NOSTART support out of I2C_FUNC_PROTOCOL_MANGLING Since there are uses for I2C_M_NOSTART which are much more sensible and standard than most of the protocol mangling functionality (the main one being gather writes to devices where something like a register address needs to be inserted before a block of data) create a new I2C_FUNC_NOSTART for this feature and update all the users to use it. Also strengthen the disrecommendation of the protocol mangling while we're at it. In the case of regmap-i2c we remove the requirement for mangling as I2C_M_NOSTART is the only mangling feature which is being used. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Acked-by: Wolfram Sang <w.sang@pengutronix.de> Signed-off-by: Jean Delvare <khali@linux-fr.org>	2012-05-30 10:55:34 +02:00
Hans de Goede	e907df3272	watchdog: Add support for dynamically allocated watchdog_device structs If a driver's watchdog_device struct is part of a dynamically allocated struct (which it often will be), merely locking the module is not enough: even with a driver's module locked, the driver can be unbound from the device. Examples: 1) The root user can unbind it through sysfs 2) The i2c bus master driver being unloaded for an i2c watchdog I will gladly admit that these are corner cases, but we still need to handle them correctly. The fix for this consists of 2 parts: 1) Add ref / unref operations, so that the driver can refcount the struct holding the watchdog_device struct and delay freeing it until any open filehandles referring to it are closed 2) Most driver operations will do IO on the device and the driver should not do any IO on the device after it has been unbound. Rather than letting each driver deal with this internally, it is better to ensure at the watchdog core level that no operations (other than unref) will get called after the driver has called watchdog_unregister_device(). This actually is the bulk of this patch. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>	2012-05-30 07:55:31 +02:00
Hans de Goede	f4e9c82f64	watchdog: Add Locking support This patch fixes some potential multithreading issues: despite only allowing one process to open the /dev/watchdog device, we can still get called multiple times at the same time, since a program could be using threads, or could share the fd after a fork. This causes 2 potential problems: 1) watchdog_start / open do an unlocked test_n_set / test_n_clear; if these 2 race, the watchdog could be stopped while the active bit indicates it is running or vice versa. 2) Most watchdog_dev drivers probably assume that only one watchdog-op will get called at a time; this is not necessarily true at the moment. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>	2012-05-30 07:55:23 +02:00
Alan Cox	d6b469d915	watchdog: create all the proper device files Create the watchdog class and its associated devices. Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>	2012-05-30 07:54:46 +02:00
Alan Cox	2bbeed016d	watchdog: Add a flag to indicate the watchdog doesn't reboot things Some watchdogs merely trigger external alarms and controls. In a managed environment this is very useful, but we want drivers to be able to figure out which is which now that multiple watchdogs can be loaded. Thus add an ALARMONLY feature flag. Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>	2012-05-30 07:54:40 +02:00
Alan Cox	45f5fed30a	watchdog: Add multiple device support We keep the old /dev/watchdog interface file for the first watchdog via miscdev. This is basically a cut and paste of the relevant interface code from the rtc driver layer tweaked for watchdog. Revised to fix problems noted by Hans de Goede Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>	2012-05-30 07:54:25 +02:00
Viresh Kumar	257f8c4aae	watchdog: Add watchdog_active() routine Some watchdog drivers may need to check whether the watchdog is active, for example in their suspend/resume hooks. This patch adds this routine and changes the core drivers to use it. Signed-off-by: Viresh Kumar <viresh.kumar@st.com> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>	2012-05-30 07:53:46 +02:00
Andi Kleen	eea62f831b	brlocks/lglocks: turn into functions lglocks and brlocks are currently generated with some complicated macros in lglock.h. But there's no reason not to just use common utility functions and put all the data into a common data structure. Since there are at least two users it makes sense to share this code in a library. This is also easier to maintain than a macro forest. This will also make it possible later to dynamically allocate lglocks and also use them in modules (both would still need some additional, but now straightforward, code) [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:41 -04:00
Rusty Russell	9dd6fa03ab	lglock: remove online variants of lock Optimizing the slow paths adds a lot of complexity. If you need to grab every lock often, you have other problems. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:41 -04:00
Al Viro	b0b0382bb4	->encode_fh() API change pass inode + parent's inode or NULL instead of dentry + bool saying whether we want the parent or not. NOTE: that needs ceph fix folded in. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:33 -04:00
Neil Horman	e9412c3708	genetlink: Build a generic netlink family module alias Generic netlink searches for -type- formatted aliases when requesting a module to fulfill a protocol request (i.e. net-pf-16-proto-16-type-<x>, where x is a type value). However generic netlink protocols have no well defined type numbers; they have string names. Modify genl_ctrl_getfamily to request an alias in the format net-pf-16-proto-16-family-<x> instead, where x is a generic string, and add a macro that builds on the previously added MODULE_ALIAS_NET_PF_PROTO_NAME macro to allow modules to specify those generic strings. Note, l2tp previously hacked together a net-pf-16-proto-16-type-l2tp alias using the MODULE_ALIAS macro; with these updates we can convert that to use the PROTO_NAME macro. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: James Chapman <jchapman@katalix.com> CC: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-29 22:33:56 -04:00
Neil Horman	2033e9bf06	net: add MODULE_ALIAS_NET_PF_PROTO_NAME The MODULE_ALIAS_NET_PF macro set is missing a variant that allows for the appending of an arbitrary string to the net-pf-<x>-proto-<y> base. While MODULE_ALIAS_NET_PF_PROTO_NAME_TYPE allows an appending of a numerical type, we need to be able to append a generic string to support generic netlink families that have neither a fixed numerical protocol nor a type number. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-29 22:33:55 -04:00
Linus Torvalds	87a5af24e5	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac Pull EDAC internal API changes from Mauro Carvalho Chehab: "This changeset is the first part of a series of patches that fixes the EDAC subsystem. On this set, it changes the Kernel EDAC API in order to properly represent the Intel i3/i5/i7, Xeon 3xxx/5xxx/7xxx, and Intel E5-xxxx memory controllers. The EDAC core used to assume that: - the DRAM chip select pin is directly accessed by the memory controller - when multiple channels are used, they're all filled with the same type of memory. None of the above premises is true on Intel memory controllers since 2002, when RAMBUS and FB-DIMMs were introduced, and Advanced Memory Buffer or by some similar technologies hides the direct access to the DRAM pins. So, the existing drivers for those chipsets had to lie to the EDAC core, in general telling that just one channel is filled. That produces some hard to understand error messages like: EDAC MC0: CE row 3, channel 0, label "DIMM1": 1 Unknown error(s): memory read error on FATAL area : cpu=0 Err=0008:00c2 (ch=2), addr = 0xad1f73480 => socket=0, Channel=0(mask=2), rank=1 The location information there (row3 channel 0) is completely bogus: it has no physical meaning, and are just some random values that the driver uses to talk with the EDAC core. The error actually happened at CPU socket 0, channel 0, slot 1, but this is not reported anywhere, as the EDAC core doesn't know anything about the memory layout. So, only advanced users that know how the EDAC driver works and that tests their systems to see how DIMMs are mapped can actually benefit from such error logs. This patch series fixes the error report logic, in order to allow the EDAC to expose the memory architecture used by them to the EDAC core. So, as the EDAC core now understands how the memory is organized, it can provide a useful report: EDAC MC0: CE memory read error on DIMM1 (channel:0 slot:1 page:0x364b1b offset:0x600 grain:32 syndrome:0x0 - count:1 area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:4) The location of the DIMM where the error happened is reported by "MC0" (cpu socket #0), at "channel:0 slot:1" location, and matches the physical location of the DIMM. There are two remaining issues not covered by this patch series: - The EDAC sysfs API will still report bogus values. So, userspace tools like edac-utils will still use the bogus data; - Add a new tracepoint-based way to get the binary information about the errors. Those are on a second series of patches (also at -next), but will probably miss the train for 3.5, due to the slow review process." Fix up trivial conflict (due to spelling correction of removed code) in drivers/edac/edac_device.c * git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: (42 commits) i7core: fix ranks information at the per-channel struct i5000: Fix the fatal error handling i5100_edac: Fix a warning when compiled with 32 bits i82975x_edac: Test nr_pages earlier to save a few CPU cycles e752x_edac: provide more info about how DIMMS/ranks are mapped i5000_edac: Fix the logic that retrieves memory information i5400_edac: improve debug messages to better represent the filled memory edac: Cleanup the logs for i7core and sb edac drivers edac: Initialize the dimm label with the known information edac: Remove the legacy EDAC ABI x38_edac: convert driver to use the new edac ABI tile_edac: convert driver to use the new edac ABI sb_edac: convert driver to use the new edac ABI r82600_edac: convert driver to use the new edac ABI ppc4xx_edac: convert driver to use the new edac ABI pasemi_edac: convert driver to use the new edac ABI mv64x60_edac: convert driver to use the new edac ABI mpc85xx_edac: convert driver to use the new edac ABI i82975x_edac: convert driver to use the new edac ABI i82875p_edac: convert driver to use the new edac ABI ...	2012-05-29 18:32:37 -07:00
Linus Torvalds	7e5b2db77b	Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus Pull MIPS updates from Ralf Baechle: "The whole series has been sitting in -next for quite a while with no complaints. The last change to the series, made before the weekend, was the removal of an SPI patch to which Grant, even though he had previously acked it himself, appeared to raise objections. So I removed it until the situation is clarified. Other than that all the patches have the acks from their respective maintainers, all MIPS and x86 defconfigs are building fine and I'm not aware of any problems introduced by this series. Among the key features for this patch series is a sizable patchset for Lantiq which among other things introduces support for Lantiq's flagship product, the FALCON SOC. It also means that the opensource developers behind this patchset have overtaken Lantiq's competing inhouse development team that was working behind closed doors. Less noteworthy is the ath79 patchset, which adds support for a few more chip variants, cleanups and fixes. Finally the usual dose of tweaking of generic code." Fix up trivial conflicts in arch/mips/lantiq/xway/gpio_{ebu,stp}.c where printk spelling fixes clashed with file move and eventual removal of the printk. * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (81 commits) MIPS: lantiq: remove orphaned code MIPS: Remove all -Wall and almost all -Werror usage from arch/mips. 
MIPS: lantiq: implement support for FALCON soc MTD: MIPS: lantiq: verify that the NOR interface is available on falcon soc MTD: MIPS: lantiq: implement OF support watchdog: MIPS: lantiq: implement OF support and minor fixes SERIAL: MIPS: lantiq: implement OF support GPIO: MIPS: lantiq: convert gpio-stp-xway to OF GPIO: MIPS: lantiq: convert gpio-mm-lantiq to OF and of_mm_gpio GPIO: MIPS: lantiq: move gpio-stp and gpio-ebu to the subsystem folder MIPS: pci: convert lantiq driver to OF MIPS: lantiq: convert dma to platform driver MIPS: lantiq: implement support for clkdev api MIPS: lantiq: drop ltq_gpio_request() and gpio_to_irq() OF: MIPS: lantiq: implement irq_domain support OF: MIPS: lantiq: implement OF support MIPS: lantiq: drop mips_machine support OF: PCI: const usage needed by MIPS MIPS: Cavium: Remove smp_reserve_lock. MIPS: Move cache setup to setup_arch(). ...	2012-05-29 18:27:19 -07:00
Wolfram Sang	eb86c3064b	rtc: ds1307: add trickle charger support Some DS13XX devices have "trickle chargers". Their configuration register is at different locations; the setup is the same, though. Since the configuration is board specific, introduce a platform_data to this driver. Tested with a DS1339 on a custom board. Signed-off-by: Wolfram Sang <w.sang@pengutronix.de> Cc: Alessandro Zummo <alessandro.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:33 -07:00
Alexander Stein	e311c92959	rtc: add ioctl to get/clear battery low voltage status Currently there is no generic way to get the RTC battery status within an application. So add an ioctl to read the status bit. The idea is that the bit is set once a low voltage is detected. It stays there until it is reset using the RTC_VL_CLR ioctl. Signed-off-by: Alexander Stein <alexander.stein@systec-electronic.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:33 -07:00
Stephen Boyd	4796dd200d	vsprintf: fix %ps on non symbols when using kallsyms Using %ps in a printk format will sometimes fail silently and print the empty string if the address passed in does not match a symbol that kallsyms knows about. But using %pS will fall back to printing the full address if kallsyms can't find the symbol. Make %ps act the same as %pS by falling back to printing the address. While we're here also make %ps print the module that a symbol comes from so that it matches what %pS already does. Take this simple function for example (in a module): static void test_printk(void) { int test; pr_info("with pS: %pS\n", &test); pr_info("with ps: %ps\n", &test); } Before this patch: with pS: 0xdff7df44 with ps: After this patch: with pS: 0xdff7df44 with ps: 0xdff7df44 Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:32 -07:00
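The fallback rule described in the %ps commit above can be modelled in user space. This is a hypothetical sketch, not kernel code: `struct sym`, `lookup_symbol()` and `format_sym()` are illustrative names, the tiny table stands in for kallsyms, and the `name+0xoff/0xsize [module]` layout only approximates what %pS prints. It shows that, after the fix, an unknown address prints as a raw address in both the short and long forms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a kallsyms entry covering [start, start + size). */
struct sym {
	const char *name;
	unsigned long start;
	unsigned long size;
	const char *module; /* NULL for built-in symbols */
};

static const struct sym symtab[] = {
	{ "test_printk", 0x1000, 0x40, "testmod" },
	{ "do_work",     0x2000, 0x80, NULL },
};

/* Return the symbol containing addr, or NULL if no entry matches. */
static const struct sym *lookup_symbol(unsigned long addr)
{
	for (size_t i = 0; i < sizeof(symtab) / sizeof(symtab[0]); i++)
		if (addr >= symtab[i].start &&
		    addr < symtab[i].start + symtab[i].size)
			return &symtab[i];
	return NULL;
}

/*
 * Format addr roughly like %ps (with_offset == 0) or %pS (with_offset != 0).
 * Unknown addresses fall back to "0x..." in both modes.
 */
static void format_sym(char *buf, size_t len, unsigned long addr,
		       int with_offset)
{
	const struct sym *s = lookup_symbol(addr);
	int n;

	assert(buf != NULL && len > 0);
	if (!s) {
		snprintf(buf, len, "0x%lx", addr);
		return;
	}
	if (with_offset)
		n = snprintf(buf, len, "%s+0x%lx/0x%lx", s->name,
			     addr - s->start, s->size);
	else
		n = snprintf(buf, len, "%s", s->name);
	/* Append the owning module in both modes, as the commit does for %ps. */
	if (s->module && n >= 0 && (size_t)n < len)
		snprintf(buf + n, len - n, " [%s]", s->module);
}
```

With this model, `format_sym(buf, sizeof(buf), 0xdff7df44UL, 0)` yields `0xdff7df44` instead of an empty string, matching the "after this patch" output quoted in the commit.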
Kim, Milo	8035a50224	include/linux/led-lm3530.h: comment correction about the range of brightness max brightness is 127, so the range of brt_val should be from 0 to 127 Signed-off-by: Milo(Woogyom) Kim <milo.kim@ti.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Cc: Shreshtha Kumar SAHU <shreshthakumar.sahu@stericsson.com> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Bryan Wu <bryan.wu@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:32 -07:00
Shuah Khan	b00961824a	leds: add new field to led_classdev struct to save activation state Add a new field to led_classdev to save activation state after activate routine is successful. This saved state is used in deactivate routine to do cleanup such as removing device files, and free memory allocated during activation. Currently trigger_data not being null is used for this purpose. Existing triggers will need changes to use this new field. Signed-off-by: Shuah Khan <shuahkhan@gmail.com> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Bryan Wu <bryan.wu@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:31 -07:00
H Hartley Sweeten	1615d210db	drivers/video/backlight/apple_bl.c: include header for exported symbol prototypes Include the header to pickup the exported symbol prototype. Quiets the sparse warning: warning: symbol 'apple_bl_register' was not declared. Should it be static? warning: symbol 'apple_bl_unregister' was not declared. Should it be static? [akpm@linux-foundation.org: fix resulting build error] Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Cc: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:29 -07:00
Inki Dae	d54ad83f3d	lcd: add callbacks for early fb event blank support This patchset adds an early fb blank feature in which a callback of the lcd panel driver is called prior to the specific fb driver's one. In the case of a MIPI-DSI based video mode LCD panel, for lcd power off, the power off commands should be transferred to the lcd panel with the display and mipi-dsi controller enabled, because the commands are sent to the lcd panel during the vsync porch period. In the opposite case, the callback of the fb driver should be called prior to the lcd panel driver's one because of the same issue. Also, if the fb_blank mode is changed to FB_BLANK_POWERDOWN, then the display controller would be off (clock disabled) but the lcd panel would still be on. At this time, you could see issues like sparkling on the lcd panel because the video clock delivered to the ldi module of the lcd panel was disabled. This issue could occur for all lcd panels. The callback order is as follows: at fb_blank function of fbmem.c -> fb_notifier_call_chain(FB_EARLY_EVENT_BLANK) -> lcd panel driver's early_set_power() -> info->fbops->fb_blank() -> specific fb driver's fb_blank() -> fb_notifier_call_chain(FB_EVENT_BLANK) -> lcd panel driver's set_power(), or fb_notifier_call_chain(FB_R_EARLY_EVENT_BLANK) if info->fbops->fb_blank() failed. fb_notifier_call_chain(FB_R_EARLY_EVENT_BLANK) would be called to revert the effects of the previous FB_EARLY_EVENT_BLANK call. Note that if early_set_power() of lcd_ops is NULL, then the early fb blank callback would be ignored. This patch: Add early_set_power and r_early_set_power callbacks. The early_set_power callback is called prior to fb_blank() of fbmem.c, and the r_early_set_power callback is called if fb_blank() failed, to revert the effects of the early_set_power call of the lcd panel driver. 
Signed-off-by: Inki Dae <inki.dae@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Cc: Lars-Peter Clausen <lars@metafoo.de> Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Cc: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:29 -07:00
Inki Dae	bf05929f41	fbdev: add events for early fb event support Add support for the FB_EARLY_EVENT_BLANK and FB_R_EARLY_EVENT_BLANK events. First, fb_notifier_call_chain() is called with FB_EARLY_EVENT_BLANK, then fb_blank() of the specific fb driver is called, and then fb_notifier_call_chain() is called again with FB_EVENT_BLANK at fb_blank(). If fb_blank() fails, fb_notifier_call_chain() is called with FB_R_EARLY_EVENT_BLANK to revert the previous effects. Signed-off-by: Inki Dae <inki.dae@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Cc: Lars-Peter Clausen <lars@metafoo.de> Acked-by: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Cc: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:28 -07:00
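The blank ordering described in the two early-blank commits above can be traced with a small model. This is a hypothetical sketch: `enum fb_evt`, `struct trace`, `notify()` and `model_fb_blank()` are illustrative names standing in for `fb_notifier_call_chain()` and `fb_blank()`, and `driver_ret` models the return value of the driver's `fbops->fb_blank()`. It only records which event is delivered when.

```c
#include <assert.h>

/* Hypothetical event codes mirroring the FB_*_EVENT_BLANK names. */
enum fb_evt { EV_EARLY_BLANK, EV_BLANK, EV_R_EARLY_BLANK };

/* Records the order in which notifier events were delivered. */
struct trace {
	enum fb_evt ev[4];
	int n;
};

/* Stand-in for fb_notifier_call_chain(): just log the event. */
static void notify(struct trace *t, enum fb_evt e)
{
	assert(t->n < 4);
	t->ev[t->n++] = e;
}

/*
 * Sketch of the ordering: the early event runs before the driver blanks,
 * then either the normal blank event (success) or the revert event
 * (failure) follows.
 */
static int model_fb_blank(struct trace *t, int driver_ret)
{
	notify(t, EV_EARLY_BLANK);        /* lcd early_set_power() runs here */
	if (!driver_ret)
		notify(t, EV_BLANK);          /* lcd set_power() runs here */
	else
		notify(t, EV_R_EARLY_BLANK);  /* lcd r_early_set_power() undoes it */
	return driver_ret;
}
```

A successful blank produces the trace early, blank; a failing driver produces early, revert, so a panel driver never sees a normal blank event for a blank that did not happen.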
Glauber Costa	3f13461939	memcg: decrement static keys at real destroy time We call the destroy function when a cgroup starts to be removed, such as by a rmdir event. However, because of our reference counters, some objects are still inflight. Right now, we are decrementing the static_keys at destroy() time, meaning that if we get rid of the last static_key reference, some objects will still have charges, but the code to properly uncharge them won't be run. This becomes a problem especially if it is ever enabled again, because new charges will then be added to the stale charges, making it pretty much impossible to keep track of them. We just need to be careful with the static branch activation: since there is no particular preferred order of their activation, we need to make sure that we only start using it after all call sites are active. This is achieved by having a per-memcg flag that is only updated after static_key_slow_inc() returns. At this time, we are sure all sites are active. This is made per-memcg, not global, for a reason: it also has the effect of making socket accounting more consistent. The first memcg to be limited will trigger static_key() activation and therefore accounting. But all the others will then be accounted no matter what. After this patch, only limited memcgs will have their sockets accounted. [akpm@linux-foundation.org: move enum sock_flag_bits into sock.h, document enum sock_flag_bits, convert memcg_proto_active() and memcg_proto_activated() to test_bit(), redo tcp_update_limit() comment to 80 cols] Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Tejun Heo <tj@kernel.org> Cc: Li Zefan <lizefan@huawei.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Acked-by: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:28 -07:00
Hugh Dickins	fa9add641b	mm/memcg: apply add/del_page to lruvec Take lruvec further: pass it instead of zone to add_page_to_lru_list() and del_page_from_lru_list(); and pagevec_lru_move_fn() pass lruvec down to its target functions. This cleanup eliminates a swathe of cruft in memcontrol.c, including mem_cgroup_lru_add_list(), mem_cgroup_lru_del_list() and mem_cgroup_lru_move_lists() - which never actually touched the lists. In their place come mem_cgroup_page_lruvec() to decide the lruvec, previously a side-effect of add, and mem_cgroup_update_lru_size() to maintain the lru_size stats. Whilst these are simplifications in their own right, the goal is to bring the evaluation of lruvec next to the spin_locking of the lrus, in preparation for a future patch. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:28 -07:00
Hugh Dickins	4d7dcca213	mm/memcg: get_lru_size not get_lruvec_size Konstantin just introduced mem_cgroup_get_lruvec_size() and get_lruvec_size(), I'm about to add mem_cgroup_update_lru_size(): but we're dealing with the same thing, lru_size[lru]. We ought to agree on the naming, and I do think lru_size is the more correct: so rename his ones to get_lru_size(). Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:28 -07:00
Glauber Costa	04eac7ffde	rescounter: remove __must_check from res_counter_charge_nofail() Since we will succeed with the allocation no matter what, there isn't a need to use __must_check with it. It can very well be optional. Signed-off-by: Glauber Costa <glommer@parallels.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ying Han <yinghan@google.com> Reviewed-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:27 -07:00
Frederic Weisbecker	2bb2ba9d51	rescounters: add res_counter_uncharge_until() When killing a res_counter which is a child of other counter, we need to do res_counter_uncharge(child, xxx) res_counter_charge(parent, xxx) This is not atomic and wastes CPU. This patch adds res_counter_uncharge_until(). This function's uncharge propagates to ancestors until specified res_counter. res_counter_uncharge_until(child, parent, xxx) Now the operation is atomic and efficient. Signed-off-by: Frederic Weisbecker <fweisbec@redhat.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Ying Han <yinghan@google.com> Cc: Glauber Costa <glommer@parallels.com> Reviewed-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:27 -07:00
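The semantics of res_counter_uncharge_until() described above can be sketched with a simplified counter. This is an assumption-laden model: the struct keeps only `usage` and `parent`, the helper names `rc_charge()` and `rc_uncharge_until()` are hypothetical, and the per-counter spinlocks and IRQ disabling of the real res_counter code are omitted. Because a hierarchical charge propagates to every ancestor, uncharging the child and re-charging the parent nets out to touching only the levels below `top`, which is exactly what one bounded walk does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, lock-free model of a hierarchical resource counter. */
struct res_counter {
	unsigned long long usage;
	struct res_counter *parent;
};

/* Charge val to counter and every ancestor (hierarchical charge). */
static void rc_charge(struct res_counter *counter, unsigned long long val)
{
	for (struct res_counter *c = counter; c; c = c->parent)
		c->usage += val;
}

/*
 * Uncharge val from counter and its ancestors, stopping *before* top.
 * top == NULL uncharges the whole chain up to the root.
 */
static void rc_uncharge_until(struct res_counter *counter,
			      struct res_counter *top,
			      unsigned long long val)
{
	for (struct res_counter *c = counter; c != top; c = c->parent) {
		assert(c != NULL);       /* top must be an ancestor (or NULL) */
		assert(c->usage >= val); /* no underflow */
		c->usage -= val;
	}
}
```

For a root, parent, child chain charged with 10 through the child, `rc_uncharge_until(&child, &parent, 10)` leaves the child at 0 while parent and root keep 10, i.e. the charge has moved up one level in a single pass.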
Konstantin Khlebnikov	c56d5c7dfe	mm/vmscan: push lruvec pointer into inactive_list_is_low() Switch mem_cgroup_inactive_anon_is_low() to lruvec pointers, mem_cgroup_get_lruvec_size() is more effective than mem_cgroup_zone_nr_lru_pages() Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:26 -07:00
Konstantin Khlebnikov	074291fea8	mm/vmscan: replace zone_nr_lru_pages() with get_lruvec_size() If memory cgroup is enabled we always use lruvecs which are embedded into struct mem_cgroup_per_zone, so we can reach lru_size counters via container_of(). Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:26 -07:00
Konstantin Khlebnikov	7f5e86c2cc	mm: add link from struct lruvec to struct zone This is the first stage of struct mem_cgroup_zone removal. Further patches replace struct mem_cgroup_zone with a pointer to struct lruvec. If CONFIG_CGROUP_MEM_RES_CTLR=n lruvec_zone() is just container_of(). Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:26 -07:00
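The remark that lruvec_zone() is "just container_of()" when memcg is disabled refers to recovering an enclosing structure from a pointer to an embedded member. The sketch below uses heavily trimmed, hypothetical versions of `struct zone` and `struct lruvec` (the real ones have many more fields, and with memcg enabled the lruvec carries an explicit zone pointer instead); only the pointer arithmetic is the point.

```c
#include <assert.h>
#include <stddef.h>

/* Classic container_of: recover the enclosing struct from a member pointer. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily trimmed stand-ins for the kernel structures. */
struct lruvec {
	unsigned long lru_size[5];
};

struct zone {
	int node;
	struct lruvec lruvec; /* embedded, as in the memcg-disabled case */
};

/* Without memcg, the zone is recovered from its embedded lruvec. */
static struct zone *lruvec_zone(struct lruvec *lruvec)
{
	return container_of(lruvec, struct zone, lruvec);
}
```

The same trick is what lets the memcg-enabled code reach per-memcg lru_size counters from an lruvec embedded in struct mem_cgroup_per_zone, as the preceding commit describes.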
Konstantin Khlebnikov	bbf808ed7d	mm/memcg: kill mem_cgroup_lru_del() This patch kills mem_cgroup_lru_del(), we can use mem_cgroup_lru_del_list() instead. On 0-order isolation we already have the right lru list id. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: Glauber Costa <glommer@parallels.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:25 -07:00
Konstantin Khlebnikov	f3fd4a6192	mm: remove lru type checks from __isolate_lru_page() After patch "mm: forbid lumpy-reclaim in shrink_active_list()" we can completely remove anon/file and active/inactive lru type filters from __isolate_lru_page(), because isolation for 0-order reclaim always isolates pages from the right lru list. And page isolation for lumpy shrink_inactive_list() or memory compaction is allowed to isolate pages from all evictable lru lists anyway. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Glauber Costa <glommer@parallels.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:25 -07:00
Konstantin Khlebnikov	014483bccc	mm: mark mm-inline functions as __always_inline GCC sometimes ignores "inline" directives even for small and simple functions. This is supposed to be fixed in gcc 4.7, but it was released only yesterday. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: Glauber Costa <glommer@parallels.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:25 -07:00
Hugh Dickins	89abfab133	mm/memcg: move reclaim_stat into lruvec With mem_cgroup_disabled() now explicit, it becomes clear that the zone_reclaim_stat structure actually belongs in lruvec, per-zone when memcg is disabled but per-memcg per-zone when it's enabled. We can delete mem_cgroup_get_reclaim_stat(), and change update_page_reclaim_stat() to update just the one set of stats, the one which get_scan_count() will actually use. Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: Glauber Costa <glommer@parallels.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:25 -07:00
KAMEZAWA Hiroyuki	4b91355e9d	memcg: fix/change behavior of shared anon at moving task This patch changes memcg's behavior at task_move(). At task_move(), the kernel scans a task's page table and moves the charges for mapped pages from the source cgroup to the target cgroup. There has been a bug in handling shared anonymous pages for a long time. Before patch: - The spec says 'shared anonymous pages are not moved.' - The implementation was 'shared anonymous pages may be moved'. If page_mapcount <=2, shared anonymous pages' charges were moved. After patch: - The spec says 'all anonymous pages are moved'. - The implementation is 'all anonymous pages are moved'. Considering usage of memcg, this will not affect user's experience. 'shared anonymous' pages only exist between a tree of processes which don't do exec(). Moving one of those processes without exec() does not seem sane. For example, libcgroup will not be affected by this change. (Anyway, no one noticed the implementation for a long time...) Below is a discussion log: - current spec/implementation are complex - Now, shared file caches are moved - It adds unclear check as page_mapcount(). To do correct check, we should check swap users, etc. - No one noticed this implementation behavior. So, no one benefits from the design. - In general, once task is moved to a cgroup for running, it will not be moved.... - Finally, we have control knob as memory.move_charge_at_immigrate. Here is a patch to allow moving shared pages, completely. This makes memcg simpler and fixes the current broken code. Suggested-by: Hugh Dickins <hughd@google.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Glauber Costa <glommer@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:24 -07:00
Pravin B Shelar	5bf5f03c27	mm: fix slab->page flags corruption Transparent huge pages can change page->flags (PG_compound_lock) without taking Slab lock. Since THP can not break slab pages we can safely access compound page without taking compound lock. Specifically this patch fixes a race between compound_unlock() and slab functions which perform page-flags updates. This can occur when get_page()/put_page() is called on a page from slab. [akpm@linux-foundation.org: tweak comment text, fix comment layout, fix label indenting] Reported-by: Amey Bhide <abhide@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Reviewed-by: Christoph Lameter <cl@linux.com> Acked-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:24 -07:00
Andrea Arcangeli	26c191788f	mm: pmd_read_atomic: fix 32bit PAE pmd walk vs pmd_populate SMP race condition When holding the mmap_sem for reading, pmd_offset_map_lock should only run on a pmd_t that has been read atomically from the pmdp pointer, otherwise we may read only half of it leading to this crash: PID: 11679 TASK: f06e8000 CPU: 3 COMMAND: "do_race_2_panic" #0 [f06a9dd8] crash_kexec at c049b5ec #1 [f06a9e2c] oops_end at c083d1c2 #2 [f06a9e40] no_context at c0433ded #3 [f06a9e64] bad_area_nosemaphore at c043401a #4 [f06a9e6c] __do_page_fault at c0434493 #5 [f06a9eec] do_page_fault at c083eb45 #6 [f06a9f04] error_code (via page_fault) at c083c5d5 EAX: 01fb470c EBX: fff35000 ECX: 00000003 EDX: 00000100 EBP: 00000000 DS: 007b ESI: 9e201000 ES: 007b EDI: 01fb4700 GS: 00e0 CS: 0060 EIP: c083bc14 ERR: ffffffff EFLAGS: 00010246 #7 [f06a9f38] _spin_lock at c083bc14 #8 [f06a9f44] sys_mincore at c0507b7d #9 [f06a9fb0] system_call at c083becd start len EAX: ffffffda EBX: 9e200000 ECX: 00001000 EDX: 6228537f DS: 007b ESI: 00000000 ES: 007b EDI: 003d0f00 SS: 007b ESP: 62285354 EBP: 62285388 GS: 0033 CS: 0073 EIP: 00291416 ERR: 000000da EFLAGS: 00000286 This should be a longstanding bug affecting x86 32bit PAE without THP. Only archs with 64bit large pmd_t and 32bit unsigned long should be affected. With THP enabled the barrier() in pmd_none_or_trans_huge_or_clear_bad() would partly hide the bug when the pmd transitions from none to stable, by forcing a re-read of the pmd in pmd_offset_map_lock, but when THP is enabled a new set of problems arises from the fact that the pmd could then transition freely in any of the none, pmd_trans_huge or pmd_trans_stable states. So making the barrier in pmd_none_or_trans_huge_or_clear_bad() unconditional isn't a good idea and it would be a flaky solution. This should be fully fixed by introducing a pmd_read_atomic that reads the pmd in order with THP disabled, or by reading the pmd atomically with cmpxchg8b with THP enabled. 
Luckily this new race condition only triggers in the places that must already be covered by pmd_none_or_trans_huge_or_clear_bad() so the fix is localized there but this bug is not related to THP. NOTE: this can trigger on x86 32bit systems with PAE enabled with more than 4G of ram, otherwise the high part of the pmd never risks being truncated because it would be zero at all times, thereby hiding the SMP race. This bug was discovered and fully debugged by Ulrich, quote: ---- [..] pmd_none_or_trans_huge_or_clear_bad() loads the content of edx and eax. 496 static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t pmd) 497 { 498 /* depend on compiler for an atomic pmd read */ 499 pmd_t pmdval = pmd; // edi = pmd pointer 0xc0507a74 <sys_mincore+548>: mov 0x8(%esp),%edi ... // edx = PTE page table high address 0xc0507a84 <sys_mincore+564>: mov 0x4(%edi),%edx ... // eax = PTE page table low address 0xc0507a8e <sys_mincore+574>: mov (%edi),%eax [..] Please note that the PMD is not read atomically. These are two "mov" instructions where the high order bits of the PMD entry are fetched first. Hence, the above machine code is prone to the following race. - The PMD entry {high|low} is 0x0000000000000000. The "mov" at 0xc0507a84 loads 0x00000000 into edx. - A page fault (on another CPU) sneaks in between the two "mov" instructions and instantiates the PMD. - The PMD entry {high|low} is now 0x00000003fda38067. The "mov" at 0xc0507a8e loads 0xfda38067 into eax. ---- Reported-by: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Petr Matousek <pmatouse@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:24 -07:00
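The "read the pmd in order" fix described above can be modelled in user space with C11 atomics. This is a sketch under stated assumptions: `struct pmd_halves`, `pmd_write()` and `pmd_read()` are hypothetical names, acquire/release ordering stands in for the kernel's smp_rmb()/smp_wmb(), and the scheme is only sufficient because, with mmap_sem held for reading and THP disabled, a pmd can only go from none to populated. The reader loads the low half first and only then, if it is non-zero, the high half, so the torn value from Ulrich's trace (old high half combined with new low half) cannot be produced.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Model of a 64-bit PAE pmd that 32-bit code accesses as two halves. */
struct pmd_halves {
	_Atomic uint32_t low;
	_Atomic uint32_t high;
};

/* Writer (think pmd_populate): publish the high half first, then low. */
static void pmd_write(struct pmd_halves *p, uint64_t val)
{
	atomic_store_explicit(&p->high, (uint32_t)(val >> 32),
			      memory_order_relaxed);
	/* release: a reader that sees the new low half also sees the high half */
	atomic_store_explicit(&p->low, (uint32_t)val, memory_order_release);
}

/*
 * Reader: low half first. A zero low half means "none", and the high half
 * is not read at all, so a half-written pmd is never returned.
 */
static uint64_t pmd_read(struct pmd_halves *p)
{
	uint64_t ret = atomic_load_explicit(&p->low, memory_order_acquire);

	if (ret)
		ret |= (uint64_t)atomic_load_explicit(&p->high,
						      memory_order_relaxed) << 32;
	return ret;
}
```

Reading the halves in the opposite order, as the quoted `mov` sequence does, is what allowed `0x00000000` (old high) to be combined with `0xfda38067` (new low).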
David Rientjes	a7f638f999	mm, oom: normalize oom scores to oom_score_adj scale only for userspace The oom_score_adj scale ranges from -1000 to 1000 and represents the proportion of memory available to the process at allocation time. This means an oom_score_adj value of 300, for example, will bias a process as though it was using an extra 30.0% of available memory and a value of -350 will discount 35.0% of available memory from its usage. The oom killer badness heuristic also uses this scale to report the oom score for each eligible process in determining the "best" process to kill. Thus, it can only differentiate each process's memory usage by 0.1% of system RAM. On large systems, this can end up being a large amount of memory: 256MB on 256GB systems, for example. This can be fixed by having the badness heuristic use the actual memory usage in scoring threads and then normalizing it to the oom_score_adj scale for userspace. This results in better comparison between eligible threads for kill and no change from the userspace perspective. Suggested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Tested-by: Dave Jones <davej@redhat.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:24 -07:00
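The split between an internal page-granular badness and a userspace score on the 0..1000 scale can be sketched as follows. Assumptions: `struct task_mem`, `badness_pages()` and `oom_score_for_user()` are hypothetical names; usage is taken as rss + swap + page tables in pages, oom_score_adj is applied as per-mille units of `totalpages`, and details of the real heuristic (such as any adjustment for privileged tasks) are omitted.

```c
#include <assert.h>

/* Hypothetical per-task inputs to the badness heuristic (all in pages). */
struct task_mem {
	unsigned long rss, swap, pagetables;
	long oom_score_adj; /* -1000 .. 1000 */
};

/*
 * Badness in pages: actual usage plus oom_score_adj scaled to per-mille
 * units of totalpages. Comparing these raw values keeps page-level
 * resolution between kill candidates.
 */
static long badness_pages(const struct task_mem *t, unsigned long totalpages)
{
	long points = (long)(t->rss + t->swap + t->pagetables);

	assert(t->oom_score_adj >= -1000 && t->oom_score_adj <= 1000);
	points += t->oom_score_adj * (long)(totalpages / 1000);
	return points > 0 ? points : 1; /* eligible tasks score at least 1 */
}

/* Userspace-visible score: the same value normalized to the 0..1000 scale. */
static long oom_score_for_user(long points, unsigned long totalpages)
{
	return points * 1000 / (long)totalpages;
}
```

With 1,000,000 total pages, tasks using 400 and 900 pages both report a userspace score of 0, yet their raw badness values still differ, so the kernel can tell them apart; an oom_score_adj of 300 on the 400-page task reports 300, as before the change.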
Hugh Dickins	17cf28afea	mm/fs: remove truncate_range Remove vmtruncate_range(), and remove the truncate_range method from struct inode_operations: only tmpfs ever supported it, and tmpfs has now converted over to using the fallocate method of file_operations. Update Documentation accordingly, adding (setlease and) fallocate lines. And while we're in mm.h, remove duplicate declarations of shmem_lock() and shmem_file_setup(): everyone is now using the ones in shmem_fs.h. Based-on-patch-by: Cong Wang <amwang@redhat.com> Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Cong Wang <amwang@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:23 -07:00
Hugh Dickins	bde05d1ccd	shmem: replace page if mapping excludes its zone The GMA500 GPU driver uses GEM shmem objects, but with a new twist: the backing RAM has to be below 4GB. Not a problem while the boards supported only 4GB: but now Intel's D2700MUD boards support 8GB, and their GMA3600 is managed by the GMA500 driver. shmem/tmpfs has never pretended to support hardware restrictions on the backing memory, but it might have appeared to do so before v3.1, and even now it works fine until a page is swapped out then back in. When read_cache_page_gfp() supplied a freshly allocated page for copy, that compensated for whatever choice might have been made by earlier swapin readahead; but swapoff was likely to destroy the illusion. We'd like to continue to support GMA500, so now add a new shmem_should_replace_page() check on the zone when about to move a page from swapcache to filecache (in swapin and swapoff cases), with shmem_replace_page() to allocate and substitute a suitable page (given gma500/gem.c's mapping_set_gfp_mask GFP_KERNEL | __GFP_DMA32). This does involve a minor extension to mem_cgroup_replace_page_cache() (the page may or may not have already been charged); and I've removed a comment and call to mem_cgroup_uncharge_cache_page(), which in fact is always a no-op while PageSwapCache. Also removed optimization of an unlikely path in shmem_getpage_gfp(), now that we need to check PageSwapCache more carefully (a racing caller might already have made the copy). And at one point shmem_unuse_inode() needs to use the hitherto private page_swapcount(), to guard against racing with inode eviction. It would make sense to extend shmem_should_replace_page(), to cover cpuset and NUMA mempolicy restrictions too, but set that aside for now: needs a cleanup of shmem mempolicy handling, and more testing, and ought to handle swap faults in do_swap_page() as well as shmem. 
Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@infradead.org> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Stephane Marchesin <marcheu@chromium.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Dave Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Rob Clark <rob.clark@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:22 -07:00
Bartlomiej Zolnierkiewicz	5ceb9ce6fe	mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks When MIGRATE_UNMOVABLE pages are freed from MIGRATE_UNMOVABLE type pageblock (and some MIGRATE_MOVABLE pages are left in it) waiting until an allocation takes ownership of the block may take too long. The type of the pageblock remains unchanged so the pageblock cannot be used as a migration target during compaction. Fix it by: * Adding enum compact_mode (COMPACT_ASYNC_[MOVABLE,UNMOVABLE], and COMPACT_SYNC) and then converting sync field in struct compact_control to use it. * Adding nr_pageblocks_skipped field to struct compact_control and tracking how many destination pageblocks were of MIGRATE_UNMOVABLE type. If COMPACT_ASYNC_MOVABLE mode compaction ran fully in try_to_compact_pages() (COMPACT_COMPLETE) it implies that there is not a suitable page for allocation. In this case, check whether there were enough MIGRATE_UNMOVABLE pageblocks to try a second pass in COMPACT_ASYNC_UNMOVABLE mode. * Scanning the MIGRATE_UNMOVABLE pageblocks (during COMPACT_SYNC and COMPACT_ASYNC_UNMOVABLE compaction modes) and building a count based on finding PageBuddy pages, page_count(page) == 0 or PageLRU pages. If all pages within the MIGRATE_UNMOVABLE pageblock are in one of those three sets change the whole pageblock type to MIGRATE_MOVABLE. 
My particular test case (on an ARM EXYNOS4 device with 512 MiB, which means 131072 standard 4KiB pages in the 'Normal' zone) is to: - allocate 120000 pages for kernel's usage - free every second page (60000 pages) of memory just allocated - allocate and use 60000 pages from user space - free remaining 60000 pages of kernel memory (now we have fragmented memory occupied mostly by user space pages) - try to allocate 100 order-9 (2048 KiB) pages for kernel's usage The results: - with compaction disabled I get 11 successful allocations - with compaction enabled I get 14 successful allocations - with this patch I'm able to get all 100 successful allocations NOTE: If we can make kswapd aware of order-0 request during compaction, we can enhance kswapd with changing mode to COMPACT_ASYNC_FULL (COMPACT_ASYNC_MOVABLE + COMPACT_ASYNC_UNMOVABLE). Please see the following thread: http://marc.info/?l=linux-mm&m=133552069417068&w=2 [minchan@kernel.org: minor cleanups] Cc: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:22 -07:00
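The compaction commit above decides whether an MIGRATE_UNMOVABLE pageblock can be retyped. It counts pages that are free (PageBuddy), unreferenced (page_count == 0) or on an LRU list, and retypes the block only if every page qualifies. Below is a simplified sketch of that counting rule. The `struct page_state` summary and `can_rescue_unmovable_block()` are hypothetical. The sketch examines each page individually, whereas the kernel walks `struct page` and can skip whole free buddy blocks. For scale: on the test device, 512 MiB of 4 KiB pages is 131072 pages, and each order-9 request needs 2^9 = 512 contiguous pages (2048 KiB).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page summary; the kernel inspects struct page instead. */
struct page_state {
	bool buddy;     /* free page owned by the buddy allocator (PageBuddy) */
	int  refcount;  /* page_count(page) */
	bool lru;       /* on an LRU list, hence migratable (PageLRU) */
};

/*
 * Returns true if every page in the MIGRATE_UNMOVABLE pageblock falls into
 * one of the three "movable enough" sets, so the whole block could be
 * retyped as MIGRATE_MOVABLE and used as a migration target.
 */
static bool can_rescue_unmovable_block(const struct page_state *pages, size_t nr)
{
	size_t suitable = 0;

	for (size_t i = 0; i < nr; i++) {
		const struct page_state *p = &pages[i];

		if (p->buddy || p->refcount == 0 || p->lru)
			suitable++;
	}
	return suitable == nr;
}
```

A single pinned, non-LRU page is enough to keep the block MIGRATE_UNMOVABLE in this model.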
Johannes Weiner	238305bb4d	mm: remove sparsemem allocation details from the bootmem allocator alloc_bootmem_section() derives allocation area constraints from the specified sparsemem section. This is a bit specific for a generic memory allocator like bootmem, though, so move it over to sparsemem. As __alloc_bootmem_node_nopanic() already retries failed allocations with relaxed area constraints, the fallback code in sparsemem.c can be removed and the code becomes a bit more compact overall. [akpm@linux-foundation.org: fix build] Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:22 -07:00
Alex Shi	2099597401	mm: move is_vma_temporary_stack() declaration to huge_mm.h When transparent_hugepage_enabled() is used outside mm/, such as in arch/x86/xx/tlb.c: + if (!cpu_has_invlpg || vma->vm_flags & VM_HUGETLB + || transparent_hugepage_enabled(vma)) { + flush_tlb_mm(vma->vm_mm); is_vma_temporary_stack() isn't declared in huge_mm.h, so the build fails: arch/x86/mm/tlb.c: In function `flush_tlb_range': arch/x86/mm/tlb.c:324:4: error: implicit declaration of function `is_vma_temporary_stack' [-Werror=implicit-function-declaration] Since is_vma_temporary_stack() is only used in rmap.c and huge_memory.c, it is better to move its declaration from rmap.h to huge_mm.h to avoid such errors. Signed-off-by: Alex Shi <alex.shi@intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:21 -07:00
Ulrich Drepper	9295b7a07c	kbuild: install kernel-page-flags.h Programs using /proc/kpageflags need to know about the various flags. The header <linux/kernel-page-flags.h> provides them and the comments in the file indicate that it is supposed to be used by user-level code. But the file is not installed. Install the headers and mark the unstable flags as out-of-bounds. The page-type tool is also adjusted to not duplicate the definitions. Signed-off-by: Ulrich Drepper <drepper@gmail.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:21 -07:00
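With the header installed, a user-space tool can take the KPF_* bit numbers from <linux/kernel-page-flags.h> instead of copying them. Below is a minimal sketch that prefers the installed header when the compiler supports `__has_include`, and otherwise falls back to local definitions of a few long-standing bits. It assumes /proc/kpageflags holds one 64-bit flags word per page frame number; reading it normally requires root. The helpers `kpf_test()` and `read_kpageflags()` are hypothetical, not part of any library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Prefer the installed header; fall back to local copies of a few bits. */
#if defined(__has_include)
#if __has_include(<linux/kernel-page-flags.h>)
#include <linux/kernel-page-flags.h>
#endif
#endif

#ifndef KPF_LRU
#define KPF_LOCKED 0
#define KPF_DIRTY  4
#define KPF_LRU    5
#define KPF_BUDDY  10
#define KPF_ANON   12
#endif

/* Test one KPF_* bit in a 64-bit /proc/kpageflags entry. */
static bool kpf_test(uint64_t flags, unsigned int bit)
{
	return (flags >> bit) & 1u;
}

/* Read the flags word for one page frame number. Returns 0 on success. */
static int read_kpageflags(uint64_t pfn, uint64_t *out)
{
	FILE *f = fopen("/proc/kpageflags", "rb");

	if (!f)
		return -1;
	/* Each entry is 8 bytes, indexed by PFN (long offset: sketch only). */
	if (fseek(f, (long)(pfn * sizeof(uint64_t)), SEEK_SET) != 0 ||
	    fread(out, sizeof(*out), 1, f) != 1) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return 0;
}
```

A caller would read an entry with read_kpageflags() and then check, for example, kpf_test(word, KPF_LRU).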
Konstantin Khlebnikov	02602a18c3	bug: completely remove code generated by disabled VM_BUG_ON() Even with CONFIG_DEBUG_VM=n, gcc generates code for some VM_BUG_ON() calls; for example, VM_BUG_ON(!PageCompound(page) || !PageHead(page)); in do_huge_pmd_wp_page() generates 114 bytes of code. But that code mostly disappears when I split this VM_BUG_ON into two: -VM_BUG_ON(!PageCompound(page) || !PageHead(page)); +VM_BUG_ON(!PageCompound(page)); +VM_BUG_ON(!PageHead(page)); weird... but anyway, after this patch the code disappears completely. add/remove: 0/0 grow/shrink: 7/97 up/down: 135/-1784 (-1649) Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:20 -07:00
Konstantin Khlebnikov	baf05aa927	bug: introduce BUILD_BUG_ON_INVALID() macro Sometimes we want to check an expression's correctness at compile time. "(void)(e);" or "if (e);" can be dangerous if the expression has side-effects, and gcc sometimes generates a lot of code, even if the expression has no effect. This patch introduces the macro BUILD_BUG_ON_INVALID() for such checks; it forces a compilation error if the expression is invalid, without generating any extra code. [Cast to "long" required because sizeof does not work for bit-fields.] Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:20 -07:00
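The two Khlebnikov commits above work together. BUILD_BUG_ON_INVALID() type-checks an expression without evaluating it, and a disabled VM_BUG_ON() can expand to it. Below is a user-space sketch that assumes the macro body `((void)(sizeof((long)(e))))`, which matches the commit's note that the cast to long is needed for bit-fields. It uses abort() in place of the kernel's BUG(). `struct page_bits`, `page_compound()` and `check_page()` are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Type-check e without evaluating it or emitting code: sizeof does not
 * evaluate a non-VLA operand. The cast to long lets bit-fields through,
 * since sizeof cannot be applied to a bit-field directly.
 */
#define BUILD_BUG_ON_INVALID(e) ((void)(sizeof((long)(e))))

#ifdef CONFIG_DEBUG_VM
#define VM_BUG_ON(cond)                                          \
	do {                                                     \
		if (cond) {                                      \
			fprintf(stderr, "VM_BUG_ON(%s)\n", #cond); \
			abort();                                 \
		}                                                \
	} while (0)
#else
/* Disabled: the condition must still compile, but no code is generated. */
#define VM_BUG_ON(cond) BUILD_BUG_ON_INVALID(cond)
#endif

/* Hypothetical page flags stored as bit-fields. */
struct page_bits {
	unsigned int compound : 1;
	unsigned int head : 1;
};

static int predicate_calls;  /* how often the checked predicate ran */

static int page_compound(const struct page_bits *p)
{
	predicate_calls++;
	return p->compound;
}

static void check_page(const struct page_bits *p)
{
	VM_BUG_ON(!page_compound(p) || !p->head);
	BUILD_BUG_ON_INVALID(p->head);  /* bit-field: accepted thanks to the cast */
}
```

Built without CONFIG_DEBUG_VM, check_page() never calls page_compound(). A typo in the condition would still fail to compile.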
Johannes Weiner	c3ac9a8ade	mm: memcg: count pte references from every member of the reclaimed hierarchy The rmap walker checking page table references has historically ignored references from VMAs that were not part of the memcg that was being reclaimed during memcg hard limit reclaim. When transitioning global reclaim to memcg hierarchy reclaim, I missed that bit and now references from outside a memcg are ignored even during global reclaim. Reverting to the traditional behaviour (count all references during global reclaim and only mind references of the memcg being reclaimed during limit reclaim) would be one option. However, the more generic idea is to ignore references exactly when they are outside the hierarchy that is currently under reclaim, because only then will their reclamation be of any use to help the pressure situation. It makes no sense to ignore references from a sibling memcg and then evict a page that will be immediately refaulted by that sibling, which contributes to the same usage of the common ancestor under reclaim. The solution: make the rmap walker ignore references from VMAs that are not part of the hierarchy that is being reclaimed. Flat limit reclaim will stay the same; hierarchical limit reclaim will mind the references only to pages that the hierarchy owns. Global reclaim, since it reclaims from all memcgs, will be fixed to regard all references. [akpm@linux-foundation.org: name the args in the declaration] Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Acked-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:20 -07:00
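The rule in the memcg commit above is: during global reclaim, count every pte reference. Otherwise, count only references from mms whose memcg is the reclaim target or one of its descendants. Below is a minimal sketch of that rule over a hypothetical tree of `struct memcg` nodes linked by parent pointers. `same_or_subtree()`, `struct pte_ref` and `count_references()` are illustrative names, not the kernel's rmap code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical memcg node: only the parent link matters here. */
struct memcg {
	const struct memcg *parent;
};

/* True if memcg is root itself or one of root's descendants. */
static bool same_or_subtree(const struct memcg *root, const struct memcg *memcg)
{
	for (; memcg; memcg = memcg->parent)
		if (memcg == root)
			return true;
	return false;
}

/* Hypothetical record of one pte reference found by the rmap walk. */
struct pte_ref {
	const struct memcg *owner;  /* memcg of the mm mapping the page */
};

/*
 * Count the references that matter for this reclaim: all of them for
 * global reclaim (target == NULL), otherwise only those coming from
 * inside the hierarchy rooted at target.
 */
static int count_references(const struct pte_ref *refs, size_t nr,
			    const struct memcg *target)
{
	int referenced = 0;

	for (size_t i = 0; i < nr; i++)
		if (!target || same_or_subtree(target, refs[i].owner))
			referenced++;
	return referenced;
}
```

In this model, a reference from a sibling child counts when the common parent is under reclaim, because evicting the page would only cause that sibling to refault it into the same parent.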
Andrew Morton	0ce72d4f73	mm: do_migrate_pages(): rename arguments s/from_nodes/from/ and s/to_nodes/to/. The "_nodes" is redundant - it duplicates the argument's type. Done in a fit of irritation over 80-col issues :( Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <mkosaki@redhat.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:20 -07:00
Mel Gorman	23b9da55c5	mm: vmscan: remove reclaim_mode_t There is little motivation for reclaim_mode_t once RECLAIM_MODE_[A]SYNC and lumpy reclaim have been removed. This patch gets rid of reclaim_mode_t as well and improves the documentation about what reclaim/compaction is and when it is triggered. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Hugh Dickins <hughd@google.com> Cc: Ying Han <yinghan@google.com> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:19 -07:00
Mel Gorman	41ac1999c3	mm: vmscan: do not stall on writeback during memory compaction This patch stops reclaim/compaction entering sync reclaim as this was only intended for lumpy reclaim and an oversight. Page migration has its own logic for stalling on writeback pages if necessary and memory compaction is already using it. Waiting on page writeback is bad for a number of reasons but the primary one is that waiting on writeback to a slow device like USB can take a considerable length of time. Page reclaim instead uses wait_iff_congested() to throttle if too many dirty pages are being scanned. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Hugh Dickins <hughd@google.com> Cc: Ying Han <yinghan@google.com> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:19 -07:00
Mel Gorman	c53919adc0	mm: vmscan: remove lumpy reclaim This series removes lumpy reclaim and some stalling logic that was unintentionally being used by memory compaction. The end result is that stalling on dirty pages during page reclaim now depends on wait_iff_congested(). Four kernels were compared: 3.3.0 vanilla; 3.4.0-rc2 vanilla; 3.4.0-rc2 lumpyremove-v2, which is patch one from this series; and 3.4.0-rc2 nosync-v2r3, which is the full series. Removing lumpy reclaim saves almost 900 bytes of text whereas the full series removes 1200 bytes. text data bss dec hex filename 6740375 1927944 2260992 10929311 a6c49f vmlinux-3.4.0-rc2-vanilla 6739479 1927944 2260992 10928415 a6c11f vmlinux-3.4.0-rc2-lumpyremove-v2 6739159 1927944 2260992 10928095 a6bfdf vmlinux-3.4.0-rc2-nosync-v2 There are behaviour changes in the series and so tests were run with monitoring of ftrace events. This disrupts results so the performance results are distorted but the new behaviour should be clearer. fs-mark running in a threaded configuration showed little of interest as it did not push reclaim aggressively. FS-Mark Multi Threaded 3.3.0-vanilla rc2-vanilla lumpyremove-v2r3 nosync-v2r3 Files/s min 3.20 ( 0.00%) 3.20 ( 0.00%) 3.20 ( 0.00%) 3.20 ( 0.00%) Files/s mean 3.20 ( 0.00%) 3.20 ( 0.00%) 3.20 ( 0.00%) 3.20 ( 0.00%) Files/s stddev 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) Files/s max 3.20 ( 0.00%) 3.20 ( 0.00%) 3.20 ( 0.00%) 3.20 ( 0.00%) Overhead min 508667.00 ( 0.00%) 521350.00 (-2.49%) 544292.00 (-7.00%) 547168.00 (-7.57%) Overhead mean 551185.00 ( 0.00%) 652690.73 (-18.42%) 991208.40 (-79.83%) 570130.53 (-3.44%) Overhead stddev 18200.69 ( 0.00%) 331958.29 (-1723.88%) 1579579.43 (-8578.68%) 9576.81 (47.38%) Overhead max 576775.00 ( 0.00%) 1846634.00 (-220.17%) 6901055.00 (-1096.49%) 585675.00 (-1.54%) MMTests Statistics: duration Sys Time Running Test (seconds) 309.90 300.95 307.33 298.95 User+Sys Time Running Test (seconds) 319.32 309.67 315.69 307.51 Total Elapsed Time (seconds) 1187.85 
1193.09 1191.98 1193.73 MMTests Statistics: vmstat Page Ins 80532 82212 81420 79480 Page Outs 111434984 111456240 111437376 111582628 Swap Ins 0 0 0 0 Swap Outs 0 0 0 0 Direct pages scanned 44881 27889 27453 34843 Kswapd pages scanned 25841428 25860774 25861233 25843212 Kswapd pages reclaimed 25841393 25860741 25861199 25843179 Direct pages reclaimed 44881 27889 27453 34843 Kswapd efficiency 99% 99% 99% 99% Kswapd velocity 21754.791 21675.460 21696.029 21649.127 Direct efficiency 100% 100% 100% 100% Direct velocity 37.783 23.375 23.031 29.188 Percentage direct scans 0% 0% 0% 0% ftrace showed that there was no stalling on writeback or pages submitted for IO from reclaim context. postmark was similar and while it was more interesting, it also did not push reclaim heavily. POSTMARK 3.3.0-vanilla rc2-vanilla lumpyremove-v2r3 nosync-v2r3 Transactions per second: 16.00 ( 0.00%) 20.00 (25.00%) 18.00 (12.50%) 17.00 ( 6.25%) Data megabytes read per second: 18.80 ( 0.00%) 24.27 (29.10%) 22.26 (18.40%) 20.54 ( 9.26%) Data megabytes written per second: 35.83 ( 0.00%) 46.25 (29.08%) 42.42 (18.39%) 39.14 ( 9.24%) Files created alone per second: 28.00 ( 0.00%) 38.00 (35.71%) 34.00 (21.43%) 30.00 ( 7.14%) Files create/transact per second: 8.00 ( 0.00%) 10.00 (25.00%) 9.00 (12.50%) 8.00 ( 0.00%) Files deleted alone per second: 556.00 ( 0.00%) 1224.00 (120.14%) 3062.00 (450.72%) 6124.00 (1001.44%) Files delete/transact per second: 8.00 ( 0.00%) 10.00 (25.00%) 9.00 (12.50%) 8.00 ( 0.00%) MMTests Statistics: duration Sys Time Running Test (seconds) 113.34 107.99 109.73 108.72 User+Sys Time Running Test (seconds) 145.51 139.81 143.32 143.55 Total Elapsed Time (seconds) 1159.16 899.23 980.17 1062.27 MMTests Statistics: vmstat Page Ins 13710192 13729032 13727944 13760136 Page Outs 43071140 42987228 42733684 42931624 Swap Ins 0 0 0 0 Swap Outs 0 0 0 0 Direct pages scanned 0 0 0 0 Kswapd pages scanned 9941613 9937443 9939085 9929154 Kswapd pages reclaimed 9940926 9936751 9938397 
9928465 Direct pages reclaimed 0 0 0 0 Kswapd efficiency 99% 99% 99% 99% Kswapd velocity 8576.567 11051.058 10140.164 9347.109 Direct efficiency 100% 100% 100% 100% Direct velocity 0.000 0.000 0.000 0.000 It looks like here that the full series regresses performance but as ftrace showed no usage of wait_iff_congested() or sync reclaim I am assuming it's a disruption due to monitoring. Other data such as memory usage, page IO, swap IO all looked similar. Running a benchmark with a plain DD showed nothing very interesting. The full series stalled in wait_iff_congested() slightly less but stall times on vanilla kernels were marginal. Running a benchmark that hammered on file-backed mappings showed stalls due to congestion but not in sync writebacks MICRO 3.3.0-vanilla rc2-vanilla lumpyremove-v2r3 nosync-v2r3 MMTests Statistics: duration Sys Time Running Test (seconds) 308.13 294.50 298.75 299.53 User+Sys Time Running Test (seconds) 330.45 316.28 318.93 320.79 Total Elapsed Time (seconds) 1814.90 1833.88 1821.14 1832.91 MMTests Statistics: vmstat Page Ins 108712 120708 97224 110344 Page Outs 155514576 156017404 155813676 156193256 Swap Ins 0 0 0 0 Swap Outs 0 0 0 0 Direct pages scanned 2599253 1550480 2512822 2414760 Kswapd pages scanned 69742364 71150694 68839041 69692533 Kswapd pages reclaimed 34824488 34773341 34796602 34799396 Direct pages reclaimed 53693 94750 61792 75205 Kswapd efficiency 49% 48% 50% 49% Kswapd velocity 38427.662 38797.901 37799.972 38022.889 Direct efficiency 2% 6% 2% 3% Direct velocity 1432.174 845.464 1379.807 1317.446 Percentage direct scans 3% 2% 3% 3% Page writes by reclaim 0 0 0 0 Page writes file 0 0 0 0 Page writes anon 0 0 0 0 Page reclaim immediate 0 0 0 1218 Page rescued immediate 0 0 0 0 Slabs scanned 15360 16384 13312 16384 Direct inode steals 0 0 0 0 Kswapd inode steals 4340 4327 1630 4323 FTrace Reclaim Statistics: congestion_wait Direct number congest waited 0 0 0 0 Direct time congest waited 0ms 0ms 0ms 0ms Direct full congest 
waited 0 0 0 0 Direct number conditional waited 900 870 754 789 Direct time conditional waited 0ms 0ms 0ms 20ms Direct full conditional waited 0 0 0 0 KSwapd number congest waited 2106 2308 2116 1915 KSwapd time congest waited 139924ms 157832ms 125652ms 132516ms KSwapd full congest waited 1346 1530 1202 1278 KSwapd number conditional waited 12922 16320 10943 14670 KSwapd time conditional waited 0ms 0ms 0ms 0ms KSwapd full conditional waited 0 0 0 0 Reclaim statistics are not radically changed. The stall times in kswapd are massive but it is clear that it is due to calls to congestion_wait() and that is almost certainly the call in balance_pgdat(). Otherwise stalls due to dirty pages are non-existent. I ran a benchmark that stressed high-order allocation. This is a very artificial load but was used in the past to evaluate lumpy reclaim and compaction. Generally I look at allocation success rates and latency figures. STRESS-HIGHALLOC 3.3.0-vanilla rc2-vanilla lumpyremove-v2r3 nosync-v2r3 Pass 1 81.00 ( 0.00%) 28.00 (-53.00%) 24.00 (-57.00%) 28.00 (-53.00%) Pass 2 82.00 ( 0.00%) 39.00 (-43.00%) 38.00 (-44.00%) 43.00 (-39.00%) while Rested 88.00 ( 0.00%) 87.00 (-1.00%) 88.00 ( 0.00%) 88.00 ( 0.00%) MMTests Statistics: duration Sys Time Running Test (seconds) 740.93 681.42 685.14 684.87 User+Sys Time Running Test (seconds) 2922.65 3269.52 3281.35 3279.44 Total Elapsed Time (seconds) 1161.73 1152.49 1159.55 1161.44 MMTests Statistics: vmstat Page Ins 4486020 2807256 2855944 2876244 Page Outs 7261600 7973688 7975320 7986120 Swap Ins 31694 0 0 0 Swap Outs 98179 0 0 0 Direct pages scanned 53494 57731 34406 113015 Kswapd pages scanned 6271173 1287481 1278174 1219095 Kswapd pages reclaimed 2029240 1281025 1260708 1201583 Direct pages reclaimed 1468 14564 16649 92456 Kswapd efficiency 32% 99% 98% 98% Kswapd velocity 5398.133 1117.130 1102.302 1049.641 Direct efficiency 2% 25% 48% 81% Direct velocity 46.047 50.092 29.672 97.306 Percentage direct scans 0% 4% 2% 8% Page writes by 
reclaim 1616049 0 0 0 Page writes file 1517870 0 0 0 Page writes anon 98179 0 0 0 Page reclaim immediate 103778 27339 9796 17831 Page rescued immediate 0 0 0 0 Slabs scanned 1096704 986112 980992 998400 Direct inode steals 223 215040 216736 247881 Kswapd inode steals 175331 61548 68444 63066 Kswapd skipped wait 21991 0 1 0 THP fault alloc 1 135 125 134 THP collapse alloc 393 311 228 236 THP splits 25 13 7 8 THP fault fallback 0 0 0 0 THP collapse fail 3 5 7 7 Compaction stalls 865 1270 1422 1518 Compaction success 370 401 353 383 Compaction failures 495 869 1069 1135 Compaction pages moved 870155 3828868 4036106 4423626 Compaction move failure 26429 23865 29742 27514 Success rates are completely hosed for 3.4-rc2 which is almost certainly due to commit `fe2c2a1066` ("vmscan: reclaim at order 0 when compaction is enabled"). I expected this would happen for kswapd and impair allocation success rates (https://lkml.org/lkml/2012/1/25/166) but I did not anticipate this much of a difference: 80% less scanning, 37% less reclaim by kswapd. In comparison, reclaim/compaction is not aggressive and gives up easily, which is the intended behaviour. hugetlbfs uses __GFP_REPEAT and would be much more aggressive about reclaim/compaction than THP allocations are. The stress test above is allocating like neither THP nor hugetlbfs but is much closer to THP. Mainline is now impaired in terms of high order allocation under heavy load although I do not know to what degree as I did not test with __GFP_REPEAT. Keep this in mind for bugs related to hugepage pool resizing, THP allocation and high order atomic allocation failures from network devices. 
In terms of congestion throttling, I see the following for this test: FTrace Reclaim Statistics: congestion_wait Direct number congest waited 3 0 0 0 Direct time congest waited 0ms 0ms 0ms 0ms Direct full congest waited 0 0 0 0 Direct number conditional waited 957 512 1081 1075 Direct time conditional waited 0ms 0ms 0ms 0ms Direct full conditional waited 0 0 0 0 KSwapd number congest waited 36 4 3 5 KSwapd time congest waited 3148ms 400ms 300ms 500ms KSwapd full congest waited 30 4 3 5 KSwapd number conditional waited 88514 197 332 542 KSwapd time conditional waited 4980ms 0ms 0ms 0ms KSwapd full conditional waited 49 0 0 0 The "conditional waited" times are the most interesting as they are directly impacted by the number of dirty pages encountered during the scan. As lumpy reclaim is no longer scanning contiguous ranges, it is finding fewer dirty pages. This brings wait times from about 5 seconds to 0. kswapd itself is still calling congestion_wait() so it'll still stall but it's a lot less. In terms of the type of IO we were doing, I see this: FTrace Reclaim Statistics: mm_vmscan_writepage Direct writes anon sync 0 0 0 0 Direct writes anon async 0 0 0 0 Direct writes file sync 0 0 0 0 Direct writes file async 0 0 0 0 Direct writes mixed sync 0 0 0 0 Direct writes mixed async 0 0 0 0 KSwapd writes anon sync 0 0 0 0 KSwapd writes anon async 91682 0 0 0 KSwapd writes file sync 0 0 0 0 KSwapd writes file async 822629 0 0 0 KSwapd writes mixed sync 0 0 0 0 KSwapd writes mixed async 0 0 0 0 In 3.2, kswapd was doing a bunch of async writes of pages but reclaim/compaction was never reaching a point where it was doing sync IO. This does not guarantee that reclaim/compaction was not calling wait_on_page_writeback() but I would consider it unlikely. It indicates that merging patches 2 and 3 to stop reclaim/compaction calling wait_on_page_writeback() should be safe. This patch: Lumpy reclaim had a purpose but in the mind of some, it was to kick the system so hard it thrashed. 
For others the purpose was to complicate vmscan.c. Over time it was given softer shoes and a nicer attitude but memory compaction needs to step up and replace it, so this patch sends lumpy reclaim to the farm. The tracepoint format changes for isolating LRU pages with this patch applied. Furthermore, reclaim/compaction can no longer queue dirty pages in pageout() if the underlying BDI is congested. Lumpy reclaim used this logic and reclaim/compaction was using it in error. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Hugh Dickins <hughd@google.com> Cc: Ying Han <yinghan@google.com> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:19 -07:00
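The derived vmstat rows in the lumpy-reclaim benchmarks above appear to match these definitions. They were inferred by recomputing a few entries, not taken from MMTests itself:

- **Efficiency:** pages reclaimed per page scanned, truncated to a whole percent.
- **Velocity:** pages scanned per second of total elapsed time.
- **Percentage direct scans:** direct scans as a share of all scans.

The sketch below encodes those assumed formulas.

```c
#include <assert.h>

/* Reclaim efficiency: pages reclaimed per page scanned, truncated percent. */
static unsigned int efficiency_pct(unsigned long long reclaimed,
				   unsigned long long scanned)
{
	assert(scanned > 0);
	return (unsigned int)(reclaimed * 100 / scanned);
}

/* Scan velocity: pages scanned per second of total elapsed time. */
static double velocity(unsigned long long scanned, double elapsed_s)
{
	assert(elapsed_s > 0.0);
	return (double)scanned / elapsed_s;
}

/* Share of all scanning done by direct reclaim, truncated percent. */
static unsigned int direct_scan_pct(unsigned long long direct,
				    unsigned long long kswapd)
{
	assert(direct + kswapd > 0);
	return (unsigned int)(direct * 100 / (direct + kswapd));
}
```

For example, the STRESS-HIGHALLOC 3.3.0-vanilla column gives efficiency_pct(2029240, 6271173) = 32. The FS-Mark kswapd velocity is 25841428 / 1187.85, about 21754.79 pages/s. Both agree with the reported figures.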
Rik van Riel	e709ffd616	mm: remove swap token code The swap token code no longer fits in with the current VM model. It does not play well with cgroups or the better NUMA placement code in development, since we have only one swap token globally. It also has the potential to mess with scalability of the system, by increasing the number of non-reclaimable pages on the active and inactive anon LRU lists. Last but not least, the swap token code has been broken for a year without complaints, as reported by Konstantin Khlebnikov. This suggests we no longer have much use for it. The days of sub-1G memory systems with heavy use of swap are over. If we ever need thrashing reducing code in the future, we will have to implement something that does scale. Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Hugh Dickins <hughd@google.com> Acked-by: Bob Picco <bpicco@meloft.net> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:19 -07:00
Paul Gortmaker	af2e840971	pagemap.h: fix warning about possibly used before init var Commit `f56f821feb` ("mm: extend prefault helpers to fault in more than PAGE_SIZE") added in the new functions: fault_in_multipages_writeable() and fault_in_multipages_readable(). However, we currently see: include/linux/pagemap.h:492: warning: 'ret' may be used uninitialized in this function include/linux/pagemap.h:492: note: 'ret' was declared here Unlike a lot of gcc nags, this one appears somewhat legit. i.e. passing in an invalid negative value of "size" does make it look like all the conditionals in there would be bypassed and the uninitialized value would be returned. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:18 -07:00
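The pagemap.h warning above comes from a helper whose loop and tail check can both be skipped, for example when given a bogus negative size, so nothing ever assigns `ret`. Below is a user-space sketch of that control flow. The fix it shows is starting `ret` at 0 so that every path returns a defined value. It models addresses as integers, assumes 4 KiB pages, and uses a hypothetical `probe_byte()` in place of `__put_user()`. It is not the actual fault_in_multipages_writeable() source.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL                        /* assumed page size */
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

static int probes;  /* number of simulated user-memory touches */

/* Hypothetical stand-in for __put_user(0, addr): returns 0 on success. */
static int probe_byte(uintptr_t addr)
{
	(void)addr;
	probes++;
	return 0;
}

/* Touch one byte per page across [addr, addr + size). */
static int fault_in_range(uintptr_t addr, long size)
{
	int ret = 0;  /* defined result even if no probe runs */
	uintptr_t end = addr + (uintptr_t)size - 1;

	if (size == 0)
		return 0;

	while (addr <= end) {  /* skipped entirely when size < 0 */
		ret = probe_byte(addr);
		if (ret != 0)
			return ret;
		addr += SKETCH_PAGE_SIZE;
	}

	/* The stride may overshoot end while end's page was never touched. */
	if ((addr & SKETCH_PAGE_MASK) == (end & SKETCH_PAGE_MASK))
		ret = probe_byte(end);

	return ret;
}
```

With an unaligned 4096-byte range starting at 0x1800, both pages are touched. With size -5, no probe runs and the function still returns 0.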
Felix Fietkau	617c8c1123	skb: avoid unnecessary reallocations in __skb_cow At the beginning of __skb_cow, headroom gets set to a minimum of NET_SKB_PAD. This causes unnecessary reallocations if the buffer was not cloned and the headroom is just below NET_SKB_PAD, but still more than the amount requested by the caller. This was showing up frequently in my tests on VLAN tx, where vlan_insert_tag calls skb_cow_head(skb, VLAN_HLEN). Locally generated packets should have enough headroom, and for forward paths, we already have NET_SKB_PAD bytes of headroom, so we don't need to add any extra space here. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-29 17:30:08 -04:00
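The effect of the __skb_cow change above can be shown as a pure decision function: does the buffer need reallocating, given the headroom the caller asked for, the headroom it already has, and whether it is cloned? Below is a minimal sketch. It assumes NET_SKB_PAD evaluates to 64, which corresponds to max(32, L1_CACHE_BYTES) with 64-byte cache lines. The functions `cow_reallocs_old()` and `cow_reallocs_new()` are hypothetical simplifications that ignore the alignment of the expansion size.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values for illustration. */
#define SKETCH_NET_SKB_PAD 64u
#define SKETCH_VLAN_HLEN   4u

/* Before the change: the request was first raised to NET_SKB_PAD. */
static bool cow_reallocs_old(unsigned int want, unsigned int have, bool cloned)
{
	if (want < SKETCH_NET_SKB_PAD)
		want = SKETCH_NET_SKB_PAD;
	return want > have || cloned;
}

/* After the change: only the caller's own request is compared. */
static bool cow_reallocs_new(unsigned int want, unsigned int have, bool cloned)
{
	return want > have || cloned;
}
```

For a VLAN tag insert (4 bytes requested) on an uncloned skb with 48 bytes of headroom, the old rule reallocates and the new one does not. A cloned skb is still copied in both cases.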
Linus Torvalds	4b78147468	MFD changes for 3.5 Merge tag 'mfd-3.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6 Pull MFD changes from Samuel Ortiz: "Besides the usual cleanups, this one brings: * Support for 5 new chipsets: Intel's ICH LPC and SCH Centerton, ST-E's STAX211, Samsung's MAX77693 and TI's LM3533. * Device tree support for the twl6040, tps65910, da9052 and ab8500 drivers. * Fairly big tps65910, ab8500 and db8500 updates. * i2c support for mc13xxx. * Our regular update for the wm8xxx driver from Mark." Fix up various conflicts with other trees, largely due to ab5500 removal etc. 
* tag 'mfd-3.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6: (106 commits) mfd: Fix build break of max77693 by adding REGMAP_I2C option mfd: Fix twl6040 build failure mfd: Fix max77693 build failure mfd: ab8500-core should depend on MFD_DB8500_PRCMU gpio: tps65910: dt: process gpio specific device node info mfd: Remove the parsing of dt info for tps65910 gpio mfd: Save device node parsed platform data for tps65910 sub devices mfd: Add r_select to lm3533 platform data gpio: Add Intel Centerton support to gpio-sch mfd: Emulate active low IRQs as well as active high IRQs for wm831x mfd: Mark two lm3533 zone registers as volatile mfd: Fix return type of lm533 attribute is_visible mfd: Enable Device Tree support in the ab8500-pwm driver mfd: Enable Device Tree support in the ab8500-sysctrl driver mfd: Add support for Device Tree to twl6040 mfd: Register the twl6040 child for the ASoC codec unconditionally mfd: Allocate twl6040 IRQ numbers dynamically mfd: twl6040 code cleanup in interrupt initialization part mfd: Enable ab8500-gpadc driver for Device Tree mfd: Prevent unassigned pointer from being used in ab8500-gpadc driver ...	2012-05-29 11:53:11 -07:00
Linus Torvalds	53f2c4a8fd	NFS client updates for Linux 3.5 New features include: - Rewrite the O_DIRECT code so that it can share the same coalescing and pNFS functionality as the page cache code. - Allow the server to provide hints as to when we should use pNFS, and when it is more efficient to read and write through the metadata server. - NFS cache consistency updates: - Use the ctime to emulate a change attribute for NFSv2/v3 so that all NFS versions can share the same cache management code. - New cache management code will only look at the change attribute and size attribute when deciding whether or not our cached data is still valid or not. - Don't request NFSv4 post-op attributes on writes in cases such as O_DIRECT, where we don't care about data cache consistency, or when we have a write delegation, and know that our cache is still consistent. - Don't request NFSv4 post-op attributes on operations such as COMMIT, where there are no expected metadata updates. - Don't request NFSv4 directory post-op attributes in cases where the operations themselves already return change attribute updates: i.e. operations such as OPEN, CREATE, REMOVE, LINK and RENAME. - Speed up 'ls' and friends by using READDIR rather than READDIRPLUS if we detect no attempts to lookup filenames. - Improve the code sharing between NFSv2/v3 and v4 mounts - NFSv4.1 state management efficiency improvements - More patches in preparation for NFSv4/v4.1 migration functionality. 
Merge tag 'nfs-for-3.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client updates from Trond Myklebust: "New features include: - Rewrite the O_DIRECT code so that it can share the same coalescing and pNFS functionality as the page cache code. - Allow the server to provide hints as to when we should use pNFS, and when it is more efficient to read and write through the metadata server. - NFS cache consistency updates: * Use the ctime to emulate a change attribute for NFSv2/v3 so that all NFS versions can share the same cache management code. * New cache management code will only look at the change attribute and size attribute when deciding whether or not our cached data is still valid or not. * Don't request NFSv4 post-op attributes on writes in cases such as O_DIRECT, where we don't care about data cache consistency, or when we have a write delegation, and know that our cache is still consistent. * Don't request NFSv4 post-op attributes on operations such as COMMIT, where there are no expected metadata updates. 
* Don't request NFSv4 directory post-op attributes in cases where the operations themselves already return change attribute updates: i.e. operations such as OPEN, CREATE, REMOVE, LINK and RENAME. - Speed up 'ls' and friends by using READDIR rather than READDIRPLUS if we detect no attempts to lookup filenames. - Improve the code sharing between NFSv2/v3 and v4 mounts - NFSv4.1 state management efficiency improvements - More patches in preparation for NFSv4/v4.1 migration functionality." Fix trivial conflict in fs/nfs/nfs4proc.c that was due to the dcache qstr name initialization changes (that made the length/hash a 64-bit union) * tag 'nfs-for-3.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (146 commits) NFSv4: Add debugging printks to state manager NFSv4: Map NFS4ERR_SHARE_DENIED into an EACCES error instead of EIO NFSv4: update_changeattr does not need to set NFS_INO_REVAL_PAGECACHE NFSv4.1: nfs4_reset_session should use nfs4_handle_reclaim_lease_error NFSv4.1: Handle other occurrences of NFS4ERR_CONN_NOT_BOUND_TO_SESSION NFSv4.1: Handle NFS4ERR_CONN_NOT_BOUND_TO_SESSION in the state manager NFSv4.1: Handle errors in nfs4_bind_conn_to_session NFSv4.1: nfs4_bind_conn_to_session should drain the session NFSv4.1: Don't clobber the seqid if exchange_id returns a confirmed clientid NFSv4.1: Add DESTROY_CLIENTID NFSv4.1: Ensure we use the correct credentials for bind_conn_to_session NFSv4.1: Ensure we use the correct credentials for session create/destroy NFSv4.1: Move NFSPROC4_CLNT_BIND_CONN_TO_SESSION to the end of the operations NFSv4.1: Handle NFS4ERR_SEQ_MISORDERED when confirming the lease NFSv4: When purging the lease, we must clear NFS4CLNT_LEASE_CONFIRM NFSv4: Clean up the error handling for nfs4_reclaim_lease NFSv4.1: Exchange ID must use GFP_NOFS allocation mode nfs41: Use BIND_CONN_TO_SESSION for CB_PATH_DOWN* nfs4.1: add BIND_CONN_TO_SESSION operation NFSv4.1 test the mdsthreshold hint parameters ...	2012-05-29 10:43:51 -07:00
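The cache-consistency items in the NFS merge above describe two ideas. On NFSv2/v3, derive a change attribute from ctime. Then treat cached data as valid only while both the change attribute and the file size are unchanged. Below is a minimal sketch of that decision. The nanosecond-count encoding in `change_attr_from_ctime()` is an illustrative assumption, not necessarily the client's real encoding. `struct nfs_attr_snapshot` and `cache_still_valid()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical attribute snapshot as returned by GETATTR. */
struct nfs_attr_snapshot {
	int64_t  ctime_sec;
	uint32_t ctime_nsec;
	uint64_t size;
};

/* Illustrative encoding only: fold ctime into a single 64-bit value. */
static uint64_t change_attr_from_ctime(int64_t sec, uint32_t nsec)
{
	return (uint64_t)sec * 1000000000ULL + nsec;
}

/* Cached data stays valid only if change attribute and size both match. */
static bool cache_still_valid(const struct nfs_attr_snapshot *cached,
			      const struct nfs_attr_snapshot *fresh)
{
	return change_attr_from_ctime(cached->ctime_sec, cached->ctime_nsec) ==
	       change_attr_from_ctime(fresh->ctime_sec, fresh->ctime_nsec) &&
	       cached->size == fresh->size;
}
```

Because every NFS version would feed the same comparison, the same cache-management path could serve NFSv2/v3 and NFSv4 alike. That is the code-sharing goal the merge message states.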
Avi Kivity	56457f38f2	KVM: Export asm-generic/kvm_para.h Prevents build failures on non-KVM archs. Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Avi Kivity <avi@redhat.com>	2012-05-29 12:31:01 +03:00
Mauro Carvalho Chehab	5926ff502f	edac: Initialize the dimm label with the known information While userspace hasn't filled in the dimm labels, store the dimm location there, as described by the memory model in use. This could eventually match what dmidecode describes, making it easier for people to identify the memory. For example, on an Intel motherboard where the DMI table is reliable, the first memory stick is described as: Memory Device Array Handle: 0x0029 Error Information Handle: Not Provided Total Width: 64 bits Data Width: 64 bits Size: 2048 MB Form Factor: DIMM Set: 1 Locator: A1_DIMM0 Bank Locator: A1_Node0_Channel0_Dimm0 Type: <OUT OF SPEC> Type Detail: Synchronous Speed: 800 MHz Manufacturer: A1_Manufacturer0 Serial Number: A1_SerNum0 Asset Tag: A1_AssetTagNum0 Part Number: A1_PartNum0 The memory named "A1_DIMM0" is physically located at the first memory controller (node 0), channel 0, dimm slot 0. After this patch, the memory label will be filled with: /sys/devices/system/edac/mc/csrow0/ch0_dimm_label:mc#0channel#0slot#0 And (after the new EDAC API patches) as: /sys/devices/system/edac/mc/mc0/dimm0/dimm_label:mc#0channel#0slot#0 So, even if the memory label is not initialized from userspace, useful information about the error location is filled in there, especially since several systems/motherboards provide enough info to map from channel/slot (or branch/channel/slot) to the DIMM label. So, letting the EDAC core fill it by default is a good thing. It should be noted that, as the label filling happens in edac_mc_alloc(), drivers can override it to better describe the memories (and some actually do). Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:13:50 -03:00
Mauro Carvalho Chehab	4275be6355	edac: Change internal representation to work with layers Change the EDAC internal representation to work with non-csrow based memory controllers. There are lots of those memory controllers nowadays, and more are coming. So, the EDAC internal representation needs to be changed in order to work with those memory controllers, while preserving backward compatibility with the old ones. The edac core was written with the idea that memory controllers are able to directly access csrows. This is not true for FB-DIMM and RAMBUS memory controllers. Also, some recent advanced memory controllers don't present a per-csrow view. Instead, they view memories as DIMMs rather than ranks. So, change the allocation and error report routines to allow them to work with all types of architectures. This will allow the removal of several hacks with FB-DIMM and RAMBUS memory controllers. Also, several tests were done on different platforms using different x86 drivers. TODO: multi-rank DIMMs are currently represented by multiple DIMM entries in struct dimm_info. That means that changing a label for one rank won't change the same label for the other ranks of the same DIMM. This bug has been present since the beginning of EDAC, so it is not a big deal. However, on several drivers it is possible to fix this issue, but it should be a per-driver fix, as the csrow => DIMM arrangement may not be the same for all. So, don't try to fix it here yet. I tried to make this patch as short as possible, preceding it with several other patches that simplified the logic here. Yet, as the internal API changes, all drivers need changes. The changes are generally bigger in the drivers for FB-DIMMs.
Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Mark Gross <mark.gross@intel.com> Cc: Jason Uhlenkott <juhlenko@akamai.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Olof Johansson <olof@lixom.net> Cc: Egor Martovetsky <egor@pasemi.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Joe Perches <joe@perches.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hitoshi Mitake <h.mitake@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com> Cc: Shaohui Xie <Shaohui.Xie@freescale.com> Cc: Josh Boyer <jwboyer@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:10:59 -03:00
Mauro Carvalho Chehab	982216a429	edac.h: Add generic layers for describing a memory location The edac core was written with the idea that memory controllers are able to directly access csrows, and that the channels are used inside a csrow select. This is not true for FB-DIMM and RAMBUS memory controllers. Also, some recent advanced memory controllers don't present a per-csrow view. Instead, they view memories as DIMMs rather than as ranks accessed via csrow/channel. So, changes are needed in order to allow the EDAC core to work with all types of architectures. In preparation for handling non-csrow based memory controllers, add some memory structs and a macro: enum hw_event_mc_err_type: describes the type of error (corrected, uncorrected, fatal), to be used by the new edac_mc_handle_error function; enum edac_mc_layer: describes the type of a given memory architecture layer (branch, channel, slot, csrow). struct edac_mc_layer: describes the properties of a memory layer (type, size, and whether the layer will be used on a virtual csrow). EDAC_DIMM_PTR() - as the number of layers can vary from 1 to 3, this macro converts from an address with up to 3 layers into a linear address. Reviewed-by: Borislav Petkov <bp@amd64.org> Cc: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:10:59 -03:00
Mauro Carvalho Chehab	a895bf8b1e	edac: move nr_pages to dimm struct The number of pages is a dimm property. Move it to the dimm struct. After this change, it is possible to add sysfs nodes for the DIMMs that will properly represent the DIMM stick properties, including its size. A TODO fix here is to properly represent dual-rank/quad-rank DIMMs when the memory controller represents the memory via chip select rows. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Mark Gross <mark.gross@intel.com> Cc: Jason Uhlenkott <juhlenko@akamai.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Olof Johansson <olof@lixom.net> Cc: Egor Martovetsky <egor@pasemi.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Joe Perches <joe@perches.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hitoshi Mitake <h.mitake@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com> Cc: Shaohui Xie <Shaohui.Xie@freescale.com> Cc: Josh Boyer <jwboyer@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:10:58 -03:00
Mauro Carvalho Chehab	084a4fccef	edac: move dimm properties to struct dimm_info On systems based on chip select rows, all channels need to use memories with the same properties, otherwise the memories on channels A and B won't be recognized. However, such an assumption is not true for all types of memory controllers. Controllers for FB-DIMMs don't have such requirements. Also, modern Intel controllers seem to be capable of handling such differences. So, we need to stop storing the DIMM information in per-csrow data, and store it instead in the right place. The first step is to move grain, mtype, dtype and edac_mode to the per-dimm struct. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Reviewed-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Mark Gross <mark.gross@intel.com> Cc: Jason Uhlenkott <juhlenko@akamai.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Olof Johansson <olof@lixom.net> Cc: Egor Martovetsky <egor@pasemi.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Joe Perches <joe@perches.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hitoshi Mitake <h.mitake@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: James Bottomley <James.Bottomley@parallels.com> Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com> Cc: Shaohui Xie <Shaohui.Xie@freescale.com> Cc: Josh Boyer <jwboyer@gmail.com> Cc: Mike Williams <mike@mikebwilliams.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:10:58 -03:00
Mauro Carvalho Chehab	a7d7d2e1a0	edac: Create a dimm struct and move the labels into it The way a DIMM is currently represented implies that they're linked into a per-csrow struct. However, some drivers don't see csrows, as they're hidden behind some chip like the AMBs on FBDIMMs, for example. This forced drivers to fake^Wvirtualize a csrow struct, and to create a mess under the original csrow/channel concept. Move the DIMM labels into a per-DIMM struct, and add there the real location of the socket, in terms of csrow/channel. Later patches will modify the location to properly represent the memory architecture. All other drivers will use a per-csrow type of location. Some of those drivers will require a later conversion, as they also fake the csrows internally. TODO: While this patch doesn't change the existing behavior, on csrow-based memory controllers a csrow/channel pair points to a memory rank. There's a known bug in the EDAC core that allows having different labels for the same DIMM if it has more than one rank. A later patch is needed to merge the several ranks for a DIMM into the same dimm_info struct, in order to avoid having different labels for the same DIMM. edac_mc_alloc() will now contain a per-dimm initialization loop that will be changed by later patches in order to match other types of memory architectures. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Reviewed-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:10:57 -03:00
Linus Torvalds	90324cc1b1	avoid iput() from flusher thread -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJPw2J/AAoJECvKgwp+S8Ja5jkP/3uMxkhf8XQpXCI3O1QVfaQr uZFfM8sINqIPDVm1dtFjFj7f8Bw9mhE2KAnnJ1rKT8tQwqq9yAse1QPlhCG1ZqoP +AnMDDXHtx7WmQZXhBvS9b+unpZ7Jr6r6pO5XrmTL2kRL3YJPUhZ2+xbTT5belTB KoAu4WqORZRxfXoC76S7U8K+D4NcAGhAOxCClsIjmY+oocCiCag4FZOyzYIFViqc ghUN/+rLQ3fqGGv2yO7Ylx1gUM7sxIwkZQ/h962jFAtxz9czImr2NmRoMliOaOkS tvcnIf+E3u0n/zIjzFvzhxKgHJPP8PkcPMk60d3jKmFngBkqFTzNUeVTP8md7HrV 4DlXisWr+z7YVyWUCFaNcJLmjiWSwQ8DV/clRLobeBf9EJKan5F1PjFgl6PLJM5F Qr1+LHMNaetdulBwMRTyveZTzYqw9RmDnD9dWMo4mX/kTpvtC4jTPVV7hkRD+Qlv 5vTRR+VXL3Q50yClLf0AQMSKTnH2gBuepM/b+7cShLGfsMln8DtUjmbigv+niL63 BibcCIbIlP2uWGnl37VhsC34AT+RKt3lggrBOpn/7XJMq/wKR7IRP/7V9TfYgaUN NBa+wtnLDa1pZEn/X7izdcQP62PzDtmB+ObvYT0Yb40A4+2ud3qF/lB53c1A1ewF /9c4zxxekjHZnn2oooEa =oLXf -----END PGP SIGNATURE----- Merge tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux Pull writeback tree from Wu Fengguang: "Mainly from Jan Kara to avoid iput() in the flusher threads." * tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux: writeback: Avoid iput() from flusher thread vfs: Rename end_writeback() to clear_inode() vfs: Move waiting for inode writeback from end_writeback() to evict_inode() writeback: Refactor writeback_single_inode() writeback: Remove wb->list_lock from writeback_single_inode() writeback: Separate inode requeueing after writeback writeback: Move I_DIRTY_PAGES handling writeback: Move requeueing when I_SYNC set to writeback_sb_inodes() writeback: Move clearing of I_SYNC into inode_sync_complete() writeback: initialize global_dirty_limit fs: remove 8 bytes of padding from struct writeback_control on 64 bit builds mm: page-writeback.c: local functions should not be exposed globally	2012-05-28 09:54:45 -07:00
Florian Tobias Schandinat	d85d135d8b	Omapdss driver changes for 3.5 merge window. Lots of normal development commits, but perhaps most notable changes are: * HDMI rework to properly decouple the HDMI audio part from the HDMI video part. * Restructure omapdss core driver so that it's possible to implement device tree support. This included changing how platform data is passed to the drivers, changing display device registration and improving the panel driver's ability to configure the underlying video output interface. * Basic support for DSI packet interleaving -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJPu2LWAAoJEPo9qoy8lh71bo0P/2iTw1WLHiRqOwwXSqOQHm2U EFzA4T36qS29h5g9yA1uHnRo2CO7UVL6kOFShk5vzpiBjwZ0e0nPPUxK919hyYEP vbrOq4dzdIx4+IYhlFusMKi1OR2JhbmOjE7gx3e1fNby7XxXY2TO2/i98lVKT0bi wcJN3cTtXcwZOjApxudIf0J4A/0YRzqGIumnkYKwZWqiW5Rv1+dfb5/Ml5fhYvsH IehLQZs8IHtCbM7qw1yDeVAnBUgsuLPCyep3W/zm1MEscboevifw50sFIRwG5GBQ cmid+Fi7u3R0/yv/UK2XBGFf7PbeZxWyM5nuZ5raajS/X0mxT1fkGcre1AxNzvgE 3gjfS9m40WKLpod1hsbXZsX1ksCiBddvT5xkgoiyhfa2G2TDGnOEHmKE4sYuq7qF Zc2YuJMahb+iWrPN966Io4PpgscMEjP732b0tg03MtwgR+liajqiuMzA56PDHaTA bwwFNS3DVIoEpgeN778PWQJ1mRprlYnK7lyJvpGlrEnDh9tM0Xi/35QDlFl1hvAp ZKD9oSkK0cIvZB690J6pRoaVv0PfjHspxFDX28FICTQROV2lJ5P9JOwGi+Bk9FwD eBPchUsivnAuhVthp3YwFod5JyN5ZVSD+9Xe9dXUwstRJo9dJMYLY+E41+N4UUS9 BS2/SKvWqc2NcmIgerO3 =I8Se -----END PGP SIGNATURE----- Merge tag 'omapdss-for-3.5' of git://github.com/tomba/linux into fbdev-next Omapdss driver changes for 3.5 merge window. Lots of normal development commits, but perhaps most notable changes are: * HDMI rework to properly decouple the HDMI audio part from the HDMI video part. * Restructure omapdss core driver so that it's possible to implement device tree support. This included changing how platform data is passed to the drivers, changing display device registration and improving the panel driver's ability to configure the underlying video output interface. 
* Basic support for DSI packet interleaving	2012-05-27 20:58:20 +00:00
Darrick J. Wong	e93376c20b	ext4/jbd2: add metadata checksumming to the list of supported features Activate the metadata checksumming feature by adding it to ext4 and jbd2's lists of supported features. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-05-27 08:12:42 -04:00
Darrick J. Wong	4fd5ea43bc	jbd2: checksum journal superblock Calculate and verify a checksum covering the journal superblock. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-05-27 08:08:22 -04:00
Darrick J. Wong	01b5adcebb	jbd2: Grab a reference to the crc32c driver if necessary Obtain a reference to the crc32c driver if needed for the v2 checksum. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-05-27 07:50:56 -04:00
Gao feng	0c1833797a	ipv6: fix incorrect ipsec fragment Since commit `ad0081e43a` "ipv6: Fragment locally generated tunnel-mode IPSec6 packets as needed", packets are fragmented incorrectly, because tunnel mode needs IPsec headers and trailer for all fragments, while in transport mode it is sufficient to add the headers to the first fragment and the trailer to the last. So modify mtu and maxfraglen based on the IPsec mode and on whether the fragment is the first or the last. In my tests it works well (every fragment's size is the mtu) and does not trigger the slow fragment path. Changes from v1: through optimization, mtu_prev and maxfraglen_prev can be deleted; replace the xfrm mode code with dst_entry's new flag DST_XFRM_TUNNEL; add function ip6_append_data_mtu to make the code clearer. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-27 01:11:22 -04:00
Linus Torvalds	1e2aec873a	Merge branch 'generic-string-functions' This makes <asm/word-at-a-time.h> actually live up to its promise of allowing architectures to help tune the string functions that do their work a word at a time. David had already taken the x86 strncpy_from_user() function, modified it to work on sparc, and then done the extra work to make it generically useful. This then expands on that work by making x86 use that generic version, completing the circle. But more importantly, it fixes up the word-at-a-time interfaces so that it's now easy to also support things like strnlen_user(), and pretty much most random string functions. David reports that it all works fine on sparc, and Jonas Bonn reported that an earlier version of this worked on OpenRISC too. It's pretty easy for architectures to add support for this and just replace their private versions with the generic code. * generic-string-functions: sparc: use the new generic strnlen_user() function x86: use the new generic strnlen_user() function lib: add generic strnlen_user() function word-at-a-time: make the interfaces truly generic x86: use generic strncpy_from_user routine	2012-05-26 16:57:16 -07:00
Linus Torvalds	ae32adc1e0	Merge branch 'i2c-embedded/for-next' of git://git.pengutronix.de/git/wsa/linux Pull i2c-embedded changes from Wolfram Sang: "Major changes: - lots of devicetree additions for existing drivers. I tried hard to make sure the bindings are proper. In more complicated cases, I requested acks from people having more experience with them than me. That took a bit of extra time and also some time went into discussions with developers about what bindings are and what not. I have the feeling that the workflow with bindings should be improved to scale better. I will spend some more thought on this... - i2c-muxes are now used successfully, so we dropped EXPERIMENTAL for them and renamed the drivers to a standard pattern to match the rest of the subsystem. They can also be used with devicetree now. - ixp2000 was removed since the whole platform goes away. - cleanups (strlcpy instead of strcpy, NULL instead of 0) - The rest is typical driver fixes I assume. All patches have been in linux-next at least since v3.4-rc6." Fixed up trivial conflict in arch/arm/mach-lpc32xx/common.c due to the same patch already having come in through the arm/soc trees, with additional patches on top of it. * 'i2c-embedded/for-next' of git://git.pengutronix.de/git/wsa/linux: (35 commits) i2c: davinci: Free requested IRQ in remove i2c: ocores: register OF i2c devices i2c: tegra: notify transfer-complete after clearing status. I2C: xiic: Add OF binding support i2c: Rename last mux driver to standard pattern i2c: tegra: fix 10bit address configuration i2c: muxes: rename first set of drivers to a standard pattern of/i2c: implement of_find_i2c_adapter_by_node i2c: implement i2c_verify_adapter i2c-s3c2410: Add HDMIPHY quirk for S3C2440 i2c-s3c2410: Rework device type handling i2c: muxes are not EXPERIMENTAL anymore i2c/of: Automatically populate i2c mux busses from device tree data.
i2c: Add a struct device * parameter to i2c_add_mux_adapter() of/i2c: call i2c_verify_client from of_find_i2c_device_by_node i2c: designware: Add clk_{un}prepare() support i2c: designware: add PM support i2c: ixp2000: remove driver i2c: pnx: add device tree support i2c: imx: don't use strcpy but strlcpy ...	2012-05-26 13:35:03 -07:00
Linus Torvalds	84a442b9a1	arm-soc: device tree conversions, part 2 These continue the device tree work from part 1, this set is for the tegra, mxs and imx platforms, all of which have dependencies on clock or pinctrl changes submitted earlier. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJPuex7AAoJEIwa5zzehBx3xsQP/jkyt74MvuKUi8pi2zkeMIgn 4XieyqcA0KZjJzfB22q3GIZjNIf/mEIGE4E/3bneVMPh/E2zaiohaXFExBmjNjml hhzWeZlFGPBjrZsfpIXJIIUhwSI7gX2rjYh4npJmdNhZmy8Y89XnpNJhN1kOwMuV oN23hPWoSVGbyDMQ0fmHx9GyOL8m7yap+joG13aljDa2OKpQg+pYvdwft+k1K9di 8yPF+qA043UUR7dSsjmTbiCcjZy2eySdCmfOAkEG4inSgxNoM7GBs3MuwZo/veCD v5WssJqWDbLXtqKn5Uo2bvGWiEcf0xtwOAqhSpbaup3dQFJSWMEenBTtA9UlxFhk 6gdY62O+7k6N0thkxXyLNGkgaGzexZAsK7dM6XSDB+PqD+OSNJS7dvmxZM8tuaRn rvCM1XWcNeN/dpnLbgwCR12efkwWtJoqqUZUUp/tFFaTo8HriqeAIYk7obnR8s9n S5x9LeueQGNgaxXJzVdh481YKG/1lqjG/a06HbVgYS4XQvtdA+4khalOefJv10tm Nkg8+4/8pMthWJfhhlfPUgWFXOXFF2AGPG4su2XwKuFXypO8599lzi7gUQaEZu2U 7caqoWP69KsKvK5iAAmA4DQ2rcsgHd44NXx/8Jjes9ma8knlYjrf42dBH6AZMQBG 69I9sJ1cyqusBwx72NPN =WeDQ -----END PGP SIGNATURE----- Merge tag 'dt2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull arm-soc device tree conversions (part 2) from Olof Johansson: "These continue the device tree work from part 1, this set is for the tegra, mxs and imx platforms, all of which have dependencies on clock or pinctrl changes submitted earlier." 
Fix up trivial conflicts due to nearby changes in drivers/{gpio/gpio,i2c/busses/i2c}-mxs.c * tag 'dt2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (73 commits) ARM: dt: tegra: invert status=disable vs status=okay ARM: dt: tegra: consistent basic property ordering ARM: dt: tegra: sort nodes based on bus order ARM: dt: tegra: remove duplicate device_type property ARM: dt: tegra: consistenly use lower-case for hex constants ARM: dt: tegra: format regs properties consistently ARM: dt: tegra: gpio comment cleanup ARM: dt: tegra: remove unnecessary unit addresses ARM: dt: tegra: whitespace cleanup ARM: dt: tegra cardhu: fix typo in SDHCI node name ARM: dt: tegra: cardhu: register core regulator tps62361 ARM: dt: tegra30.dtsi: Add SMMU node ARM: dt: tegra20.dtsi: Add GART node ARM: dt: tegra30.dtsi: Add Memory Controller(MC) nodes ARM: dt: tegra20.dtsi: Add Memory Controller(MC) nodes ARM: dt: tegra: Add device tree support for AHB ARM: dts: enable audio support for imx28-evk ARM: dts: enable i2c device for imx28-evk i2c: mxs: add device tree probe support ARM: dts: enable mmc for imx28-evk ...	2012-05-26 12:57:47 -07:00
Linus Torvalds	39b6cc668c	arm-soc: add stmp-dev library code A number of devices are using a common register layout, this adds support code for it in lib/stmp_device.c so we do not need to duplicate it in each driver. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJPuexlAAoJEIwa5zzehBx31P0P/jK7GC5Ln2gr/bV+4Kt9fStS VcGI/ARsyQtwaNTJQfPkg8Weg3DhbPRlUWeimVKMFo3uEle3VjnPBjdcMPUKtW3x SPka/W591LGEdKQRmXZrISm2OiQXVvM2zkhSJV89n/tJBdHd+tDWDDq4Y784F8Cj hWmcIi66G4RBPj5pplf80UhNAEg5HoZHQnlgrS1iLMpBTwXAesv7zyZpvnsMzdpg qSJTfcifgLULtM0WFbooNGojBn8ftuA67psrw78vgV2bz7bVBioZHYFyqPWK9Gr0 vtiKuyXqiDA65mueXA+E5RXXLCLQSyGdV8y0xiSYjilRVkziPcMKnQT07keb8SJN CCDpetjEULiQpgKvVWc7sDGlb5ePd/C5rs31S0fFOKjeRJNlfG5+OuqZPiobO7hk F2Fx3gq4LPLel7gwjK3T4XTmmL9kNt/y1sIfXx5WybJL8N5n6TdZIfWm6yOZYwfX jvG/CnvVvhgdWk/ebaTEOG1MaeNAY3uwGpSBuEEoXUDHatQdOYAsgLfJJv/H4zKp 2AY9qvXTDtFYys/hs2WhwmS7s1WFlIrA+voEPBDa3WT2qGup8HAL/C9kL3ms2zqk 8JL/yQ/IJpTHPb4bCGo9C08qdi1YtMbylHB0/ELvG1BNoHOnCDV3wZlVG3ZTQQb5 c/Lb2H8crk5HVbpCPLQU =VHLM -----END PGP SIGNATURE----- Merge tag 'stmp-dev' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull arm-soc stmp-dev library code from Olof Johansson: "A number of devices are using a common register layout, this adds support code for it in lib/stmp_device.c so we do not need to duplicate it in each driver." Fix up trivial conflicts in drivers/i2c/busses/i2c-mxs.c and lib/Makefile * tag 'stmp-dev' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: i2c: mxs: use global reset function lib: add support for stmp-style devices	2012-05-26 12:50:04 -07:00
Linus Torvalds	2795343705	arm-soc: clock driver changes The new clock subsystem was merged in linux-3.4 without any users, this now moves the first three platforms over to it: imx, mxs and spear. The series also contains the changes for the clock subsystem itself, since Mike preferred to have it together with the platforms that require these changes, in order to avoid interdependencies and conflicts. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJPuexPAAoJEIwa5zzehBx3YBsP/0nFhXjb5t1PdLfFzGKtcZVB j4zXWXMHQ1fA7wIfEpZF3Nnco6MQkufF5wJPoPdn1+wmkzCn3D6IwNVWVtW4U5i9 VGyShSbgusAAYXUe/9yYj8eN+bbRQSvdN4eWYWU6+rRXShGZ5dZZmp+IPNl54dnW 6F8uCnHX0cnIMCpGqV+41zZgZ/4wL2k9gdqu0LO6pi07o4tGd0Z4gcySgUFAnn1R kofNHueYIP4UgOg8DREoBzVKlpRqMou3S2kSZUfMeb3Q9ryF7UIvaGqIILyi7PKL kWd3nptg0EPavfL21SwXHiGpnDpB/Gj/F70kcPLus5RYujB24C9bvBmc26z68NZx Sz9mbElkkIU5duZsl1nxBWJ8IZ/tSWdtmC2xQMznmV7gHyGgVwr4j47f4Uv5sBvM 14JHDO7mqN6E6FnTFZu/oPAN5pDjgL+TVNK5BU6Wkq0zitrA6eyKDqCvBCqkO6Nn tNzOuyRDzMOwM7HzqXhxqtzJWXylO1Mldc4bM8X4Cocf4pnLna/X6uP6dgE6A+JY azVYx4I/0NdEPerDTzIcEhBDgZeBVROhUQr+kHxc4rf6WzUUbu/wEo1UKXWV66oW 1jb1yAFFWqYjkQuQc2PD4JSx35sFJaoSaoneRtmzBzRDfzSr5KjKj1E0e1skyMFq 7ZVLCqZD0cB9DhmMDkWP =rwFF -----END PGP SIGNATURE----- Merge tag 'clock' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull arm-soc clock driver changes from Olof Johansson: "The new clock subsystem was merged in linux-3.4 without any users, this now moves the first three platforms over to it: imx, mxs and spear. The series also contains the changes for the clock subsystem itself, since Mike preferred to have it together with the platforms that require these changes, in order to avoid interdependencies and conflicts." Fix up trivial conflicts in arch/arm/mach-kirkwood/common.c (code removed in one branch, added OF support in another) and drivers/dma/imx-sdma.c (independent changes next to each other). 
* tag 'clock' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (97 commits) clk: Fix CLK_SET_RATE_GATE flag validation in clk_set_rate(). clk: Provide dummy clk_unregister() SPEAr: Update defconfigs SPEAr: Add SMI NOR partition info in dts files SPEAr: Switch to common clock framework SPEAr: Call clk_prepare() before calling clk_enable SPEAr: clk: Add General Purpose Timer Synthesizer clock SPEAr: clk: Add Fractional Synthesizer clock SPEAr: clk: Add Auxiliary Synthesizer clock SPEAr: clk: Add VCO-PLL Synthesizer clock SPEAr: Add DT bindings for SPEAr's timer ARM i.MX: remove now unused clock files ARM: i.MX6: implement clocks using common clock framework ARM i.MX35: implement clocks using common clock framework ARM i.MX5: implement clocks using common clock framework ARM: Kirkwood: Replace clock gating ARM: Orion: Audio: Add clk/clkdev support ARM: Orion: PCIE: Add support for clk ARM: Orion: XOR: Add support for clk ARM: Orion: CESA: Add support for clk ...	2012-05-26 12:42:29 -07:00
Linus Torvalds	36126f8f2e	word-at-a-time: make the interfaces truly generic This changes the interfaces in <asm/word-at-a-time.h> to be a bit more complicated, but a lot more generic. In particular, it allows us to really do the operations efficiently on both little-endian and big-endian machines, pretty much regardless of machine details. For example, if you can rely on a fast population count instruction on your architecture, this will allow you to make your optimized <asm/word-at-a-time.h> file with that. NOTE! The "generic" version in include/asm-generic/word-at-a-time.h is not truly generic, it actually only works on big-endian. Why? Because on little-endian the generic algorithms are wasteful, since you can inevitably do better. The x86 implementation is an example of that. (The only truly non-generic part of the asm-generic implementation is the "find_zero()" function, and you could make a little-endian version of it. And if the Kbuild infrastructure allowed us to pick a particular header file, that would be lovely) The <asm/word-at-a-time.h> functions are as follows: - WORD_AT_A_TIME_CONSTANTS: specific constants that the algorithm uses. - has_zero(): take a word, and determine if it has a zero byte in it. It gets the word, the pointer to the constant pool, and a pointer to an intermediate "data" field it can set. This is the "quick-and-dirty" zero tester: it's what is run inside the hot loops. - "prep_zero_mask()": take the word, the data that has_zero() produced, and the constant pool, and generate an exact mask of which byte had the first zero. This is run directly outside the loop, and allows the "has_zero()" function to answer the "is there a zero byte" question without necessarily getting exactly which byte is the first one to contain a zero. 
If you do multiple byte lookups concurrently (eg "hash_name()", which looks for both NUL and '/' bytes), after you've done the prep_zero_mask() phase, the result of those can be or'ed together to get the "either or" case. - The result from "prep_zero_mask()" can then be fed into "find_zero()" (to find the byte offset of the first byte that was zero) or into "zero_bytemask()" (to find the bytemask of the bytes preceding the zero byte). The existence of zero_bytemask() is optional, and is not necessary for the normal string routines. But dentry name hashing needs it, so if you enable DENTRY_WORD_AT_A_TIME you need to expose it. This changes the generic strncpy_from_user() function and the dentry hashing functions to use these modified word-at-a-time interfaces. This gets us back to the optimized state of the x86 strncpy that we lost in the previous commit when moving over to the generic version. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-26 11:33:40 -07:00
Trond Myklebust	32b0131069	NFSv4.1: Don't clobber the seqid if exchange_id returns a confirmed clientid If the EXCHGID4_FLAG_CONFIRMED_R flag is set, the client is in theory supposed to already know the correct value of the seqid, in which case RFC5661 states that it should ignore the value returned. Also ensure that if the sanity check in nfs4_check_cl_exchange_flags fails, then we must not change the nfs_client fields. Finally, clean up the code: we don't need to retest the value of 'status' unless it can change. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-26 14:17:31 -04:00
Trond Myklebust	6624553910	NFSv4.1: Add DESTROY_CLIENTID Ensure that we destroy our lease on last unmount Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-26 14:17:30 -04:00
Linus Torvalds	fa2af6e4fe	Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile Pull tile updates from Chris Metcalf: "These changes cover a range of new arch/tile features and optimizations. They've been through LKML review and on linux-next for a month or so. There's also one bug-fix that just missed 3.4, which I've marked for stable." Fixed up trivial conflict in arch/tile/Kconfig (newly added tile Kconfig entries clashing with the generic timer/clockevents changes). * git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: tile: default to tilegx_defconfig for ARCH=tile tile: fix bug where fls(0) was not returning 0 arch/tile: mark TILEGX as not EXPERIMENTAL tile/mm/fault.c: Port OOM changes to handle_page_fault arch/tile: add descriptive text if the kernel reports a bad trap arch/tile: allow querying cpu module information from the hypervisor arch/tile: fix hardwall for tilegx and generalize for idn and ipi arch/tile: support multiple huge page sizes dynamically mm: add new arch_make_huge_pte() method for tile support arch/tile: support kexec() for tilegx arch/tile: support <asm/cachectl.h> header for cacheflush() syscall arch/tile: Allow tilegx to build with either 16K or 64K page size arch/tile: optimize get_user/put_user and friends arch/tile: support building big-endian kernel arch/tile: allow building Linux with transparent huge pages enabled arch/tile: use interrupt critical sections less	2012-05-25 15:59:38 -07:00
Trond Myklebust	ad24ecfbcd	NFSv4.1: Move NFSPROC4_CLNT_BIND_CONN_TO_SESSION to the end of the operations For backward compatibility with nfs-utils. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Weston Andros Adamson <dros@netapp.com>	2012-05-25 18:02:09 -04:00
Chris Metcalf	d9ed9faac2	mm: add new arch_make_huge_pte() method for tile support The tile support for multiple-size huge pages requires tagging the hugetlb PTE with a "super" bit for PTEs that are multiples of the basic size of a pagetable span. To set that bit properly we need to tweak the PTE in make_huge_pte() based on the vma. This change provides the API for a subsequent tile-specific change to use. Reviewed-by: Hillf Danton <dhillf@gmail.com> Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>	2012-05-25 12:48:26 -04:00
Chris Metcalf	73636b1aac	arch/tile: allow building Linux with transparent huge pages enabled The change adds some infrastructure for managing tile pmd's more generally, using pte_pmd() and pmd_pte() methods to translate pmd values to and from ptes, since on TILEPro a pmd is really just a nested structure holding a pgd (aka pte). Several existing pmd methods are moved into this framework, and a whole raft of additional pmd accessors are defined that are used by the transparent hugepage framework. The tile PTE now has a "client2" bit. The bit is used to indicate a transparent huge page is in the process of being split into subpages. This change also fixes a generic bug where the return value of the generic pmdp_splitting_flush() was incorrect. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>	2012-05-25 12:48:21 -04:00


50919 Commits