Commit graph

1131 commits

Author SHA1 Message Date
Tao Huang
daf2796321 Merge commit '52f971ee6e'
* commit '52f971ee6e': (84565 commits)
  arm64: dts: rockchip: rk3562: Enable viLKsvPwrActive for soc bus
  mtd: spi-nor: esmt: Support New devices
  mtd: spi-nor: fmsh: Support New devices
  mtd: spi-nor: gigadevice: Support New devices
  mtd: spinand: gsto: Add code
  mtd: spinand: hyf: Support new devices
  mmc: convert thunder boot dependency
  ARM: dts: rockchip: rv1106: add node for system sleep
  ARM: rockchip: support rv1106 suspend
  ARM: rockchip: add some pm-related functions
  video: rockchip: mpp: fix rk3528 avsd not probe issue
  arm64: dts: rockchip: rk3588-vehicle-maxim-serdes: Add BOE AV156FHT L83 support
  arm64: rockchip_defconfig: Enable CONFIG_DRM_PANEL_MAXIM_MAX96752F
  drm/panel: Add panel driver for Maxim MAX96752F based LCDs
  media: i2c: techpoint: add support 4 channel 2 lane mode
  drm/rockchip: dsi2: fix NULL in component_ops .unbind helper
  media: rockchip: vicap: fixes cma can not alloc when capture raw
  media: rockchip: vicap: fixed vc err for multi channel
  media: rockchip: hdmirx: fix timing info for interlaced resolution
  media: rockchip: hdmirx: fix code error for cec register failed
  ...

Change-Id: Ia7ac365455d87a295e62bbf481d80694a9712f30

Conflicts:
	.gitignore
	Documentation/devicetree/bindings/clock/rockchip,px30-cru.txt
	Documentation/devicetree/bindings/connector/usb-connector.yaml
	Documentation/devicetree/bindings/display/rockchip/dw_hdmi-rockchip.txt
	Documentation/devicetree/bindings/hwmon/pwm-fan.txt
	Documentation/devicetree/bindings/iio/light/vl6180.txt
	Documentation/devicetree/bindings/iommu/rockchip,iommu.txt
	Documentation/devicetree/bindings/mtd/rockchip,nand-controller.yaml
	Documentation/devicetree/bindings/net/rockchip-dwmac.yaml
	Documentation/devicetree/bindings/net/snps,dwmac.yaml
	Documentation/devicetree/bindings/phy/phy-rockchip-inno-usb2.yaml
	Documentation/devicetree/bindings/power/rockchip-io-domain.txt
	Documentation/devicetree/bindings/regulator/fan53555.txt
	Documentation/devicetree/bindings/soc/rockchip/power_domain.txt
	Documentation/devicetree/bindings/sound/rockchip,pdm.yaml
	Documentation/devicetree/bindings/sound/rockchip-spdif.yaml
	Documentation/devicetree/bindings/spi/spi-rockchip.yaml
	Documentation/devicetree/bindings/thermal/rockchip-thermal.txt
	Documentation/devicetree/bindings/usb/usb-xhci.txt
	Documentation/filesystems/erofs.rst
	arch/arm/Kconfig
	arch/arm/Makefile
	arch/arm/boot/compressed/head.S
	arch/arm/boot/dts/rk3036.dtsi
	arch/arm/boot/dts/rk3066a-rayeager.dts
	arch/arm/boot/dts/rk3066a.dtsi
	arch/arm/boot/dts/rk322x.dtsi
	arch/arm/boot/dts/rk3288.dtsi
	arch/arm/boot/dts/rk3xxx.dtsi
	arch/arm64/boot/dts/rockchip/Makefile
	arch/arm64/boot/dts/rockchip/px30.dtsi
	arch/arm64/boot/dts/rockchip/rk3308.dtsi
	arch/arm64/boot/dts/rockchip/rk3399-opp.dtsi
	arch/arm64/boot/dts/rockchip/rk3399.dtsi
	arch/arm64/boot/dts/rockchip/rk3566.dtsi
	arch/arm64/boot/dts/rockchip/rk3568-pinctrl.dtsi
	arch/arm64/boot/dts/rockchip/rk3568.dtsi
	arch/arm64/boot/dts/rockchip/rockchip-pinconf.dtsi
	arch/arm64/kernel/process.c
	arch/arm64/mm/Makefile
	arch/arm64/mm/fault.c
	arch/arm64/mm/init.c
	drivers/Kconfig
	drivers/Makefile
	drivers/android/Kconfig
	drivers/ata/ahci_platform.c
	drivers/char/hw_random/Kconfig
	drivers/char/hw_random/Makefile
	drivers/clk/clk.c
	drivers/clk/rockchip/Kconfig
	drivers/clk/rockchip/Makefile
	drivers/clk/rockchip/clk-cpu.c
	drivers/clk/rockchip/clk-rk3036.c
	drivers/clk/rockchip/clk-rk3188.c
	drivers/clk/rockchip/clk-rk3308.c
	drivers/clk/rockchip/clk-rk3399.c
	drivers/clk/rockchip/clk-rk3568.c
	drivers/clk/rockchip/clk-rv1126.c
	drivers/clk/rockchip/clk.c
	drivers/clk/rockchip/clk.h
	drivers/cpufreq/cpufreq-dt.c
	drivers/crypto/Kconfig
	drivers/devfreq/Makefile
	drivers/devfreq/devfreq.c
	drivers/dma-buf/dma-buf.c
	drivers/dma-buf/heaps/Makefile
	drivers/dma/pl330.c
	drivers/firmware/Kconfig
	drivers/gpio/Kconfig
	drivers/gpio/Makefile
	drivers/gpio/gpio-rockchip.c
	drivers/gpu/Makefile
	drivers/gpu/drm/Kconfig
	drivers/gpu/drm/Makefile
	drivers/gpu/drm/bridge/Kconfig
	drivers/gpu/drm/bridge/Makefile
	drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
	drivers/gpu/drm/bridge/analogix/analogix_dp_core.h
	drivers/gpu/drm/bridge/analogix/analogix_dp_reg.c
	drivers/gpu/drm/bridge/display-connector.c
	drivers/gpu/drm/bridge/sii902x.c
	drivers/gpu/drm/bridge/synopsys/Makefile
	drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
	drivers/gpu/drm/bridge/synopsys/dw-mipi-dsi.c
	drivers/gpu/drm/drm_atomic_helper.c
	drivers/gpu/drm/drm_crtc_internal.h
	drivers/gpu/drm/drm_edid.c
	drivers/gpu/drm/panel/panel-simple.c
	drivers/gpu/drm/rockchip/Kconfig
	drivers/gpu/drm/rockchip/Makefile
	drivers/gpu/drm/rockchip/analogix_dp-rockchip.c
	drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
	drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c
	drivers/gpu/drm/rockchip/inno_hdmi.c
	drivers/gpu/drm/rockchip/rockchip_drm_drv.c
	drivers/gpu/drm/rockchip/rockchip_drm_drv.h
	drivers/gpu/drm/rockchip/rockchip_drm_fbdev.c
	drivers/gpu/drm/rockchip/rockchip_drm_gem.c
	drivers/gpu/drm/rockchip/rockchip_drm_vop.c
	drivers/gpu/drm/rockchip/rockchip_drm_vop.h
	drivers/gpu/drm/rockchip/rockchip_drm_vop2.c
	drivers/gpu/drm/rockchip/rockchip_lvds.c
	drivers/gpu/drm/rockchip/rockchip_rgb.c
	drivers/gpu/drm/rockchip/rockchip_vop2_reg.c
	drivers/gpu/drm/rockchip/rockchip_vop_reg.c
	drivers/gpu/drm/rockchip/rockchip_vop_reg.h
	drivers/hwmon/pwm-fan.c
	drivers/hwspinlock/Kconfig
	drivers/hwspinlock/Makefile
	drivers/i2c/busses/i2c-rk3x.c
	drivers/i2c/i2c-core-base.c
	drivers/iio/adc/Kconfig
	drivers/iio/adc/rockchip_saradc.c
	drivers/iio/industrialio-event.c
	drivers/input/touchscreen/Makefile
	drivers/iommu/iommu.c
	drivers/iommu/rockchip-iommu.c
	drivers/irqchip/irq-gic-v3-its.c
	drivers/leds/Makefile
	drivers/mailbox/Kconfig
	drivers/media/common/videobuf2/Makefile
	drivers/media/i2c/Kconfig
	drivers/media/i2c/Makefile
	drivers/media/i2c/dw9714.c
	drivers/media/i2c/hi556.c
	drivers/media/i2c/imx214.c
	drivers/media/i2c/imx258.c
	drivers/media/i2c/imx334.c
	drivers/media/i2c/imx335.c
	drivers/media/i2c/ov5648.c
	drivers/media/i2c/ov5670.c
	drivers/media/i2c/ov5695.c
	drivers/media/i2c/ov7251.c
	drivers/media/platform/Kconfig
	drivers/media/platform/Makefile
	drivers/media/platform/rockchip/Kconfig
	drivers/media/spi/Kconfig
	drivers/media/spi/Makefile
	drivers/media/usb/uvc/uvc_driver.c
	drivers/media/usb/uvc/uvcvideo.h
	drivers/media/v4l2-core/v4l2-async.c
	drivers/media/v4l2-core/v4l2-ioctl.c
	drivers/mfd/rk808.c
	drivers/mmc/core/block.c
	drivers/mmc/core/host.c
	drivers/mmc/core/mmc.c
	drivers/mmc/core/mmc_ops.c
	drivers/mmc/host/Makefile
	drivers/mmc/host/dw_mmc-rockchip.c
	drivers/mmc/host/dw_mmc.c
	drivers/mmc/host/dw_mmc.h
	drivers/mmc/host/sdhci-of-dwcmshc.c
	drivers/mtd/nand/Makefile
	drivers/mtd/nand/raw/Kconfig
	drivers/mtd/nand/raw/Makefile
	drivers/mtd/nand/raw/rockchip-nand-controller.c
	drivers/mtd/nand/spi/Makefile
	drivers/mtd/nand/spi/core.c
	drivers/mtd/nand/spi/gigadevice.c
	drivers/mtd/nand/spi/macronix.c
	drivers/mtd/nand/spi/xtx.c
	drivers/mtd/spi-nor/Kconfig
	drivers/mtd/spi-nor/Makefile
	drivers/mtd/spi-nor/core.c
	drivers/mtd/spi-nor/core.h
	drivers/mtd/spi-nor/eon.c
	drivers/mtd/spi-nor/esmt.c
	drivers/mtd/spi-nor/gigadevice.c
	drivers/mtd/spi-nor/macronix.c
	drivers/mtd/spi-nor/winbond.c
	drivers/mtd/spi-nor/xmc.c
	drivers/net/ethernet/stmicro/stmmac/Makefile
	drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
	drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
	drivers/net/ethernet/stmicro/stmmac/stmmac.h
	drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
	drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
	drivers/net/phy/Kconfig
	drivers/net/phy/motorcomm.c
	drivers/net/phy/phy_device.c
	drivers/nvmem/Kconfig
	drivers/nvmem/Makefile
	drivers/pci/controller/dwc/Makefile
	drivers/pci/controller/dwc/pcie-designware-host.c
	drivers/pci/controller/dwc/pcie-dw-rockchip.c
	drivers/pci/controller/pcie-rockchip-host.c
	drivers/pci/controller/pcie-rockchip.h
	drivers/pci/pci-sysfs.c
	drivers/pci/pcie/Makefile
	drivers/phy/rockchip/Kconfig
	drivers/phy/rockchip/Makefile
	drivers/phy/rockchip/phy-rockchip-inno-dsidphy.c
	drivers/phy/rockchip/phy-rockchip-inno-usb2.c
	drivers/phy/rockchip/phy-rockchip-naneng-combphy.c
	drivers/phy/rockchip/phy-rockchip-snps-pcie3.c
	drivers/phy/rockchip/phy-rockchip-typec.c
	drivers/pinctrl/Kconfig
	drivers/pinctrl/Makefile
	drivers/pinctrl/pinctrl-rk805.c
	drivers/pinctrl/pinctrl-rockchip.c
	drivers/pinctrl/pinctrl-rockchip.h
	drivers/power/supply/Kconfig
	drivers/power/supply/Makefile
	drivers/power/supply/bq25890_charger.c
	drivers/power/supply/rk817_charger.c
	drivers/pwm/core.c
	drivers/pwm/pwm-rockchip.c
	drivers/regulator/fan53555.c
	drivers/regulator/rk808-regulator.c
	drivers/rtc/rtc-hym8563.c
	drivers/soc/rockchip/Kconfig
	drivers/soc/rockchip/Makefile
	drivers/soc/rockchip/grf.c
	drivers/soc/rockchip/io-domain.c
	drivers/soc/rockchip/pm_domains.c
	drivers/spi/Kconfig
	drivers/spi/spi-rockchip-sfc.c
	drivers/spi/spi-rockchip.c
	drivers/spi/spidev.c
	drivers/staging/android/ion/heaps/ion_system_heap.c
	drivers/thermal/rockchip_thermal.c
	drivers/tty/serial/8250/8250_dma.c
	drivers/tty/serial/8250/8250_dw.c
	drivers/tty/serial/8250/8250_dwlib.c
	drivers/tty/serial/8250/8250_port.c
	drivers/usb/dwc2/platform.c
	drivers/usb/dwc3/core.c
	drivers/usb/dwc3/core.h
	drivers/usb/dwc3/ep0.c
	drivers/usb/dwc3/gadget.c
	drivers/usb/gadget/configfs.c
	drivers/usb/gadget/function/f_fs.c
	drivers/usb/gadget/function/f_uvc.c
	drivers/usb/gadget/function/uvc.h
	drivers/usb/gadget/function/uvc_configfs.c
	drivers/usb/gadget/function/uvc_queue.c
	drivers/usb/gadget/function/uvc_v4l2.c
	drivers/usb/gadget/function/uvc_video.c
	drivers/usb/gadget/udc/core.c
	drivers/usb/host/ehci-hcd.c
	drivers/usb/host/ehci-platform.c
	drivers/usb/storage/unusual_uas.h
	drivers/usb/typec/altmodes/Kconfig
	drivers/usb/typec/altmodes/displayport.c
	drivers/usb/typec/class.c
	drivers/usb/typec/tcpm/tcpm.c
	fs/Kconfig
	fs/cifs/inode.c
	fs/dax.c
	fs/erofs/data.c
	fs/erofs/inode.c
	fs/erofs/internal.h
	fs/erofs/super.c
	fs/f2fs/super.c
	fs/fuse/dev.c
	include/drm/bridge/dw_hdmi.h
	include/drm/drm_connector.h
	include/drm/drm_edid.h
	include/dt-bindings/clock/rk3568-cru.h
	include/dt-bindings/power/rk3568-power.h
	include/dt-bindings/power/rk3588-power.h
	include/linux/clk-provider.h
	include/linux/cma.h
	include/linux/dma-buf.h
	include/linux/dma-heap.h
	include/linux/mfd/rk808.h
	include/linux/mtd/spi-nor.h
	include/linux/mtd/spinand.h
	include/linux/phy/pcie.h
	include/linux/pwm.h
	include/linux/sched/sysctl.h
	include/linux/slub_def.h
	include/linux/stmmac.h
	include/linux/usb/typec.h
	include/media/v4l2-async.h
	include/soc/rockchip/pm_domains.h
	include/uapi/drm/drm_fourcc.h
	include/uapi/linux/iio/types.h
	include/uapi/linux/media-bus-format.h
	init/Kconfig
	init/main.c
	kernel/printk/printk.c
	kernel/rcu/Kconfig.debug
	kernel/rcu/tree_stall.h
	kernel/sched/core.c
	kernel/sched/cpufreq_schedutil.c
	kernel/sched/fair.c
	kernel/sched/pelt.c
	kernel/sched/rt.c
	kernel/sched/sched.h
	kernel/softirq.c
	kernel/sysctl.c
	mm/Makefile
	mm/cma.c
	mm/page_alloc.c
	mm/slub.c
	scripts/.gitignore
	scripts/headers_install.sh
	sound/soc/codecs/Kconfig
	sound/soc/codecs/Makefile
	sound/soc/codecs/es8326.c
	sound/soc/codecs/es8326.h
	sound/soc/codecs/hdmi-codec.c
	sound/soc/codecs/rk817_codec.c
	sound/soc/rockchip/Kconfig
	sound/soc/rockchip/Makefile
	sound/soc/rockchip/rockchip_i2s.c
	sound/soc/rockchip/rockchip_i2s_tdm.c
	sound/soc/rockchip/rockchip_i2s_tdm.h
	sound/soc/rockchip/rockchip_pdm.c
	sound/soc/rockchip/rockchip_spdif.c
	sound/soc/soc-generic-dmaengine-pcm.c
	tools/iio/iio_event_monitor.c
2023-05-20 18:57:29 +08:00
Tao Huang
cc17504307 Merge tag 'android12-5.10-2023-02_r1' of https://android.googlesource.com/kernel/common
android12-5.10 February 2023 release 1

Artifacts:
  https://ci.android.com/builds/submitted/9611440/kernel_aarch64/latest

* tag 'android12-5.10-2023-02_r1': (5560 commits)
  ANDROID: GKI: Enable ARM64_ERRATUM_2454944
  ANDROID: dma-ops: Add restricted vendor hook
  ANDROID: arm64: Work around Cortex-A510 erratum 2454944
  ANDROID: mm/vmalloc: Add override for lazy vunmap
  ANDROID: cpuidle-psci: Fix suspicious RCU usage
  ANDROID: ABI: update allowed list for galaxy
  FROMGIT: f2fs: add sysfs nodes to set last_age_weight
  FROMGIT: f2fs: fix wrong calculation of block age
  ANDROID: struct io_uring ABI preservation hack for 5.10.162 changes
  ANDROID: fix up struct task_struct ABI change in 5.10.162
  ANDROID: add flags variable back to struct proto_ops
  UPSTREAM: io_uring: pass in EPOLL_URING_WAKE for eventfd signaling and wakeups
  UPSTREAM: eventfd: provide a eventfd_signal_mask() helper
  UPSTREAM: eventpoll: add EPOLL_URING_WAKE poll wakeup flag
  UPSTREAM: Revert "proc: don't allow async path resolution of /proc/self components"
  UPSTREAM: Revert "proc: don't allow async path resolution of /proc/thread-self components"
  UPSTREAM: net: remove cmsg restriction from io_uring based send/recvmsg calls
  UPSTREAM: task_work: unconditionally run task_work from get_signal()
  UPSTREAM: signal: kill JOBCTL_TASK_WORK
  UPSTREAM: io_uring: import 5.15-stable io_uring
  ...

Change-Id: I2b16474d6e3a91f1d702486ec6d1565a7bc310e3

Conflicts:
	Documentation/ABI/testing/configfs-usb-gadget-uac2
	Documentation/usb/gadget-testing.rst
	Makefile
	arch/arm/boot/dts/rk3288-evb-act8846.dts
	arch/arm64/mm/Makefile
	drivers/dma-buf/dma-buf.c
	drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
	drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
	drivers/gpu/drm/rockchip/analogix_dp-rockchip.c
	drivers/gpu/drm/rockchip/rockchip_drm_vop.c
	drivers/mmc/core/mmc.c
	drivers/pci/controller/dwc/pcie-designware-host.c
	drivers/pinctrl/pinctrl-rockchip.c
	drivers/regulator/core.c
	drivers/usb/dwc3/ep0.c
	drivers/usb/dwc3/gadget.c
	drivers/usb/gadget/function/f_hid.c
	drivers/usb/gadget/function/f_uac1.c
	drivers/usb/gadget/function/f_uac2.c
	drivers/usb/gadget/function/u_audio.c
	drivers/usb/gadget/function/u_audio.h
	drivers/usb/gadget/function/u_uac2.h
	drivers/usb/host/xhci.h
	drivers/usb/storage/unusual_uas.h
	drivers/usb/typec/altmodes/displayport.c
	include/linux/page_ext.h
	mm/cma.c
	mm/page_ext.c
	sound/core/pcm_dmaengine.c
	sound/soc/codecs/hdmi-codec.c
	include/linux/stmmac.h
	sound/drivers/aloop.c
	drivers/pci/controller/dwc/pcie-designware.h
2023-03-14 09:44:51 +08:00
Jason A. Donenfeld
81895a65ec treewide: use prandom_u32_max() when possible, part 1
Rather than incurring a division or requesting too many random bytes for
the given range, use the prandom_u32_max() function, which only takes
the minimum required bytes from the RNG and avoids divisions. This was
done mechanically with this coccinelle script:

@basic@
expression E;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
typedef u64;
@@
(
- ((T)get_random_u32() % (E))
+ prandom_u32_max(E)
|
- ((T)get_random_u32() & ((E) - 1))
+ prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2)
|
- ((u64)(E) * get_random_u32() >> 32)
+ prandom_u32_max(E)
|
- ((T)get_random_u32() & ~PAGE_MASK)
+ prandom_u32_max(PAGE_SIZE)
)

@multi_line@
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
identifier RAND;
expression E;
@@

-       RAND = get_random_u32();
        ... when != RAND
-       RAND %= (E);
+       RAND = prandom_u32_max(E);

// Find a potential literal
@literal_mask@
expression LITERAL;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
position p;
@@

        ((T)get_random_u32()@p & (LITERAL))

// Add one to the literal.
@script:python add_one@
literal << literal_mask.LITERAL;
RESULT;
@@

value = None
if literal.startswith('0x'):
        value = int(literal, 16)
elif literal[0] in '123456789':
        value = int(literal, 10)
if value is None:
        print("I don't know how to handle %s" % (literal))
        cocci.include_match(False)
elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1:
        print("Skipping 0x%x for cleanup elsewhere" % (value))
        cocci.include_match(False)
elif value & (value + 1) != 0:
        print("Skipping 0x%x because it's not a power of two minus one" % (value))
        cocci.include_match(False)
elif literal.startswith('0x'):
        coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1))
else:
        coccinelle.RESULT = cocci.make_expr("%d" % (value + 1))

// Replace the literal mask with the calculated result.
@plus_one@
expression literal_mask.LITERAL;
position literal_mask.p;
expression add_one.RESULT;
identifier FUNC;
@@

-       (FUNC()@p & (LITERAL))
+       prandom_u32_max(RESULT)

@collapse_ret@
type T;
identifier VAR;
expression E;
@@

 {
-       T VAR;
-       VAR = (E);
-       return VAR;
+       return E;
 }

@drop_var@
type T;
identifier VAR;
@@

 {
-       T VAR;
        ... when != VAR
 }

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: KP Singh <kpsingh@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap
Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc
Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-11 17:42:55 -06:00
Linus Torvalds
27bc50fc90 - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
linux-next for a couple of months without, to my knowledge, any negative
   reports (or any positive ones, come to that).
 
 - Also the Maple Tree from Liam R.  Howlett.  An overlapping range-based
   tree for vmas.  It it apparently slight more efficient in its own right,
   but is mainly targeted at enabling work to reduce mmap_lock contention.
 
   Liam has identified a number of other tree users in the kernel which
   could be beneficially onverted to mapletrees.
 
   Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
   (https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com).
   This has yet to be addressed due to Liam's unfortunately timed
   vacation.  He is now back and we'll get this fixed up.
 
 - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer.  It uses
   clang-generated instrumentation to detect used-unintialized bugs down to
   the single bit level.
 
   KMSAN keeps finding bugs.  New ones, as well as the legacy ones.
 
 - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
   memory into THPs.
 
 - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support
   file/shmem-backed pages.
 
 - userfaultfd updates from Axel Rasmussen
 
 - zsmalloc cleanups from Alexey Romanov
 
 - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure
 
 - Huang Ying adds enhancements to NUMA balancing memory tiering mode's
   page promotion, with a new way of detecting hot pages.
 
 - memcg updates from Shakeel Butt: charging optimizations and reduced
   memory consumption.
 
 - memcg cleanups from Kairui Song.
 
 - memcg fixes and cleanups from Johannes Weiner.
 
 - Vishal Moola provides more folio conversions
 
 - Zhang Yi removed ll_rw_block() :(
 
 - migration enhancements from Peter Xu
 
 - migration error-path bugfixes from Huang Ying
 
 - Aneesh Kumar added ability for a device driver to alter the memory
   tiering promotion paths.  For optimizations by PMEM drivers, DRM
   drivers, etc.
 
 - vma merging improvements from Jakub Matěn.
 
 - NUMA hinting cleanups from David Hildenbrand.
 
 - xu xin added aditional userspace visibility into KSM merging activity.
 
 - THP & KSM code consolidation from Qi Zheng.
 
 - more folio work from Matthew Wilcox.
 
 - KASAN updates from Andrey Konovalov.
 
 - DAMON cleanups from Kaixu Xia.
 
 - DAMON work from SeongJae Park: fixes, cleanups.
 
 - hugetlb sysfs cleanups from Muchun Song.
 
 - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY0HaPgAKCRDdBJ7gKXxA
 joPjAQDZ5LlRCMWZ1oxLP2NOTp6nm63q9PWcGnmY50FjD/dNlwEAnx7OejCLWGWf
 bbTuk6U2+TKgJa4X7+pbbejeoqnt5QU=
 =xfWx
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
   linux-next for a couple of months without, to my knowledge, any
   negative reports (or any positive ones, come to that).

 - Also the Maple Tree from Liam Howlett. An overlapping range-based
   tree for vmas. It it apparently slightly more efficient in its own
   right, but is mainly targeted at enabling work to reduce mmap_lock
   contention.

   Liam has identified a number of other tree users in the kernel which
   could be beneficially onverted to mapletrees.

   Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
   at [1]. This has yet to be addressed due to Liam's unfortunately
   timed vacation. He is now back and we'll get this fixed up.

 - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses
   clang-generated instrumentation to detect used-unintialized bugs down
   to the single bit level.

   KMSAN keeps finding bugs. New ones, as well as the legacy ones.

 - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
   memory into THPs.

 - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to
   support file/shmem-backed pages.

 - userfaultfd updates from Axel Rasmussen

 - zsmalloc cleanups from Alexey Romanov

 - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and
   memory-failure

 - Huang Ying adds enhancements to NUMA balancing memory tiering mode's
   page promotion, with a new way of detecting hot pages.

 - memcg updates from Shakeel Butt: charging optimizations and reduced
   memory consumption.

 - memcg cleanups from Kairui Song.

 - memcg fixes and cleanups from Johannes Weiner.

 - Vishal Moola provides more folio conversions

 - Zhang Yi removed ll_rw_block() :(

 - migration enhancements from Peter Xu

 - migration error-path bugfixes from Huang Ying

 - Aneesh Kumar added ability for a device driver to alter the memory
   tiering promotion paths. For optimizations by PMEM drivers, DRM
   drivers, etc.

 - vma merging improvements from Jakub Matěn.

 - NUMA hinting cleanups from David Hildenbrand.

 - xu xin added aditional userspace visibility into KSM merging
   activity.

 - THP & KSM code consolidation from Qi Zheng.

 - more folio work from Matthew Wilcox.

 - KASAN updates from Andrey Konovalov.

 - DAMON cleanups from Kaixu Xia.

 - DAMON work from SeongJae Park: fixes, cleanups.

 - hugetlb sysfs cleanups from Muchun Song.

 - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.

Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1]

* tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits)
  hugetlb: allocate vma lock for all sharable vmas
  hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer
  hugetlb: fix vma lock handling during split vma and range unmapping
  mglru: mm/vmscan.c: fix imprecise comments
  mm/mglru: don't sync disk for each aging cycle
  mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol
  mm: memcontrol: use do_memsw_account() in a few more places
  mm: memcontrol: deprecate swapaccounting=0 mode
  mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled
  mm/secretmem: remove reduntant return value
  mm/hugetlb: add available_huge_pages() func
  mm: remove unused inline functions from include/linux/mm_inline.h
  selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory
  selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd
  selftests/vm: add thp collapse shmem testing
  selftests/vm: add thp collapse file and tmpfs testing
  selftests/vm: modularize thp collapse memory operations
  selftests/vm: dedup THP helpers
  mm/khugepaged: add tracepoint to hpage_collapse_scan_file()
  mm/madvise: add file and shmem support to MADV_COLLAPSE
  ...
2022-10-10 17:53:04 -07:00
Alexander Potapenko
68ef169a1d mm: kmsan: call KMSAN hooks from SLUB code
In order to report uninitialized memory coming from heap allocations KMSAN
has to poison them unless they're created with __GFP_ZERO.

It's handy that we need KMSAN hooks in the places where
init_on_alloc/init_on_free initialization is performed.

In addition, we apply __no_kmsan_checks to get_freepointer_safe() to
suppress reports when accessing freelist pointers that reside in freed
objects.

Link: https://lkml.kernel.org/r/20220915150417.722975-16-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03 14:03:20 -07:00
Vlastimil Babka
00a7829ba8 Merge branch 'slab/for-6.1/slub_validation_locking' into slab/for-next
A fix for a regression in slub_debug caches that could cause slab page
leaks and subsequent warnings on cache shutdown, by Feng Tang.
2022-09-30 16:46:18 +02:00
Feng Tang
b731e3575f mm/slub: fix a slab missed to be freed problem
When enable kasan and kfence's in-kernel kunit test with slub_debug on,
it caught a problem (in linux-next tree):

 ------------[ cut here ]------------
 kmem_cache_destroy test: Slab cache still has objects when called from test_exit+0x1a/0x30
 WARNING: CPU: 3 PID: 240 at mm/slab_common.c:492 kmem_cache_destroy+0x16c/0x170
 Modules linked in:
 CPU: 3 PID: 240 Comm: kunit_try_catch Tainted: G    B            N 6.0.0-rc7-next-20220929 #52
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
 RIP: 0010:kmem_cache_destroy+0x16c/0x170
 Code: 41 5c 41 5d e9 a5 04 0b 00 c3 cc cc cc cc 48 8b 55 60 48 8b 4c 24 20 48 c7 c6 40 37 d2 82 48 c7 c7 e8 a0 33 83 e8 4e d7 14 01 <0f> 0b eb a7 41 56 41 89 d6 41 55 49 89 f5 41 54 49 89 fc 55 48 89
 RSP: 0000:ffff88800775fea0 EFLAGS: 00010282
 RAX: 0000000000000000 RBX: ffffffff83bdec48 RCX: 0000000000000000
 RDX: 0000000000000001 RSI: 1ffff11000eebf9e RDI: ffffed1000eebfc6
 RBP: ffff88804362fa00 R08: ffffffff81182e58 R09: ffff88800775fbdf
 R10: ffffed1000eebf7b R11: 0000000000000001 R12: 000000008c800d00
 R13: ffff888005e78040 R14: 0000000000000000 R15: ffff888005cdfad0
 FS:  0000000000000000(0000) GS:ffff88807ed00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000000360e001 CR4: 0000000000370ee0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  <TASK>
  test_exit+0x1a/0x30
  kunit_try_run_case+0xad/0xc0
  kunit_generic_run_threadfn_adapter+0x26/0x50
  kthread+0x17b/0x1b0

It was biscted to commit c7323a5ad0 ("mm/slub: restrict sysfs
validation to debug caches and make it safe")

The problem is inside free_debug_processing(), under certain
circumstances the slab can be removed from the partial list but not
freed by discard_slab() and thus n->nr_slabs is not decreased
accordingly. During shutdown, this non-zero n->nr_slabs is detected and
reported.

Specifically, the problem is that there are two checks for detecting a
full partial list by comparing n->nr_partial >= s->min_partial where the
latter check is affected by remove_partial() decreasing n->nr_partial
between the checks. Reoganize the code so there is a single check
upfront.

Link: https://lore.kernel.org/all/20220930100730.250248-1-feng.tang@intel.com/
Fixes: c7323a5ad0 ("mm/slub: restrict sysfs validation to debug caches and make it safe")
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-09-30 16:19:33 +02:00
Greg Kroah-Hartman
1d17080edb This is the 5.10.146 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmM0D5YACgkQONu9yGCS
 aT60zQ//azKm1LwkEJrXhq9W8RH0qFooR5ktMtD77mX7jznl6QrebRycyD0lj67H
 QqkSWLKWocMiGNjCBHA4LS/OXVoMvjfWvdha1ExHO/1fqkM6MVqfy8+z8Tngzky/
 iTfaOjA6BSiQNnAyC+LPtJb5dCnvFYHL78+vZ3Kr6xHhX/MBCoTL+pP5bBp82ES+
 4N5mirDlLgLxI2d2KCfpwVkaRC+Ylsz5/PLkvzYpXz7RnXLL7PAu/tbHvJpM9qqj
 lONQU3av0utXPLzV8FdeejspFdTacG+V9d1AAfXivYQTBI5dyaUEPoR6qkZ4WgsN
 zZ6huMi/7Q0uL9QxGvvSqpEMPeq7hikanqFAZsfgNtXLZQM2Th8GyaqhVKtBN31n
 75z4dMrV5Whb0K6fo4yOZAzPL/safwHtqtEIsZsgpjCnUKgl0YWyRlmrjQyOdTcI
 2DY/wTwf+f+D/U0CNfYd0xrmlDMsRgUQ3pjtT98kLHk0K8VPRySlSvkk9YW0qsLf
 4Hc8DCIiVa5lB5Rl8nGTUq0iIl9t17lpfy1Iboibhxay1IUMLBYdRNQ/bnOD2Y0W
 ZYimIghn6x0KuvqiQkktzMqtRdlzIhvnu3ytOWBL7hNnVlGaa4kEY8zr0Ia5zwMP
 XKA18+ip/qV9qENnrjck/sh69itVR2q2qWa/BlV3cYnQsyTu62Y=
 =dY1i
 -----END PGP SIGNATURE-----

Merge 5.10.146 into android12-5.10-lts

Changes in 5.10.146
	drm/amdgpu: move nbio sdma_doorbell_range() into sdma code for vega
	drm/amdgpu: indirect register access for nv12 sriov
	drm/amdgpu: Separate vf2pf work item init from virt data exchange
	drm/amdgpu: make sure to init common IP before gmc
	usb: typec: intel_pmc_mux: Update IOM port status offset for AlderLake
	usb: typec: intel_pmc_mux: Add new ACPI ID for Meteor Lake IOM device
	usb: dwc3: gadget: Avoid starting DWC3 gadget during UDC unbind
	usb: dwc3: Issue core soft reset before enabling run/stop
	usb: dwc3: gadget: Prevent repeat pullup()
	usb: dwc3: gadget: Refactor pullup()
	usb: dwc3: gadget: Don't modify GEVNTCOUNT in pullup()
	usb: dwc3: gadget: Avoid duplicate requests to enable Run/Stop
	usb: xhci-mtk: get the microframe boundary for ESIT
	usb: xhci-mtk: add only one extra CS for FS/LS INTR
	usb: xhci-mtk: use @sch_tt to check whether need do TT schedule
	usb: xhci-mtk: add a function to (un)load bandwidth info
	usb: xhci-mtk: add some schedule error number
	usb: xhci-mtk: allow multiple Start-Split in a microframe
	usb: xhci-mtk: relax TT periodic bandwidth allocation
	mmc: core: Fix inconsistent sd3_bus_mode at UHS-I SD voltage switch failure
	serial: atmel: remove redundant assignment in rs485_config
	tty: serial: atmel: Preserve previous USART mode if RS485 disabled
	usb: add quirks for Lenovo OneLink+ Dock
	usb: gadget: udc-xilinx: replace memcpy with memcpy_toio
	usb: cdns3: fix incorrect handling TRB_SMM flag for ISOC transfer
	usb: cdns3: fix issue with rearming ISO OUT endpoint
	Revert "usb: add quirks for Lenovo OneLink+ Dock"
	vfio/type1: Change success value of vaddr_get_pfn()
	vfio/type1: Prepare for batched pinning with struct vfio_batch
	vfio/type1: Unpin zero pages
	Revert "usb: gadget: udc-xilinx: replace memcpy with memcpy_toio"
	arm64: Restrict ARM64_BTI_KERNEL to clang 12.0.0 and newer
	arm64/bti: Disable in kernel BTI when cross section thunks are broken
	USB: core: Fix RST error in hub.c
	USB: serial: option: add Quectel BG95 0x0203 composition
	USB: serial: option: add Quectel RM520N
	ALSA: hda/tegra: set depop delay for tegra
	ALSA: hda: add Intel 5 Series / 3400 PCI DID
	ALSA: hda/realtek: Add quirk for Huawei WRT-WX9
	ALSA: hda/realtek: Enable 4-speaker output Dell Precision 5570 laptop
	ALSA: hda/realtek: Re-arrange quirk table entries
	ALSA: hda/realtek: Add pincfg for ASUS G513 HP jack
	ALSA: hda/realtek: Add pincfg for ASUS G533Z HP jack
	ALSA: hda/realtek: Add quirk for ASUS GA503R laptop
	ALSA: hda/realtek: Enable 4-speaker output Dell Precision 5530 laptop
	iommu/vt-d: Check correct capability for sagaw determination
	media: flexcop-usb: fix endpoint type check
	efi: x86: Wipe setup_data on pure EFI boot
	efi: libstub: check Shim mode using MokSBStateRT
	wifi: mt76: fix reading current per-tid starting sequence number for aggregation
	gpio: mockup: fix NULL pointer dereference when removing debugfs
	gpiolib: cdev: Set lineevent_state::irq after IRQ register successfully
	riscv: fix a nasty sigreturn bug...
	can: flexcan: flexcan_mailbox_read() fix return value for drop = true
	mm/slub: fix to return errno if kmalloc() fails
	KVM: SEV: add cache flush to solve SEV cache incoherency issues
	interconnect: qcom: icc-rpmh: Add BCMs to commit list in pre_aggregate
	xfs: fix up non-directory creation in SGID directories
	xfs: reorder iunlink remove operation in xfs_ifree
	xfs: validate inode fork size against fork format
	arm64: dts: rockchip: Pull up wlan wake# on Gru-Bob
	drm/mediatek: dsi: Add atomic {destroy,duplicate}_state, reset callbacks
	arm64: dts: rockchip: Set RK3399-Gru PCLK_EDP to 24 MHz
	dmaengine: ti: k3-udma-private: Fix refcount leak bug in of_xudma_dev_get()
	arm64: dts: rockchip: Remove 'enable-active-low' from rk3399-puma
	netfilter: nf_conntrack_sip: fix ct_sip_walk_headers
	netfilter: nf_conntrack_irc: Tighten matching on DCC message
	netfilter: nfnetlink_osf: fix possible bogus match in nf_osf_find()
	iavf: Fix cached head and tail value for iavf_get_tx_pending
	ipvlan: Fix out-of-bound bugs caused by unset skb->mac_header
	net: let flow have same hash in two directions
	net: core: fix flow symmetric hash
	net: phy: aquantia: wait for the suspend/resume operations to finish
	scsi: mpt3sas: Force PCIe scatterlist allocations to be within same 4 GB region
	scsi: mpt3sas: Fix return value check of dma_get_required_mask()
	net: bonding: Share lacpdu_mcast_addr definition
	net: bonding: Unsync device addresses on ndo_stop
	net: team: Unsync device addresses on ndo_stop
	drm/panel: simple: Fix innolux_g121i1_l01 bus_format
	MIPS: lantiq: export clk_get_io() for lantiq_wdt.ko
	MIPS: Loongson32: Fix PHY-mode being left unspecified
	iavf: Fix bad page state
	iavf: Fix set max MTU size with port VLAN and jumbo frames
	i40e: Fix VF set max MTU size
	i40e: Fix set max_tx_rate when it is lower than 1 Mbps
	sfc: fix TX channel offset when using legacy interrupts
	sfc: fix null pointer dereference in efx_hard_start_xmit
	drm/hisilicon/hibmc: Allow to be built if COMPILE_TEST is enabled
	drm/hisilicon: Add depends on MMU
	of: mdio: Add of_node_put() when breaking out of for_each_xx
	net: ipa: fix assumptions about DMA address size
	net: ipa: fix table alignment requirement
	net: ipa: avoid 64-bit modulus
	net: ipa: DMA addresses are nicely aligned
	net: ipa: kill IPA_TABLE_ENTRY_SIZE
	net: ipa: properly limit modem routing table use
	wireguard: ratelimiter: disable timings test by default
	wireguard: netlink: avoid variable-sized memcpy on sockaddr
	net: enetc: move enetc_set_psfp() out of the common enetc_set_features()
	net: socket: remove register_gifconf
	net/sched: taprio: avoid disabling offload when it was never enabled
	net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs
	netfilter: nf_tables: fix nft_counters_enabled underflow at nf_tables_addchain()
	netfilter: nf_tables: fix percpu memory leak at nf_tables_addchain()
	netfilter: ebtables: fix memory leak when blob is malformed
	can: gs_usb: gs_can_open(): fix race dev->can.state condition
	perf jit: Include program header in ELF files
	perf kcore_copy: Do not check /proc/modules is unchanged
	drm/mediatek: dsi: Move mtk_dsi_stop() call back to mtk_dsi_poweroff()
	net/smc: Stop the CLC flow if no link to map buffers on
	net: sunhme: Fix packet reception for len < RX_COPY_THRESHOLD
	net: sched: fix possible refcount leak in tc_new_tfilter()
	selftests: forwarding: add shebang for sch_red.sh
	drm/amd/amdgpu: fixing read wrong pf2vf data in SRIOV
	serial: Create uart_xmit_advance()
	serial: tegra: Use uart_xmit_advance(), fixes icount.tx accounting
	serial: tegra-tcu: Use uart_xmit_advance(), fixes icount.tx accounting
	s390/dasd: fix Oops in dasd_alias_get_start_dev due to missing pavgroup
	usb: xhci-mtk: fix issue of out-of-bounds array access
	vfio/type1: fix vaddr_get_pfns() return in vfio_pin_page_external()
	drm/amdgpu: Fix check for RAS support
	cifs: use discard iterator to discard unneeded network data more efficiently
	cifs: always initialize struct msghdr smb_msg completely
	Drivers: hv: Never allocate anything besides framebuffer from framebuffer memory region
	drm/gma500: Fix BUG: sleeping function called from invalid context errors
	drm/amdgpu: use dirty framebuffer helper
	drm/amd/display: Limit user regamma to a valid value
	drm/amd/display: Mark dml30's UseMinimumDCFCLK() as noinline for stack usage
	drm/rockchip: Fix return type of cdn_dp_connector_mode_valid
	workqueue: don't skip lockdep work dependency in cancel_work_sync()
	i2c: imx: If pm_runtime_get_sync() returned 1 device access is possible
	i2c: mlxbf: incorrect base address passed during io write
	i2c: mlxbf: prevent stack overflow in mlxbf_i2c_smbus_start_transaction()
	i2c: mlxbf: Fix frequency calculation
	devdax: Fix soft-reservation memory description
	ext4: fix bug in extents parsing when eh_entries == 0 and eh_depth > 0
	ext4: limit the number of retries after discarding preallocations blocks
	ext4: make directory inode spreading reflect flexbg size
	Linux 5.10.146

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I45edad7e4191aad7a85278b43fa9909a6253643f
2022-09-29 17:06:30 +02:00
Vlastimil Babka
af961f8059 Merge branch 'slab/for-6.1/slub_debug_waste' into slab/for-next
A patch from Feng Tang that enhances the existing debugfs alloc_traces
file for kmalloc caches with information about how much space is wasted
by allocations that needs less space than the particular kmalloc cache
provides.
2022-09-29 11:28:26 +02:00
Vlastimil Babka
0bdcef54a2 Merge branch 'slab/for-6.1/trivial' into slab/for-next
Additional cleanup by Chao Yu removing a BUG_ON() in create_unique_id().
2022-09-29 11:27:58 +02:00
Chao Yu
379ac7905f mm/slub: fix to return errno if kmalloc() fails
commit 7e9c323c52 upstream.

In create_unique_id(), kmalloc(, GFP_KERNEL) can fail due to
out-of-memory, if it fails, return errno correctly rather than
triggering panic via BUG_ON();

kernel BUG at mm/slub.c:5893!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP

Call trace:
 sysfs_slab_add+0x258/0x260 mm/slub.c:5973
 __kmem_cache_create+0x60/0x118 mm/slub.c:4899
 create_cache mm/slab_common.c:229 [inline]
 kmem_cache_create_usercopy+0x19c/0x31c mm/slab_common.c:335
 kmem_cache_create+0x1c/0x28 mm/slab_common.c:390
 f2fs_kmem_cache_create fs/f2fs/f2fs.h:2766 [inline]
 f2fs_init_xattr_caches+0x78/0xb4 fs/f2fs/xattr.c:808
 f2fs_fill_super+0x1050/0x1e0c fs/f2fs/super.c:4149
 mount_bdev+0x1b8/0x210 fs/super.c:1400
 f2fs_mount+0x44/0x58 fs/f2fs/super.c:4512
 legacy_get_tree+0x30/0x74 fs/fs_context.c:610
 vfs_get_tree+0x40/0x140 fs/super.c:1530
 do_new_mount+0x1dc/0x4e4 fs/namespace.c:3040
 path_mount+0x358/0x914 fs/namespace.c:3370
 do_mount fs/namespace.c:3383 [inline]
 __do_sys_mount fs/namespace.c:3591 [inline]
 __se_sys_mount fs/namespace.c:3568 [inline]
 __arm64_sys_mount+0x2f8/0x408 fs/namespace.c:3568

Cc: <stable@kernel.org>
Fixes: 81819f0fc8 ("SLUB core")
Reported-by: syzbot+81684812ea68216e08c5@syzkaller.appspotmail.com
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Chao Yu <chao.yu@oppo.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-28 11:10:28 +02:00
Chao Yu
d65360f224 mm/slub: clean up create_unique_id()
As Christophe JAILLET suggested [1]

In create_unique_id(),

"looks that ID_STR_LENGTH could even be reduced to 32 or 16.

The 2nd BUG_ON at the end of the function could certainly be just
removed as well or remplaced by a:
        if (p > name + ID_STR_LENGTH - 1) {
                kfree(name);
                return -E<something>;
        }
"

According to above suggestion, let's do below cleanups:
1. reduce ID_STR_LENGTH to 32, as the buffer size should be enough;
2. use WARN_ON instead of BUG_ON() and return error if check condition
is true;
3. use snprintf instead of sprintf to avoid overflow.

[1] https://lore.kernel.org/linux-mm/2025305d-16db-abdf-6cd3-1fb93371c2b4@wanadoo.fr/

Suggested-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Chao Yu <chao.yu@oppo.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-09-26 16:25:40 +02:00
Feng Tang
6edf2576a6 mm/slub: enable debugging memory wasting of kmalloc
kmalloc's API family is critical for mm, with one nature that it will
round up the request size to a fixed one (mostly power of 2). Say
when user requests memory for '2^n + 1' bytes, actually 2^(n+1) bytes
could be allocated, so in worst case, there is around 50% memory
space waste.

The wastage is not a big issue for requests that get allocated/freed
quickly, but may cause problems with objects that have longer life
time.

We've met a kernel boot OOM panic (v5.10), and from the dumped slab
info:

    [   26.062145] kmalloc-2k            814056KB     814056KB

From debug we found there are huge number of 'struct iova_magazine',
whose size is 1032 bytes (1024 + 8), so each allocation will waste
1016 bytes. Though the issue was solved by giving the right (bigger)
size of RAM, it is still nice to optimize the size (either use a
kmalloc friendly size or create a dedicated slab for it).

And from lkml archive, there was another crash kernel OOM case [1]
back in 2019, which seems to be related with the similar slab waste
situation, as the log is similar:

    [    4.332648] iommu: Adding device 0000:20:02.0 to group 16
    [    4.338946] swapper/0 invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0, oom_score_adj=0
    ...
    [    4.857565] kmalloc-2048           59164KB      59164KB

The crash kernel only has 256M memory, and 59M is pretty big here.
(Note: the related code has been changed and optimised in recent
kernel [2], these logs are just picked to demo the problem, also
a patch changing its size to 1024 bytes has been merged)

So add an way to track each kmalloc's memory waste info, and
leverage the existing SLUB debug framework (specifically
SLUB_STORE_USER) to show its call stack of original allocation,
so that user can evaluate the waste situation, identify some hot
spots and optimize accordingly, for a better utilization of memory.

The waste info is integrated into existing interface:
'/sys/kernel/debug/slab/kmalloc-xx/alloc_traces', one example of
'kmalloc-4k' after boot is:

 126 ixgbe_alloc_q_vector+0xbe/0x830 [ixgbe] waste=233856/1856 age=280763/281414/282065 pid=1330 cpus=32 nodes=1
     __kmem_cache_alloc_node+0x11f/0x4e0
     __kmalloc_node+0x4e/0x140
     ixgbe_alloc_q_vector+0xbe/0x830 [ixgbe]
     ixgbe_init_interrupt_scheme+0x2ae/0xc90 [ixgbe]
     ixgbe_probe+0x165f/0x1d20 [ixgbe]
     local_pci_probe+0x78/0xc0
     work_for_cpu_fn+0x26/0x40
     ...

which means in 'kmalloc-4k' slab, there are 126 requests of
2240 bytes which got a 4KB space (wasting 1856 bytes each
and 233856 bytes in total), from ixgbe_alloc_q_vector().

And when system starts some real workload like multiple docker
instances, there could are more severe waste.

[1]. https://lkml.org/lkml/2019/8/12/266
[2]. https://lore.kernel.org/lkml/2920df89-9975-5785-f79b-257d3052dfaf@huawei.com/

[Thanks Hyeonggon for pointing out several bugs about sorting/format]
[Thanks Vlastimil for suggesting way to reduce memory usage of
 orig_size and keep it only for kmalloc objects]

Signed-off-by: Feng Tang <feng.tang@intel.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-09-23 12:32:45 +02:00
Vlastimil Babka
5959725a4a Merge branch 'slab/for-6.1/slub_validation_locking' into slab/for-next
My series [1] to fix validation races for caches with enabled debugging.

By decoupling the debug cache operation more from non-debug fastpaths,
additional locking simplifications were possible and done afterwards.

Additional cleanup of PREEMPT_RT specific code on top, by Thomas Gleixner.

[1] https://lore.kernel.org/all/20220823170400.26546-1-vbabka@suse.cz/
2022-09-23 10:33:45 +02:00
Vlastimil Babka
3662c13ec6 Merge branch 'slab/for-6.1/common_kmalloc' into slab/for-next
The "common kmalloc v4" series [1] by Hyeonggon Yoo.

- Improves the mm/slab_common.c wrappers to allow deleting duplicated
  code between SLAB and SLUB.
- Large kmalloc() allocations in SLAB are passed to page allocator like
  in SLUB, reducing number of kmalloc caches.
- Removes the {kmem_cache_alloc,kmalloc}_node variants of tracepoints,
  node id parameter added to non-_node variants.
- 8 files changed, 341 insertions(+), 651 deletions(-)

[1] https://lore.kernel.org/all/20220817101826.236819-1-42.hyeyoo@gmail.com/

--
Merge resolves trivial conflict in mm/slub.c with commit 5373b8a09d
("kasan: call kasan_malloc() from __kmalloc_*track_caller()")
2022-09-23 10:32:02 +02:00
Vlastimil Babka
0467ca385f Merge branch 'slab/for-6.1/trivial' into slab/for-next
Trivial fixes and cleanups:
- unneeded variable removals, by ye xingchen
2022-09-23 10:29:53 +02:00
Maurizio Lombardi
e45cc28872 mm: slub: fix flush_cpu_slab()/__free_slab() invocations in task context.
Commit 5a836bf6b0 ("mm: slub: move flush_cpu_slab() invocations
__free_slab() invocations out of IRQ context") moved all flush_cpu_slab()
invocations to the global workqueue to avoid a problem related
with deactivate_slab()/__free_slab() being called from an IRQ context
on PREEMPT_RT kernels.

When the flush_all_cpu_locked() function is called from a task context
it may happen that a workqueue with WQ_MEM_RECLAIM bit set ends up
flushing the global workqueue, this will cause a dependency issue.

 workqueue: WQ_MEM_RECLAIM nvme-delete-wq:nvme_delete_ctrl_work [nvme_core]
   is flushing !WQ_MEM_RECLAIM events:flush_cpu_slab
 WARNING: CPU: 37 PID: 410 at kernel/workqueue.c:2637
   check_flush_dependency+0x10a/0x120
 Workqueue: nvme-delete-wq nvme_delete_ctrl_work [nvme_core]
 RIP: 0010:check_flush_dependency+0x10a/0x120[  453.262125] Call Trace:
 __flush_work.isra.0+0xbf/0x220
 ? __queue_work+0x1dc/0x420
 flush_all_cpus_locked+0xfb/0x120
 __kmem_cache_shutdown+0x2b/0x320
 kmem_cache_destroy+0x49/0x100
 bioset_exit+0x143/0x190
 blk_release_queue+0xb9/0x100
 kobject_cleanup+0x37/0x130
 nvme_fc_ctrl_free+0xc6/0x150 [nvme_fc]
 nvme_free_ctrl+0x1ac/0x2b0 [nvme_core]

Fix this bug by creating a workqueue for the flush operation with
the WQ_MEM_RECLAIM bit set.

Fixes: 5a836bf6b0 ("mm: slub: move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context")
Cc: <stable@vger.kernel.org>
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-09-22 21:48:48 +02:00
Thomas Gleixner
1f04b07d97 slub: Make PREEMPT_RT support less convoluted
The slub code already has a few helpers depending on PREEMPT_RT. Add a few
more and get rid of the CONFIG_PREEMPT_RT conditionals all over the place.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-mm@kvack.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-09-17 00:18:36 +02:00
Vlastimil Babka
5875e59828 mm/slub: simplify __cmpxchg_double_slab() and slab_[un]lock()
The PREEMPT_RT specific disabling of irqs in __cmpxchg_double_slab()
(through slab_[un]lock()) is unnecessary as bit_spin_lock() disables
preemption and that's sufficient on PREEMPT_RT where no allocation/free
operation is performed in hardirq context and so can't interrupt the
current operation.

That means we no longer need the slab_[un]lock() wrappers, so delete
them and rename the current __slab_[un]lock() to slab_[un]lock().

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2022-09-17 00:18:35 +02:00
Vlastimil Babka
4ef3f5a320 mm/slub: convert object_map_lock to non-raw spinlock
The only remaining user of object_map_lock is list_slab_objects().
Obtaining the lock there used to happen under slab_lock() which implied
disabling irqs on PREEMPT_RT, thus it's a raw_spinlock. With the
slab_lock() removed, we can convert it to a normal spinlock.

Also remove the get_map()/put_map() wrappers as list_slab_objects()
became their only remaining user.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2022-09-17 00:18:34 +02:00
Vlastimil Babka
41bec7c33f mm/slub: remove slab_lock() usage for debug operations
All alloc and free operations on debug caches are now serialized by
n->list_lock, so we can remove slab_lock() usage in validate_slab()
and list_slab_objects() as those also happen under n->list_lock.

Note the usage in list_slab_objects() could happen even on non-debug
caches, but only during cache shutdown time, so there should not be any
parallel freeing activity anymore. Except for buggy slab users, but in
that case the slab_lock() would not help against the common cmpxchg
based fast paths (in non-debug caches) anyway.

Also adjust documentation comments accordingly.

Suggested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
2022-09-17 00:18:29 +02:00
Vlastimil Babka
c7323a5ad0 mm/slub: restrict sysfs validation to debug caches and make it safe
Rongwei Wang reports [1] that cache validation triggered by writing to
/sys/kernel/slab/<cache>/validate is racy against normal cache
operations (e.g. freeing) in a way that can cause false positive
inconsistency reports for caches with debugging enabled. The problem is
that debugging actions that mark object free or active and actual
freelist operations are not atomic, and the validation can see an
inconsistent state.

For caches that do or don't have debugging enabled, additional races
involving n->nr_slabs are possible that result in false reports of wrong
slab counts.

This patch attempts to solve these issues while not adding overhead to
normal (especially fastpath) operations for caches that do not have
debugging enabled. Such overhead would not be justified to make possible
userspace-triggered validation safe. Instead, disable the validation for
caches that don't have debugging enabled and make their sysfs validate
handler return -EINVAL.

For caches that do have debugging enabled, we can instead extend the
existing approach of not using percpu freelists to force all alloc/free
operations to the slow paths where debugging flags is checked and acted
upon. There can adjust the debug-specific paths to increase n->list_lock
coverage against concurrent validation as necessary.

The processing on free in free_debug_processing() already happens under
n->list_lock so we can extend it to actually do the freeing as well and
thus make it atomic against concurrent validation. As observed by
Hyeonggon Yoo, we do not really need to take slab_lock() anymore here
because all paths we could race with are protected by n->list_lock under
the new scheme, so drop its usage here.

The processing on alloc in alloc_debug_processing() currently doesn't
take any locks, but we have to first allocate the object from a slab on
the partial list (as debugging caches have no percpu slabs) and thus
take the n->list_lock anyway. Add a function alloc_single_from_partial()
that grabs just the allocated object instead of the whole freelist, and
does the debug processing. The n->list_lock coverage again makes it
atomic against validation and it is also ultimately more efficient than
the current grabbing of freelist immediately followed by slab
deactivation.

To prevent races on n->nr_slabs updates, make sure that for caches with
debugging enabled, inc_slabs_node() or dec_slabs_node() is called under
n->list_lock. When allocating a new slab for a debug cache, handle the
allocation by a new function alloc_single_from_new_slab() instead of the
current forced deactivation path.

Neither of these changes affect the fast paths at all. The changes in
slow paths are negligible for non-debug caches.

[1] https://lore.kernel.org/all/20220529081535.69275-1-rongwei.wang@linux.alibaba.com/

Reported-by: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
2022-09-17 00:18:11 +02:00
Peter Collingbourne
5373b8a09d kasan: call kasan_malloc() from __kmalloc_*track_caller()
We were failing to call kasan_malloc() from __kmalloc_*track_caller()
which was causing us to sometimes fail to produce KASAN error reports
for allocations made using e.g. devm_kcalloc(), as the KASAN poison was
not being initialized. Fix it.

Signed-off-by: Peter Collingbourne <pcc@google.com>
Cc: <stable@vger.kernel.org> # 5.15
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-09-16 23:05:59 +02:00
Imran Khan
b84e04f1ba kfence: add sysfs interface to disable kfence for selected slabs.
By default kfence allocation can happen for any slab object, whose size is
up to PAGE_SIZE, as long as that allocation is the first allocation after
expiration of kfence sample interval.  But in certain debugging scenarios
we may be interested in debugging corruptions involving some specific slub
objects like dentry or ext4_* etc.  In such cases limiting kfence for
allocations involving only specific slub objects will increase the
probablity of catching the issue since kfence pool will not be consumed by
other slab objects.

This patch introduces a sysfs interface
'/sys/kernel/slab/<name>/skip_kfence' to disable kfence for specific
slabs.  Having the interface work in this way does not impact
current/default behavior of kfence and allows us to use kfence for
specific slabs (when needed) as well.  The decision to skip/use kfence is
taken depending on whether kmem_cache.flags has (newly introduced)
SLAB_SKIP_KFENCE flag set or not.

Link: https://lkml.kernel.org/r/20220814195353.2540848-1-imran.f.khan@oracle.com
Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-11 20:25:52 -07:00
Chao Yu
7e9c323c52 mm/slub: fix to return errno if kmalloc() fails
In create_unique_id(), kmalloc(, GFP_KERNEL) can fail due to
out-of-memory, if it fails, return errno correctly rather than
triggering panic via BUG_ON();

kernel BUG at mm/slub.c:5893!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP

Call trace:
 sysfs_slab_add+0x258/0x260 mm/slub.c:5973
 __kmem_cache_create+0x60/0x118 mm/slub.c:4899
 create_cache mm/slab_common.c:229 [inline]
 kmem_cache_create_usercopy+0x19c/0x31c mm/slab_common.c:335
 kmem_cache_create+0x1c/0x28 mm/slab_common.c:390
 f2fs_kmem_cache_create fs/f2fs/f2fs.h:2766 [inline]
 f2fs_init_xattr_caches+0x78/0xb4 fs/f2fs/xattr.c:808
 f2fs_fill_super+0x1050/0x1e0c fs/f2fs/super.c:4149
 mount_bdev+0x1b8/0x210 fs/super.c:1400
 f2fs_mount+0x44/0x58 fs/f2fs/super.c:4512
 legacy_get_tree+0x30/0x74 fs/fs_context.c:610
 vfs_get_tree+0x40/0x140 fs/super.c:1530
 do_new_mount+0x1dc/0x4e4 fs/namespace.c:3040
 path_mount+0x358/0x914 fs/namespace.c:3370
 do_mount fs/namespace.c:3383 [inline]
 __do_sys_mount fs/namespace.c:3591 [inline]
 __se_sys_mount fs/namespace.c:3568 [inline]
 __arm64_sys_mount+0x2f8/0x408 fs/namespace.c:3568

Cc: <stable@kernel.org>
Fixes: 81819f0fc8 ("SLUB core")
Reported-by: syzbot+81684812ea68216e08c5@syzkaller.appspotmail.com
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Chao Yu <chao.yu@oppo.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-09-08 23:27:01 +02:00
Hyeonggon Yoo
2c1d697fb8 mm/slab_common: drop kmem_alloc & avoid dereferencing fields when not using
Drop kmem_alloc event class, and define kmalloc and kmem_cache_alloc
using TRACE_EVENT() macro.

And then this patch does:
   - Do not pass pointer to struct kmem_cache to trace_kmalloc.
     gfp flag is enough to know if it's accounted or not.
   - Avoid dereferencing s->object_size and s->size when not using kmem_cache_alloc event.
   - Avoid dereferencing s->name in when not using kmem_cache_free event.
   - Adjust s->size to SLOB_UNITS(s->size) * SLOB_UNIT in SLOB

Cc: Vasily Averin <vasily.averin@linux.dev>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-09-01 11:44:26 +02:00
Hyeonggon Yoo
11e9734bcb mm/slab_common: unify NUMA and UMA version of tracepoints
Drop kmem_alloc event class, rename kmem_alloc_node to kmem_alloc, and
remove _node postfix for NUMA version of tracepoints.

This will break some tools that depend on {kmem_cache_alloc,kmalloc}_node,
but at this point maintaining both kmem_alloc and kmem_alloc_node
event classes does not makes sense at all.

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-09-01 10:40:27 +02:00
Hyeonggon Yoo
26a40990ba mm/sl[au]b: cleanup kmem_cache_alloc[_node]_trace()
Despite its name, kmem_cache_alloc[_node]_trace() is hook for inlined
kmalloc. So rename it to kmalloc[_node]_trace().

Move its implementation to slab_common.c by using
__kmem_cache_alloc_node(), but keep CONFIG_TRACING=n varients to save a
function call when CONFIG_TRACING=n.

Use __assume_kmalloc_alignment for kmalloc[_node]_trace instead of
__assume_slab_alignement. Generally kmalloc has larger alignment
requirements.

Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-09-01 10:40:27 +02:00
Hyeonggon Yoo
b140513524 mm/sl[au]b: generalize kmalloc subsystem
Now everything in kmalloc subsystem can be generalized.
Let's do it!

Generalize __do_kmalloc_node(), __kmalloc_node_track_caller(),
kfree(), __ksize(), __kmalloc(), __kmalloc_node() and move them
to slab_common.c.

In the meantime, rename kmalloc_large_node_notrace()
to __kmalloc_large_node() and make it static as it's now only called in
slab_common.c.

[ feng.tang@intel.com: adjust kfence skip list to include
  __kmem_cache_free so that kfence kunit tests do not fail ]

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-09-01 10:38:06 +02:00
Vlastimil Babka
a579b0560c mm/slub: move free_debug_processing() further
In the following patch, the function free_debug_processing() will be
calling add_partial(), remove_partial() and discard_slab(), se move it
below their definitions to avoid forward declarations. To make review
easier, separate the move from functional changes.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
2022-08-25 14:41:54 +02:00
Hyeonggon Yoo
ed4cd17eb2 mm/sl[au]b: introduce common alloc/free functions without tracepoint
To unify kmalloc functions in later patch, introduce common alloc/free
functions that does not have tracepoint.

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-08-24 16:11:41 +02:00
Hyeonggon Yoo
d6a71648db mm/slab: kmalloc: pass requests larger than order-1 page to page allocator
There is not much benefit for serving large objects in kmalloc().
Let's pass large requests to page allocator like SLUB for better
maintenance of common code.

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-08-24 16:11:41 +02:00
Hyeonggon Yoo
bf37d79102 mm/slab_common: kmalloc_node: pass large requests to page allocator
Now that kmalloc_large_node() is in common code, pass large requests
to page allocator in kmalloc_node() using kmalloc_large_node().

One problem is that currently there is no tracepoint in
kmalloc_large_node(). Instead of simply putting tracepoint in it,
use kmalloc_large_node{,_notrace} depending on its caller to show
useful address for both inlined kmalloc_node() and
__kmalloc_node_track_caller() when large objects are allocated.

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-08-24 16:11:41 +02:00
Hyeonggon Yoo
a0c3b94002 mm/slub: move kmalloc_large_node() to slab_common.c
In later patch SLAB will also pass requests larger than order-1 page
to page allocator. Move kmalloc_large_node() to slab_common.c.

Fold kmalloc_large_node_hook() into kmalloc_large_node() as there is
no other caller.

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-08-24 16:11:41 +02:00
Hyeonggon Yoo
0f853b2e6d mm/sl[au]b: factor out __do_kmalloc_node()
__kmalloc(), __kmalloc_node(), __kmalloc_node_track_caller()
mostly do same job. Factor out common code into __do_kmalloc_node().

Note that this patch also fixes missing kasan_kmalloc() in SLUB's
__kmalloc_node_track_caller().

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-08-24 16:11:40 +02:00
Hyeonggon Yoo
c45248db04 mm/slab_common: cleanup kmalloc_track_caller()
Make kmalloc_track_caller() wrapper of kmalloc_node_track_caller().

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-08-24 16:11:40 +02:00
Hyeonggon Yoo
f78a03f6e2 mm/slab_common: remove CONFIG_NUMA ifdefs for common kmalloc functions
Now that slab_alloc_node() is available for SLAB when CONFIG_NUMA=n,
remove CONFIG_NUMA ifdefs for common kmalloc functions.

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-08-24 16:11:40 +02:00
ye xingchen
2bfbb0271a mm/slub: Remove the unneeded result variable
Return the value from attribute->store(s, buf, len) and
attribute->show(s, buf) directly instead of storing it in
another redundant variable.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-08-23 16:07:36 +02:00
Greg Kroah-Hartman
50c9c56f73 This is the 5.10.130 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmLNhfsACgkQONu9yGCS
 aT6srhAAjWF9gtiDTufsmNBW5X7DeEPzwsXFVJ1b5+8mTvc2NPzvxxTsc5xwL7QG
 Ml2LEMXQYxdhBrXzm6Av1FZSQgnjdv2BMw9FQfjPkOeuxaFWpta8qg0sxiZZvrHs
 3rxiy9RKlSvaUpEGE2h/RcJFWL4ume9/bB+UhLLYXFKDfmZ7TKe2jA7b87W5Y0i9
 6tYyS98weJPMNdkioM7Iewbugr9xaXTElG47hAL4pDlWd5xnxKf10rgNg5hdzkxA
 jEPQlyiVpLvRHdqb+ZlPKMxLcZu3i9kXsO+fGPe1AsXopyTuSEq+A61vuGSsrhaF
 vyPXIzn66rGCjYIM19ku8HgcMhdi3GcwztE/OSZrMIZS0vU7jECyisqHvT8cUPr6
 caor6kYqrD8mdCvyDGp+8Fqaa+iGINX6PhC/0PkerX3HPwIXm+VPIjhAQ4QJzB/S
 ArJnzazDn1KbdAEldGB89JxI5+huh7COYg/c+zy1UOEHNttovdJZpXwxvBE+N6M4
 XeKqM3s4cxs/S9D1xJXKl1bkeMpQ9UegtlWIfVkKfNmNbWhP3u4FUDFpxCsje1tf
 xfP2eb9tJWYzxiiS+Rb2Ib44+OiF3sE2MCzFpP+Fusna5Sz9FlhqNlYW7I+a2ros
 DwwfWUsMz+t8bwc+wNA4SPqhdrkh8z7lWTm8449QYZesCzzsexU=
 =AgJC
 -----END PGP SIGNATURE-----

Merge 5.10.130 into android12-5.10-lts

Changes in 5.10.130
	mm/slub: add missing TID updates on slab deactivation
	ALSA: hda/realtek: Add quirk for Clevo L140PU
	can: bcm: use call_rcu() instead of costly synchronize_rcu()
	can: grcan: grcan_probe(): remove extra of_node_get()
	can: gs_usb: gs_usb_open/close(): fix memory leak
	bpf: Fix incorrect verifier simulation around jmp32's jeq/jne
	bpf: Fix insufficient bounds propagation from adjust_scalar_min_max_vals
	usbnet: fix memory leak in error case
	net: rose: fix UAF bug caused by rose_t0timer_expiry
	netfilter: nft_set_pipapo: release elements in clone from abort path
	netfilter: nf_tables: stricter validation of element data
	iommu/vt-d: Fix PCI bus rescan device hot add
	fbdev: fbmem: Fix logo center image dx issue
	fbmem: Check virtual screen sizes in fb_set_var()
	fbcon: Disallow setting font bigger than screen size
	fbcon: Prevent that screen size is smaller than font size
	PM: runtime: Redefine pm_runtime_release_supplier()
	memregion: Fix memregion_free() fallback definition
	video: of_display_timing.h: include errno.h
	powerpc/powernv: delay rng platform device creation until later in boot
	can: kvaser_usb: replace run-time checks with struct kvaser_usb_driver_info
	can: kvaser_usb: kvaser_usb_leaf: fix CAN clock frequency regression
	can: kvaser_usb: kvaser_usb_leaf: fix bittiming limits
	xfs: remove incorrect ASSERT in xfs_rename
	ARM: meson: Fix refcount leak in meson_smp_prepare_cpus
	pinctrl: sunxi: a83t: Fix NAND function name for some pins
	arm64: dts: qcom: msm8994: Fix CPU6/7 reg values
	arm64: dts: imx8mp-evk: correct mmc pad settings
	arm64: dts: imx8mp-evk: correct the uart2 pinctl value
	arm64: dts: imx8mp-evk: correct gpio-led pad settings
	arm64: dts: imx8mp-evk: correct I2C3 pad settings
	pinctrl: sunxi: sunxi_pconf_set: use correct offset
	arm64: dts: qcom: msm8992-*: Fix vdd_lvs1_2-supply typo
	ARM: at91: pm: use proper compatible for sama5d2's rtc
	ARM: at91: pm: use proper compatibles for sam9x60's rtc and rtt
	ARM: dts: at91: sam9x60ek: fix eeprom compatible and size
	ARM: dts: at91: sama5d2_icp: fix eeprom compatibles
	xsk: Clear page contiguity bit when unmapping pool
	i40e: Fix dropped jumbo frames statistics
	ibmvnic: Properly dispose of all skbs during a failover.
	selftests: forwarding: fix flood_unicast_test when h2 supports IFF_UNICAST_FLT
	selftests: forwarding: fix learning_test when h1 supports IFF_UNICAST_FLT
	selftests: forwarding: fix error message in learning_test
	r8169: fix accessing unset transport header
	i2c: cadence: Unregister the clk notifier in error path
	dmaengine: imx-sdma: Allow imx8m for imx7 FW revs
	misc: rtsx_usb: fix use of dma mapped buffer for usb bulk transfer
	misc: rtsx_usb: use separate command and response buffers
	misc: rtsx_usb: set return value in rsp_buf alloc err path
	dt-bindings: dma: allwinner,sun50i-a64-dma: Fix min/max typo
	ida: don't use BUG_ON() for debugging
	dmaengine: pl330: Fix lockdep warning about non-static key
	dmaengine: at_xdma: handle errors of at_xdmac_alloc_desc() correctly
	dmaengine: ti: Fix refcount leak in ti_dra7_xbar_route_allocate
	dmaengine: ti: Add missing put_device in ti_dra7_xbar_route_allocate
	Linux 5.10.130

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I7590fa913189edca6550e770bbb7c60c273d9461
2022-07-28 17:04:30 +02:00
Hyeonggon Yoo
2055e67bb6 mm/sl[au]b: use own bulk free function when bulk alloc failed
There is no benefit to call generic bulk free function when
kmem_cache_alloc_bulk() failed. Use own kmem_cache_free_bulk()
instead of generic function.

Note that if kmem_cache_alloc_bulk() fails to allocate first object in
SLUB, size is zero. So allow passing size == 0 to kmem_cache_free_bulk()
like SLAB's.

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-07-20 13:30:11 +02:00
Tao Huang
24cbee6672 Merge tag 'ASB-2022-04-05_12-5.10' of https://android.googlesource.com/kernel/common
https://source.android.com/security/bulletin/2022-04-01
CVE-2021-0707
CVE-2021-39800
CVE-2021-39801 (4.9 only)
CVE-2021-39802

* tag 'ASB-2022-04-05_12-5.10': (3832 commits)
  ANDROID: GKI: Update symbols to abi_gki_aarch64_oplus
  ANDROID: vendor_hooks: Reduce pointless modversions CRC churn
  UPSTREAM: locking/lockdep: Avoid potential access of invalid memory in lock_class
  ANDROID: mm: Fix implicit declaration of function 'isolate_lru_page'
  ANDROID: GKI: Update symbols to symbol list
  ANDROID: GKI: Update symbols to symbol list
  ANDROID: GKI: Add hook symbol to symbol list
  Revert "ANDROID: dm-bow: Protect Ranges fetched and erased from the RB tree"
  ANDROID: vendor_hooks: Add hooks to for free_unref_page_commit
  ANDROID: vendor_hooks: Add hooks to for alloc_contig_range
  ANDROID: GKI: Update symbols to symbol list
  ANDROID: vendor_hooks: Add hook in shrink_node_memcgs
  ANDROID: GKI: Add symbols to symbol list
  FROMGIT: iommu/iova: Improve 32-bit free space estimate
  ANDROID: export walk_page_range and swp_swap_info
  ANDROID: vendor_hooks: export shrink_slab
  ANDROID: usb: gadget: f_accessory: add compat_ioctl support
  UPSTREAM: sr9700: sanity check for packet length
  UPSTREAM: io_uring: return back safer resurrect
  UPSTREAM: Revert "xfrm: state and policy should fail if XFRMA_IF_ID 0"
  ...

Change-Id: Ic61ead530b99b10ffd535a358a48fe9bb8c33fd4

Conflicts:
	drivers/android/Kconfig
	drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
	drivers/gpu/drm/rockchip/rockchip_vop_reg.c
	drivers/i2c/busses/i2c-rk3x.c
	drivers/media/i2c/imx258.c
	drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
	drivers/usb/dwc2/gadget.c
	drivers/usb/gadget/function/uvc.h
	lib/Kconfig.debug
2022-07-15 17:40:39 +08:00
Jann Horn
6c32496964 mm/slub: add missing TID updates on slab deactivation
commit eeaa345e12 upstream.

The fastpath in slab_alloc_node() assumes that c->slab is stable as long as
the TID stays the same. However, two places in __slab_alloc() currently
don't update the TID when deactivating the CPU slab.

If multiple operations race the right way, this could lead to an object
getting lost; or, in an even more unlikely situation, it could even lead to
an object being freed onto the wrong slab's freelist, messing up the
`inuse` counter and eventually causing a page to be freed to the page
allocator while it still contains slab objects.

(I haven't actually tested these cases though, this is just based on
looking at the code. Writing testcases for this stuff seems like it'd be
a pain...)

The race leading to state inconsistency is (all operations on the same CPU
and kmem_cache):

 - task A: begin do_slab_free():
    - read TID
    - read pcpu freelist (==NULL)
    - check `slab == c->slab` (true)
 - [PREEMPT A->B]
 - task B: begin slab_alloc_node():
    - fastpath fails (`c->freelist` is NULL)
    - enter __slab_alloc()
    - slub_get_cpu_ptr() (disables preemption)
    - enter ___slab_alloc()
    - take local_lock_irqsave()
    - read c->freelist as NULL
    - get_freelist() returns NULL
    - write `c->slab = NULL`
    - drop local_unlock_irqrestore()
    - goto new_slab
    - slub_percpu_partial() is NULL
    - get_partial() returns NULL
    - slub_put_cpu_ptr() (enables preemption)
 - [PREEMPT B->A]
 - task A: finish do_slab_free():
    - this_cpu_cmpxchg_double() succeeds()
    - [CORRUPT STATE: c->slab==NULL, c->freelist!=NULL]

From there, the object on c->freelist will get lost if task B is allowed to
continue from here: It will proceed to the retry_load_slab label,
set c->slab, then jump to load_freelist, which clobbers c->freelist.

But if we instead continue as follows, we get worse corruption:

 - task A: run __slab_free() on object from other struct slab:
    - CPU_PARTIAL_FREE case (slab was on no list, is now on pcpu partial)
 - task A: run slab_alloc_node() with NUMA node constraint:
    - fastpath fails (c->slab is NULL)
    - call __slab_alloc()
    - slub_get_cpu_ptr() (disables preemption)
    - enter ___slab_alloc()
    - c->slab is NULL: goto new_slab
    - slub_percpu_partial() is non-NULL
    - set c->slab to slub_percpu_partial(c)
    - [CORRUPT STATE: c->slab points to slab-1, c->freelist has objects
      from slab-2]
    - goto redo
    - node_match() fails
    - goto deactivate_slab
    - existing c->freelist is passed into deactivate_slab()
    - inuse count of slab-1 is decremented to account for object from
      slab-2

At this point, the inuse count of slab-1 is 1 lower than it should be.
This means that if we free all allocated objects in slab-1 except for one,
SLUB will think that slab-1 is completely unused, and may free its page,
leading to use-after-free.

Fixes: c17dda40a6 ("slub: Separate out kmem_cache_cpu processing from deactivate_slab")
Fixes: 03e404af26 ("slub: fast release on full slab")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://lore.kernel.org/r/20220608182205.2945720-1-jannh@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-12 16:32:16 +02:00
Muchun Song
b77d5b1b83 mm: slab: optimize memcg_slab_free_hook()
Most callers of memcg_slab_free_hook() already know the slab,  which could
be passed to memcg_slab_free_hook() directly to reduce the overhead of an
another call of virt_to_slab().  For bulk freeing of objects, the call of
slab_objcgs() in the loop in memcg_slab_free_hook() is redundant as well.
Rework memcg_slab_free_hook() and build_detached_freelist() to reduce
those unnecessary overhead and make memcg_slab_free_hook() can handle bulk
freeing in slab_free().

Move the calling site of memcg_slab_free_hook() from do_slab_free() to
slab_free() for slub to make the code clearer since the logic is weird
(e.g. the caller need to judge whether it needs to call
memcg_slab_free_hook()). It is easy to make mistakes like missing calling
of memcg_slab_free_hook() like fixes of:

  commit d1b2cf6cb8 ("mm: memcg/slab: uncharge during kmem_cache_free_bulk()")
  commit ae085d7f93 ("mm: kfence: fix missing objcg housekeeping for SLAB")

This optimization is mainly for bulk objects freeing.  The following numbers
is shown for 16-object freeing.

                           before      after
  kmem_cache_free_bulk:   ~430 ns     ~400 ns

The overhead is reduced by about 7% for 16-object freeing.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Link: https://lore.kernel.org/r/20220429123044.37885-1-songmuchun@bytedance.com
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-07-04 17:13:05 +02:00
Vasily Averin
b347aa7b57 mm/tracing: add 'accounted' entry into output of allocation tracepoints
Slab caches marked with SLAB_ACCOUNT force accounting for every
allocation from this cache even if __GFP_ACCOUNT flag is not passed.
Unfortunately, at the moment this flag is not visible in ftrace output,
and this makes it difficult to analyze the accounted allocations.

This patch adds boolean "accounted" entry into trace output,
and set it to 'true' for calls used __GFP_ACCOUNT flag and
for allocations from caches marked with SLAB_ACCOUNT.
Set it to 'false' if accounting is disabled in configs.

Signed-off-by: Vasily Averin <vvs@openvz.org>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Link: https://lore.kernel.org/r/c418ed25-65fe-f623-fbf8-1676528859ed@openvz.org
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-07-04 17:11:27 +02:00
Xiongwei Song
efb9352700 mm/slub: Simplify __kmem_cache_alias()
There is no need to do anything if sysfs_slab_alias() return nonzero
value after getting a mergeable cache.

Signed-off-by: Xiongwei Song <xiongwei.song@windriver.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Link: https://lore.kernel.org/all/e5ebc952-af17-321f-5343-bc914d47c931@suse.cz/
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-07-04 17:08:47 +02:00
Tao Huang
ff4142e509 mm: slub: Add SLUB_SYSFS
Make slub sysfs interface selectable.
Save about 4.8ms of boot time on RK1808 EVB if unselected.

Change-Id: I2587cc259c3c31a209604d99640d8e84a4ba78f4
Signed-off-by: Tao Huang <huangtao@rock-chips.com>
2022-06-15 14:21:09 +08:00
Jann Horn
eeaa345e12 mm/slub: add missing TID updates on slab deactivation
The fastpath in slab_alloc_node() assumes that c->slab is stable as long as
the TID stays the same. However, two places in __slab_alloc() currently
don't update the TID when deactivating the CPU slab.

If multiple operations race the right way, this could lead to an object
getting lost; or, in an even more unlikely situation, it could even lead to
an object being freed onto the wrong slab's freelist, messing up the
`inuse` counter and eventually causing a page to be freed to the page
allocator while it still contains slab objects.

(I haven't actually tested these cases though, this is just based on
looking at the code. Writing testcases for this stuff seems like it'd be
a pain...)

The race leading to state inconsistency is (all operations on the same CPU
and kmem_cache):

 - task A: begin do_slab_free():
    - read TID
    - read pcpu freelist (==NULL)
    - check `slab == c->slab` (true)
 - [PREEMPT A->B]
 - task B: begin slab_alloc_node():
    - fastpath fails (`c->freelist` is NULL)
    - enter __slab_alloc()
    - slub_get_cpu_ptr() (disables preemption)
    - enter ___slab_alloc()
    - take local_lock_irqsave()
    - read c->freelist as NULL
    - get_freelist() returns NULL
    - write `c->slab = NULL`
    - drop local_unlock_irqrestore()
    - goto new_slab
    - slub_percpu_partial() is NULL
    - get_partial() returns NULL
    - slub_put_cpu_ptr() (enables preemption)
 - [PREEMPT B->A]
 - task A: finish do_slab_free():
    - this_cpu_cmpxchg_double() succeeds()
    - [CORRUPT STATE: c->slab==NULL, c->freelist!=NULL]

From there, the object on c->freelist will get lost if task B is allowed to
continue from here: It will proceed to the retry_load_slab label,
set c->slab, then jump to load_freelist, which clobbers c->freelist.

But if we instead continue as follows, we get worse corruption:

 - task A: run __slab_free() on object from other struct slab:
    - CPU_PARTIAL_FREE case (slab was on no list, is now on pcpu partial)
 - task A: run slab_alloc_node() with NUMA node constraint:
    - fastpath fails (c->slab is NULL)
    - call __slab_alloc()
    - slub_get_cpu_ptr() (disables preemption)
    - enter ___slab_alloc()
    - c->slab is NULL: goto new_slab
    - slub_percpu_partial() is non-NULL
    - set c->slab to slub_percpu_partial(c)
    - [CORRUPT STATE: c->slab points to slab-1, c->freelist has objects
      from slab-2]
    - goto redo
    - node_match() fails
    - goto deactivate_slab
    - existing c->freelist is passed into deactivate_slab()
    - inuse count of slab-1 is decremented to account for object from
      slab-2

At this point, the inuse count of slab-1 is 1 lower than it should be.
This means that if we free all allocated objects in slab-1 except for one,
SLUB will think that slab-1 is completely unused, and may free its page,
leading to use-after-free.

Fixes: c17dda40a6 ("slub: Separate out kmem_cache_cpu processing from deactivate_slab")
Fixes: 03e404af26 ("slub: fast release on full slab")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://lore.kernel.org/r/20220608182205.2945720-1-jannh@google.com
2022-06-13 17:41:36 +02:00
Sebastian Andrzej Siewior
c4cf678559 mm/slub: Move the stackdepot related allocation out of IRQ-off section.
The set_track() invocation in free_debug_processing() is invoked with
acquired slab_lock(). The lock disables interrupts on PREEMPT_RT and
this forbids to allocate memory which is done in stack_depot_save().

Split set_track() into two parts: set_track_prepare() which allocate
memory and set_track_update() which only performs the assignment of the
trace data structure. Use set_track_prepare() before disabling
interrupts.

[ vbabka@suse.cz: make set_track() call set_track_update() instead of
  open-coded assignments ]

Fixes: 5cf909c553 ("mm/slub: use stackdepot to save stack trace in objects")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://lore.kernel.org/r/Yp9sqoUi4fVa5ExF@linutronix.de
2022-06-13 17:26:20 +02:00
Linus Torvalds
2e17ce1106 slab changes for 5.19
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEjUuTAak14xi+SF7M4CHKc/GJqRAFAmKLUYoACgkQ4CHKc/GJ
 qRCMFwf/Tm1cf2JLUANrT58rjkrrj15EtKhnJdm5/yvmsWKps7WKPP4jeUHe+NTO
 NovAGt67lG1l6LMLczZkWckOkWlyYjC42CPDLdxRUkk+zQRb3nRA8Nbt6VTNBOfQ
 0wTLOqXgsNXdSPSVUsKGL8kIAHNQTMX+7TjO6s7CXy/5Qag6r1iZX2HZxASOHxLa
 yYzaJ9pJRZBAMGnzV6L6v0J8KPnjYO0fB68S1qYQTbhoRxchtFF+0AIr1JydGgBI
 9RFUowTrSpJkZtcSjabopvZz4JfCRDP+eAxkyw13feji7MG1FMX74HgDdw+HhzTv
 R2/6iA5WcsmzcXopsfMx8lUP/KIfPw==
 =gnSc
 -----END PGP SIGNATURE-----

Merge tag 'slab-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

Pull slab updates from Vlastimil Babka:

 - Conversion of slub_debug stack traces to stackdepot, allowing more
   useful debugfs-based inspection for e.g. memory leak debugging.
   Allocation and free debugfs info now includes full traces and is
   sorted by the unique trace frequency.

   The stackdepot conversion was already attempted last year but
   reverted by ae14c63a9f. The memory overhead (while not actually
   enabled on boot) has been meanwhile solved by making the large
   stackdepot allocation dynamic. The xfstest issues haven't been
   reproduced on current kernel locally nor in -next, so the slab cache
   layout changes that originally made that bug manifest were probably
   not the root cause.

 - Refactoring of dma-kmalloc caches creation.

 - Trivial cleanups such as removal of unused parameters, fixes and
   clarifications of comments.

 - Hyeonggon Yoo joins as a reviewer.

* tag 'slab-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  MAINTAINERS: add myself as reviewer for slab
  mm/slub: remove unused kmem_cache_order_objects max
  mm: slab: fix comment for __assume_kmalloc_alignment
  mm: slab: fix comment for ARCH_KMALLOC_MINALIGN
  mm/slub: remove unneeded return value of slab_pad_check
  mm/slab_common: move dma-kmalloc caches creation into new_kmalloc_cache()
  mm/slub: remove meaningless node check in ___slab_alloc()
  mm/slub: remove duplicate flag in allocate_slab()
  mm/slub: remove unused parameter in setup_object*()
  mm/slab.c: fix comments
  slab, documentation: add description of debugfs files for SLUB caches
  mm/slub: sort debugfs output by frequency of stack traces
  mm/slub: distinguish and print stack traces in debugfs files
  mm/slub: use stackdepot to save stack trace in objects
  mm/slub: move struct track init out of set_track()
  lib/stackdepot: allow requesting early initialization dynamically
  mm/slub, kunit: Make slub_kunit unaffected by user specified flags
  mm/slab: remove some unused functions
2022-05-25 10:24:04 -07:00
Vlastimil Babka
e001897da6 Merge branches 'slab/for-5.19/stackdepot' and 'slab/for-5.19/refactor' into slab/for-linus 2022-05-23 11:14:32 +02:00