net/mlx5: Fix devlink reload LOCKDEP warning

Reloading the driver through the devlink reload interface triggers the
LOCKDEP warning below. The warning is caused by intf_state_mutex, which
is supposed to protect against concurrent load/unload from different
submodules (FW tracer, health, etc.).

To fix it, convert mlx5_core to use the already existing device lock,
which is taken for any bus operation (probe, resume, etc.) anyway.
This change not only properly protects load/unload operations, but
also closes races with driver removal and the PCI recovery flow.
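
For readers unfamiliar with this class of report, the ordering check can
be modelled in user space. The sketch below is not mlx5_core or kernel
code: it is a hypothetical, single-threaded C model that records which
lock was taken while another was held and flags an inversion. All names
in it (lock_order_acquire, model_load_path, the lock IDs, and so on) are
assumptions made for illustration; the two orderings are the ones shown
in the "Possible unsafe locking scenario" part of the report.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes, named after the two locks in the report. */
enum lock_id { DEVLINK_MUTEX, INTF_STATE_MUTEX, NUM_LOCKS };

/* before[a][b] is true once lock b has been taken while a was held. */
static bool before[NUM_LOCKS][NUM_LOCKS];
static bool held[NUM_LOCKS];

static void lock_order_reset(void)
{
	memset(before, 0, sizeof(before));
	memset(held, 0, sizeof(held));
}

/* Depth-first search: is there a recorded chain from -> ... -> to? */
static bool reaches(enum lock_id from, enum lock_id to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int n = 0; n < NUM_LOCKS; n++)
		if (before[from][n] && !seen[n] &&
		    reaches((enum lock_id)n, to, seen))
			return true;
	return false;
}

/* Returns false if taking id now would close a dependency cycle. */
static bool lock_order_acquire(enum lock_id id)
{
	bool ok = true;

	for (int h = 0; h < NUM_LOCKS; h++) {
		bool seen[NUM_LOCKS] = { false };

		if (!held[h] || h == (int)id)
			continue;
		/* id -> ... -> h is already known, so h -> id inverts it. */
		if (reaches(id, (enum lock_id)h, seen))
			ok = false;
		else
			before[h][id] = true;
	}
	held[id] = true;
	return ok;
}

static void lock_order_release(enum lock_id id)
{
	assert(held[id]); /* releasing an unheld lock is a bug in the model */
	held[id] = false;
}

/* Order taken by CPU1 in the report: intf_state_mutex, then devlink_mutex. */
static bool model_load_path(void)
{
	bool ok = lock_order_acquire(INTF_STATE_MUTEX);

	ok = lock_order_acquire(DEVLINK_MUTEX) && ok;
	lock_order_release(DEVLINK_MUTEX);
	lock_order_release(INTF_STATE_MUTEX);
	return ok;
}

/* Order taken by CPU0 in the report: devlink_mutex, then intf_state_mutex. */
static bool model_reload_path(void)
{
	bool ok = lock_order_acquire(DEVLINK_MUTEX);

	ok = lock_order_acquire(INTF_STATE_MUTEX) && ok;
	lock_order_release(INTF_STATE_MUTEX);
	lock_order_release(DEVLINK_MUTEX);
	return ok;
}
```

In this model, whichever path runs first only records its ordering; the
other path is then flagged even though nothing has actually deadlocked,
which is why the report says "possible".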

[leonro@vm ~]$ lspci |grep nox
00:09.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
[leonro@vm ~]$ sudo devlink dev reload pci/0000:00:09.0

 ======================================================
 WARNING: possible circular locking dependency detected
 5.10.0-rc1+ #2357 Not tainted
 ------------------------------------------------------
 devlink/293 is trying to acquire lock:
 ffff88801046cb58 (&dev->intf_state_mutex){+.+.}-{3:3}, at: mlx5_unload_one+0x1e/0xa0 [mlx5_core]
 but task is already holding lock:
 ffffffff84135148 (devlink_mutex){+.+.}-{3:3}, at: devlink_nl_pre_doit+0x2b/0x500
 which lock already depends on the new lock.
 the existing dependency chain (in reverse order) is:

 -> #1 (devlink_mutex){+.+.}-{3:3}:
        __mutex_lock+0x138/0x1290
        devlink_register+0x72/0x140
        mlx5_devlink_register+0x63/0x240 [mlx5_core]
        init_one+0xb72/0xfe0 [mlx5_core]
        pci_device_probe+0x2a0/0x4a0
        really_probe+0x20a/0xc10
        driver_probe_device+0xd8/0x380
        device_driver_attach+0x1df/0x250
        __driver_attach+0xff/0x240
        bus_for_each_dev+0x11e/0x1a0
        bus_add_driver+0x309/0x570
        driver_register+0x1ee/0x380
        bpf_prog_5a288f6d419c5fa6+0x2a/0xfc8
        do_one_initcall+0xd5/0x410
        do_init_module+0x1c8/0x750
        load_module+0x6cbd/0x9230
        __do_sys_finit_module+0x118/0x1b0
        do_syscall_64+0x2d/0x40
        entry_SYSCALL_64_after_hwframe+0x44/0xa9

 -> #0 (&dev->intf_state_mutex){+.+.}-{3:3}:
        __lock_acquire+0x29f2/0x5a10
        lock_acquire+0x1ac/0x830
        __mutex_lock+0x138/0x1290
        mlx5_unload_one+0x1e/0xa0 [mlx5_core]
        mlx5_devlink_reload_down+0x15d/0x220 [mlx5_core]
        devlink_reload+0x13d/0x4d0
        devlink_nl_cmd_reload+0x666/0xf50
        genl_family_rcv_msg_doit+0x1e9/0x2f0
        genl_rcv_msg+0x27f/0x4a0
        netlink_rcv_skb+0x11d/0x340
        genl_rcv+0x24/0x40
        netlink_unicast+0x433/0x700
        netlink_sendmsg+0x6f1/0xbd0
        sock_sendmsg+0xb0/0xe0
        __sys_sendto+0x193/0x240
        __x64_sys_sendto+0xdd/0x1b0
        do_syscall_64+0x2d/0x40
        entry_SYSCALL_64_after_hwframe+0x44/0xa9

 other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(devlink_mutex);
                                lock(&dev->intf_state_mutex);
                                lock(devlink_mutex);
   lock(&dev->intf_state_mutex);

  *** DEADLOCK ***

 3 locks held by devlink/293:
  #0: ffffffff84144310 (cb_lock){++++}-{3:3}, at: genl_rcv+0x15/0x40
  #1: ffffffff841443c8 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg+0x31a/0x4a0
  #2: ffffffff84135148 (devlink_mutex){+.+.}-{3:3}, at: devlink_nl_pre_doit+0x2b/0x500

 stack backtrace:
 CPU: 0 PID: 293 Comm: devlink Not tainted 5.10.0-rc1+ #2357
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 Call Trace:
  dump_stack+0x9a/0xcc
  check_noncircular+0x25f/0x2e0
  ? print_circular_bug+0x310/0x310
  ? __bfs+0x27a/0x620
  ? usage_match+0x100/0x100
  ? __lock_acquire+0xbc0/0x5a10
  __lock_acquire+0x29f2/0x5a10
  ? __bfs+0x27a/0x620
  ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
  ? print_shortest_lock_dependencies+0x80/0x80
  ? mark_lock+0xf5/0x2220
  lock_acquire+0x1ac/0x830
  ? mlx5_unload_one+0x1e/0xa0 [mlx5_core]
  ? lock_release+0x6c0/0x6c0
  ? is_bpf_text_address+0x66/0xf0
  ? lock_downgrade+0x6d0/0x6d0
  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
  __mutex_lock+0x138/0x1290
  ? mlx5_unload_one+0x1e/0xa0 [mlx5_core]
  ? __lock_acquire+0x36e5/0x5a10
  ? mlx5_unload_one+0x1e/0xa0 [mlx5_core]
  ? mutex_lock_io_nested+0x1130/0x1130
  ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
  ? stack_trace_save+0x91/0xc0
  ? stack_trace_consume_entry+0x160/0x160
  mlx5_unload_one+0x1e/0xa0 [mlx5_core]
  mlx5_devlink_reload_down+0x15d/0x220 [mlx5_core]
  ? mlx5_devlink_info_get+0x1c0/0x1c0 [mlx5_core]
  ? genl_rcv+0x24/0x40
  ? netlink_unicast+0x433/0x700
  ? netlink_sendmsg+0x6f1/0xbd0
  ? sock_sendmsg+0xb0/0xe0
  ? __sys_sendto+0x193/0x240
  ? __x64_sys_sendto+0xdd/0x1b0
  ? do_syscall_64+0x2d/0x40
  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
  devlink_reload+0x13d/0x4d0
  ? devlink_port_param_value_changed+0x190/0x190
  ? mutex_lock_io_nested+0x1130/0x1130
  devlink_nl_cmd_reload+0x666/0xf50
  ? devlink_reload+0x4d0/0x4d0
  ? devlink_get_from_attrs+0xd9/0x290
  ? devlink_nl_pre_doit+0x72/0x500
  genl_family_rcv_msg_doit+0x1e9/0x2f0
  ? mutex_lock_io_nested+0x1130/0x1130
  ? genl_family_rcv_msg_attrs_parse.constprop.0+0x230/0x230
  ? security_capable+0x51/0x90
  genl_rcv_msg+0x27f/0x4a0
  ? genl_get_cmd+0x3c0/0x3c0
  ? lock_acquire+0x1ac/0x830
  ? devlink_reload+0x4d0/0x4d0
  ? lock_release+0x6c0/0x6c0
  netlink_rcv_skb+0x11d/0x340
  ? genl_get_cmd+0x3c0/0x3c0
  ? netlink_ack+0x9f0/0x9f0
  genl_rcv+0x24/0x40
  netlink_unicast+0x433/0x700
  ? netlink_attachskb+0x6f0/0x6f0
  ? __alloc_skb+0x32a/0x530
  netlink_sendmsg+0x6f1/0xbd0
  ? netlink_unicast+0x700/0x700
  ? find_held_lock+0x2d/0x110
  ? netlink_unicast+0x700/0x700
  sock_sendmsg+0xb0/0xe0
  __sys_sendto+0x193/0x240
  ? __x64_sys_getpeername+0xb0/0xb0
  ? do_raw_spin_unlock+0x54/0x220
  ? __up_read+0x1a1/0x7b0
  __x64_sys_sendto+0xdd/0x1b0
  ? syscall_enter_from_user_mode+0x1d/0x50
  do_syscall_64+0x2d/0x40
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 RIP: 0033:0x7f8db67d00fa
 Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 c3 0f 1f 44 00 00 55 48 83 ec 30 44 89 4c
 RSP: 002b:00007ffc000d8aa8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f8db67d00fa
 RDX: 0000000000000030 RSI: 000055a31789b430 RDI: 0000000000000003
 RBP: 000055a31789b400 R08: 00007f8db689d200 R09: 000000000000000c
 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

Fixes: 4383cfcc65e7 ("net/mlx5: Add devlink reload")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>