• Linux server keeps rebooting on its own. Could someone help figure out the cause?
  • Posted 1 day ago
  • 小熊
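This is part of the log from one of the /var/crash/{time}/vmcore-dmesg.txt files;
the others all look much the same.
(We are a small company with no ops staff, and I am completely out of ideas on this one. The server was bought by the company and is hosted in a colocation data center.)
Thanks, everyone.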
[20710.172887] sd 0:0:0:0: [sda] task abort on host 0, ffff8a3105d44380
[20728.675450] NMI watchdog: Watchdog detected hard LOCKUP on cpu 16
[20728.675479] Modules linked in:
[20728.675483]  ppdev vmw_balloon crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd sg joydev pcspkr parport_pc parport vmw_vmci i2c_piix4 ip_tables xfs libcrc32c sr_mod cdrom ata_generic pata_acpi vmwgfx drm_kms_helper sd_mod crc_t10dif crct10dif_generic syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci drm ata_piix libahci nfit crct10dif_pclmul crct10dif_common libata crc32c_intel serio_raw libnvdimm vmxnet3 vmw_pvscsi drm_panel_orientation_quirks dm_mirror dm_region_hash dm_log dm_mod
[20728.675517] CPU: 16 PID: 4868 Comm: kworker/u64:0 Kdump: loaded Not tainted 3.10.0-1160.108.1.el7.x86_64 #1
[20728.675519] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
[20728.675529] Workqueue: writeback bdi_writeback_workfn (flush-253:2)
[20728.675531] task: ffff8a30edd08000 ti: ffff8a2f5a130000 task.ti: ffff8a2f5a130000
[20728.675532] RIP: 0010:[<ffffffff8c91eb3d>]  [<ffffffff8c91eb3d>] native_queued_spin_lock_slowpath+0x1d/0x200
[20728.675538] RSP: 0018:ffff8a2f5a133728  EFLAGS: 00000093
[20728.675539] RAX: 0000000000000001 RBX: 0000000000000286 RCX: 0000000000000001
[20728.675540] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff8a310294f050
[20728.675541] RBP: ffff8a2f5a133728 R08: 0000000000000010 R09: 0000000000000000
[20728.675542] R10: ffff8a31014ab800 R11: ffff8a3105d45180 R12: ffff8a3105d45180
[20728.675543] R13: ffff8a310294f000 R14: ffff8a310294f000 R15: ffff8a3105d45180
[20728.675545] FS:  0000000000000000(0000) GS:ffff8a3107200000(0000) knlGS:0000000000000000
[20728.675546] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[20728.675547] CR2: 00007f31628d3a98 CR3: 0000003d2d862000 CR4: 0000000000340fe0
[20728.675565] Call Trace:
[20728.675572]  [<ffffffff8cfac21a>] queued_spin_lock_slowpath+0xb/0x13
[20728.675576]  [<ffffffff8cfba7bb>] _raw_spin_lock_irqsave+0x3b/0x40
[20728.675581]  [<ffffffffc00bf225>] pvscsi_queue+0x25/0x920 [vmw_pvscsi]
[20728.675585]  [<ffffffffc018004f>] ? sd_init_command+0x2f/0xc0 [sd_mod]
[20728.675589]  [<ffffffff8cd0ac39>] ? scsi_setup_cmnd+0x119/0x1d0
[20728.675590]  [<ffffffff8cd0adcb>] ? scsi_prep_fn+0xdb/0x190
[20728.675593]  [<ffffffff8cd03160>] scsi_dispatch_cmd+0xb0/0x250
[20728.675595]  [<ffffffff8cd0cd0c>] scsi_request_fn+0x4ac/0x680
[20728.675599]  [<ffffffff8cb8e1de>] ? deadline_add_request+0x6e/0x90
[20728.675603]  [<ffffffff8cb68799>] __blk_run_queue+0x39/0x50
[20728.675606]  [<ffffffff8cb6c6c3>] blk_queue_bio+0x3b3/0x400
[20728.675608]  [<ffffffff8cb6a4f7>] generic_make_request+0x157/0x390
[20728.675612]  [<ffffffff8ca9b952>] ? bvec_alloc+0x92/0x120
[20728.675613]  [<ffffffff8cb6a7a0>] submit_bio+0x70/0x150
[20728.675615]  [<ffffffff8ca9bbf3>] ? bio_alloc_bioset+0x213/0x310
[20728.675651]  [<ffffffffc034f255>] xfs_add_to_ioend+0x145/0x1d0 [xfs]
[20728.675662]  [<ffffffffc034f8a5>] xfs_do_writepage+0x1e5/0x550 [xfs]
[20728.675666]  [<ffffffff8c9d5a0c>] write_cache_pages+0x21c/0x480
[20728.675676]  [<ffffffffc034f6c0>] ? xfs_vm_writepages+0xb0/0xb0 [xfs]
[20728.675686]  [<ffffffffc034f67b>] xfs_vm_writepages+0x6b/0xb0 [xfs]
[20728.675688]  [<ffffffff8c9d6ab1>] do_writepages+0x21/0x50
[20728.675691]  [<ffffffff8ca8ccf0>] __writeback_single_inode+0x40/0x260
[20728.675693]  [<ffffffff8ca8d8bc>] writeback_sb_inodes+0x1cc/0x430
[20728.675695]  [<ffffffff8ca8dbc7>] __writeback_inodes_wb+0xa7/0xe0
[20728.675697]  [<ffffffff8ca8e0a3>] wb_writeback+0x263/0x2f0
[20728.675699]  [<ffffffff8c9d61f0>] ? bdi_dirty_limit+0x40/0xe0
[20728.675701]  [<ffffffff8ca8ebac>] bdi_writeback_workfn+0x1cc/0x470
[20728.675704]  [<ffffffff8c8c32ef>] process_one_work+0x17f/0x440
[20728.675706]  [<ffffffff8c8c4436>] worker_thread+0x126/0x3c0
[20728.675708]  [<ffffffff8c8c4310>] ? manage_workers.isra.26+0x2b0/0x2b0
[20728.675711]  [<ffffffff8c8cb621>] kthread+0xd1/0xe0
[20728.675712]  [<ffffffff8c8cb550>] ? insert_kthread_work+0x40/0x40
[20728.675716]  [<ffffffff8cfc51f7>] ret_from_fork_nospec_begin+0x21/0x21
[20728.675717]  [<ffffffff8c8cb550>] ? insert_kthread_work+0x40/0x40
[20728.675718] Code: 47 fe ff ff 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 66 90 b9 01 00 00 00 8b 17 85 d2 74 0d 83 fa 03 74 08 f3 90 <8b> 17 85 d2 75 f3 89 d0 f0 0f b1 0f 39 c2 75 e3 5d 66 90 c3 cc 
[20728.675736] Kernel panic - not syncing: Hard LOCKUP
[20728.675750] CPU: 16 PID: 4868 Comm: kworker/u64:0 Kdump: loaded Not tainted 3.10.0-1160.108.1.el7.x86_64 #1
[20728.675776] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
[20728.675806] Workqueue: writeback bdi_writeback_workfn (flush-253:2)
[20728.675824] Call Trace:
[20728.675834]  <NMI>  [<ffffffff8cfb1bec>] dump_stack+0x19/0x1f
[20728.675854]  [<ffffffff8cfab708>] panic+0xe8/0x21f
[20728.675871]  [<ffffffff8c830a78>] ? show_regs+0x58/0x290
[20728.675888]  [<ffffffff8c89f523>] nmi_panic+0x43/0x50
User comments
  • 诗人诗意
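  • This is a hard lockup (Hard LOCKUP) reported by the watchdog, which means the CPU stopped responding to interrupts.
    It happened in the block device layer, at the end of the page-cache writeback path. The last call is pvscsi_queue, at the line spin_lock_irqsave(shost->host_lock, irq_flags). That call disables interrupts, and because the lock could not be acquired for too long, the watchdog eventually reported the error.
    I think the most likely cause is that the host is running too many virtualized devices, which lowers the throughput available to each tenant. Other code holding this lock then takes longer to issue its hardware commands, the lock is not released in time, and you end up with this problem.
    In any case, the workaround that treats the symptom but not the root cause is to disable the watchdog.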
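Before turning the watchdog off, it may help to see how it is currently configured. The following is a minimal sketch, assuming the standard sysctl files `nmi_watchdog` and `watchdog_thresh` under /proc/sys/kernel; the function name `read_watchdog_settings` is made up for this illustration.

```python
from pathlib import Path

# Sysctl files that control the lockup detector (assumed to be present
# under /proc/sys/kernel on this kernel; missing files are reported as None).
WATCHDOG_KEYS = ("nmi_watchdog", "watchdog_thresh")


def read_watchdog_settings(proc_root="/proc/sys/kernel"):
    """Return {key: int or None}; None means the file is missing or unreadable."""
    settings = {}
    for key in WATCHDOG_KEYS:
        try:
            settings[key] = int((Path(proc_root) / key).read_text().strip())
        except (OSError, ValueError):
            settings[key] = None
    return settings


if __name__ == "__main__":
    for key, value in read_watchdog_settings().items():
        print(f"kernel.{key} = {value}")
```

If `nmi_watchdog` reads 1, the hard lockup detector is active, and `watchdog_thresh` is the threshold in seconds. Running `sysctl -w kernel.nmi_watchdog=0` as root is the usual way to switch the detector off at runtime, but that only hides the symptom, as noted above.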
  • 2025/8/3 12:45:00
  • 心碎
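  • Are you sure this is a colocated physical server? The log shows it is a virtual machine: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
    The error reported is NMI watchdog: Watchdog detected hard LOCKUP on cpu 16, which looks like the virtualization host is oversold.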
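Since there are several vmcore-dmesg.txt files that are said to look alike, a small script can confirm that each one shows the same hardware line, the same lockup, and the same call path. This is a rough sketch only: the regular expressions are written against the log format quoted in this thread, the function name `summarize` is hypothetical, and the VMware check is just a substring test on the Hardware name line.

```python
import re
import sys

# One stack frame, e.g. "[<ffffffffc00bf225>] pvscsi_queue+0x25/0x920 [vmw_pvscsi]".
# Group 1 is the "? " marker for unreliable frames, group 3 the module, if any.
FRAME = re.compile(
    r"\[<[0-9a-f]+>\] (\? )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?"
)
LOCKUP = re.compile(r"hard LOCKUP on cpu (\d+)")
HARDWARE = re.compile(r"Hardware name: (.+)")


def summarize(text):
    """Summarize the first hard-lockup report found in a vmcore-dmesg dump."""
    lockup = LOCKUP.search(text)
    hardware = HARDWARE.search(text)
    frames = []
    in_trace = False
    for line in text.splitlines():
        if not in_trace:
            in_trace = "Call Trace:" in line
            continue
        match = FRAME.search(line)
        if match is None:
            break  # first non-frame line ends the first call trace
        if match.group(1) is None:  # skip "?" (unreliable) frames
            frames.append((match.group(2), match.group(3)))
    name = hardware.group(1).strip() if hardware else None
    return {
        "cpu": int(lockup.group(1)) if lockup else None,
        "hardware": name,
        "vmware": bool(name and "VMware" in name),
        "frames": frames,
    }


if __name__ == "__main__":
    # Usage: python summarize_dmesg.py /var/crash/*/vmcore-dmesg.txt
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            info = summarize(handle.read())
        print(path, info["cpu"], info["vmware"], info["frames"][:3])
```

If every dump prints the same first few frames and reports a VMware guest, that supports treating this as one recurring problem on the virtualization side rather than several unrelated crashes.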

  • 2025/8/3 12:36:00