0:044> !tp logStart: 1 logSize: 200 CPU utilization: 88 % Worker Thread: Total: 8 Running: 4 Idle: 4 MaxLimit: 1023 MinLimit: 4 Work Request in Queue: 0 -------------------------------------- Number of Timers: 2 -------------------------------------- Completion Port Thread:Total: 2 Free: 2 MaxFree: 8 CurrentLimit: 2 MaxLimit: 1000 MinLimit: 4从卦中数据看当前cpu确实达到了 88%,接下来我们观察下这个程序的机器cpu是否给力,可以用 !cpuid 观察。
0:044> !cpuid CP F/M/S Manufacturer MHz 0 6,94,3 GenuineIntel 3192 1 6,94,3 GenuineIntel 3192 2 6,94,3 GenuineIntel 3192 3 6,94,3 GenuineIntel 3192从卦中看,尼玛也就4core,有点弱哈,好歹也是一个高利润的药厂,这么抠门哈。
0:044> !t ThreadCount: 451 UnstartedThread: 0 BackgroundThread: 443 PendingThread: 0 DeadThread: 1 Hosted Runtime: no Lock DBG ID OSID ThreadOBJ State GC Mode GC Alloc Context Domain Count Apt Exception 0 1 22b8 04CE8728 26020 Preemptive 18E5C92C:18E5E4DC 04c86c20 -00001 STA 3 2 17c8 04B25768 2b220 Preemptive 18CAF3A0:18CB1374 04c86c20 -00001 MTA (Finalizer) 4 4 238c 04C0CDD8 202b020 Preemptive 18E45D88:18E464DC 04c86c20 -00001 MTA 5 5 230c 0A6C37A0 202b020 Preemptive 18DAC318:18DAC47C 04c86c20 -00001 MTA 6 6 23a0 0A70E620 202b220 Preemptive 00000000:00000000 04c86c20 -00001 MTA ...从卦中数据看,当前有 451 个线程,其中后台线程是 443 个,再结合刚才的 !tp 看到的线程池线程也才 8 个,这就说明这个程序中有 400+ 的线程是直接通过 new Thread 创建的,这个信息就比较可疑了,为啥不用线程池用 Thread ,有蹊跷。接下来的思路就是使用 ~*e !clrstack 命令观察下每个线程此时都在做什么,命令一输入,刷了好久。
0:044> ~*e !clrstack ... OS Thread Id: 0x220c (18) Child SP IP Call Site 184CF614 77dd19dc [HelperMethodFrame: 184cf614] System.Threading.Thread.SleepInternal(Int32) 184CF680 141975f4 System.Threading.Thread.Sleep(Int32) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/Thread.cs @ 357] 184CF694 165055b9 xxx.ActionThread`1[[xxx]].Loop() 184CF878 74467741 System.Threading.Thread+StartHelper.Callback(System.Object) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/Thread.cs @ 42] 184CF888 7446fca1 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/ExecutionContext.cs @ 183] 184CF8C0 74466742 System.Threading.Thread.StartCallback() [/_/src/coreclr/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 105] 184CFA14 74cbc29f [DebuggerU2MCatchHandlerFrame: 184cfa14] ...在卦中的各个线程栈上也没有看到什么特别明显的业务函数,大多都是停在 Thread.SleepInternal 上进行等待,这就让我陷入了迷茫。
0:047> !clrstack OS Thread Id: 0xad8 (47) Child SP IP Call Site 20B5F434 77dd19dc [HelperMethodFrame: 20b5f434] System.Threading.Thread.SleepInternal(Int32) 20B5F4A0 141975f4 System.Threading.Thread.Sleep(Int32) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/Thread.cs @ 357] 20B5F4B4 1f123c71 xxx.ActionThread`1[[xxx].Loop() 20B5F698 74467741 System.Threading.Thread+StartHelper.Callback(System.Object) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/Thread.cs @ 42] 20B5F6A8 1baab7da System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/ExecutionContext.cs @ 183] 20B5F6E0 74466742 System.Threading.Thread.StartCallback() [/_/src/coreclr/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 105] 20B5F834 74cbc29f [DebuggerU2MCatchHandlerFrame: 20b5f834] 0:047> !U /d 1f123c71 Normal JIT generated code xxx.ActionThread`1[xxx].Loop() ilAddr is 0A324040 pImport is 08AD6468 Begin 1F123C10, size abd 1f123c10 55 push ebp 1f123c11 8bec mov ebp,esp 1f123c13 57 push edi 1f123c14 56 push esi 1f123c15 81ecd4010000 sub esp,1D4h 1f123c1b c5f877 vzeroupper 1f123c1e c5d857e4 vxorps xmm4,xmm4,xmm4 1f123c22 c5fa7fa524feffff vmovdqu xmmword ptr [ebp-1DCh],xmm4 1f123c2a c5fa7fa534feffff vmovdqu xmmword ptr [ebp-1CCh],xmm4 1f123c32 b850feffff mov eax,0FFFFFE50h 1f123c37 c5fa7f6405f4 vmovdqu xmmword ptr [ebp+eax-0Ch],xmm4 1f123c3d c5fa7f640504 vmovdqu xmmword ptr [ebp+eax+4],xmm4 1f123c43 c5fa7f640514 vmovdqu xmmword ptr [ebp+eax+14h],xmm4 1f123c49 83c030 add eax,30h ... 1f123c5a e84115cc55 call coreclr!JIT_DbgIsJustMyCode (74de51a0) 1f123c5f 90 nop 1f123c60 90 nop 1f123c61 e9300a0000 jmp xxx.ActionThread<xxx>.Loop+0xa86 (1f124696) 1f123c66 90 nop 1f123c67 b901000000 mov ecx,1 1f123c6c e87f54eaea call 09fc90f0 (System.Threading.Thread.Sleep(Int32), mdToken: 06002D01) >>> 1f123c71 90 nop ...通过卦中的 >>> 可以确认很多的方法都是在 while (!base.IsTerminated) 中进行空转,如果 Sleep(1) 的线程比较少那可能没什么问题,但也扛不住400多线程一起玩哈,最后高频的上下文切换导致的 CPU 爆高。在 Sleep(1) 内部会涉及到CPU的等待队列,就绪队列,以及定时器 _KTIMER 内核对象, 因为 Windows 源码不公开,内部还是比较搞的,可以用 !pcr 命令观察下 cpu的背包。
lkd> !pcr 0 KPCR for Processor 0 at fffff8058023c000: Major 1 Minor 1 NtTib.ExceptionList: fffff80589089fb0 NtTib.StackBase: fffff80589088000 NtTib.StackLimit: 000000137e1fa158 NtTib.SubSystemTib: fffff8058023c000 NtTib.Version: 000000008023c180 NtTib.UserPointer: fffff8058023c870 NtTib.SelfTib: 000000137dfe0000 SelfPcr: 0000000000000000 Prcb: fffff8058023c180 Irql: 0000000000000000 ... CurrentThread: ffff910c66906080 NextThread: 0000000000000000 IdleThread: fffff80583d27a00 DpcQueue: lkd> dt nt!_KPRCB fffff8058023c180 +0x008 CurrentThread : 0xffff910c`66906080 _KTHREAD +0x010 NextThread : (null) +0x018 IdleThread : 0xfffff805`83d27a00 _KTHREAD ... +0x7c00 WaitListHead : _LIST_ENTRY [ 0xffff910c`5ec30158 - 0xffff910c`628b1158 ] +0x7c80 DispatcherReadyListHead : [32] _LIST_ENTRY [ 0xfffff805`80243e00 - 0xfffff805`80243e00 ]上面的[32]就是等待线程的32个优先级的数组队列。
2.尽量将 1 -> 50 来缓解,当然越大越好。