skynet exits abnormally
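A skynet project we run at work recently started exiting for no apparent reason after a few days of uptime, without writing any error log of its own.
Checking the /var/log/messages log turned up this entry: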
Aug 14 20:54:11 iZbp10rgpa3x4d3bdsy28nZ kernel: skynet[12200]: segfault at 7f4d60e52000 ip 00007f4d60a36794 sp 00007f4d6c5f60b8 error 4 in libcrypto.so.1.0.2k[7f4d609be000+237000]
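My knowledge here is limited; from a quick search on Baidu, this apparently means an out-of-bounds memory access?
I used GDB to find the exact location: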
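The kernel line above can be unpacked mechanically. Here is a minimal sketch in Python; the function name `decode_segfault` and the regular expression are my own, not part of any existing tool. It decodes the x86 page-fault `error` bits (bit 0: page was present, bit 1: write access, bit 2: user mode) and subtracts the mapping base from `ip` to get the offset of the faulting instruction inside the library. For `error 4` that works out to a user-mode read of an unmapped page, which fits the out-of-bounds guess.

```python
import re

# Matches the "segfault at ..." line that the x86 Linux kernel writes to the log.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+)"
    r"(?: in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def decode_segfault(line):
    """Return a dict describing the fault, or None if the line does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"))
    info = {
        "process": m.group("comm"),
        "pid": int(m.group("pid")),
        "fault_addr": int(m.group("addr"), 16),
        "page_present": bool(err & 1),             # bit 0: 0 means the page was not mapped
        "access": "write" if err & 2 else "read",  # bit 1
        "mode": "user" if err & 4 else "kernel",   # bit 2
        "object": m.group("obj"),
        "offset": None,
    }
    if m.group("base"):
        # Offset of the faulting instruction inside the mapped object.
        info["offset"] = int(m.group("ip"), 16) - int(m.group("base"), 16)
    return info
```

Run on the log entry above, this reports an offset of `0x78794` into libcrypto.so.1.0.2k. The low 12 bits (`794`) match the faulting address GDB reports, which is what you would expect if both runs crashed on the same instruction.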
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff11fe700 (LWP 12059)]
0x00007fffe522f794 in sha1_block_data_order_shaext () from /lib64/libcrypto.so.10
Missing separate debuginfos, use: debuginfo-install glibc-2.17-325.el7_9.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-51.el7_9.x86_64 libcom_err-1.42.9-19.el7.x86_64 libgcc-4.8.5-44.el7.x86_64 libselinux-2.5-15.el7.x86_64 openssl-libs-1.0.2k-24.el7_9.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-19.el7_9.x86_64
(gdb)
(gdb) where
#0 0x00007fffe522f794 in sha1_block_data_order_shaext () from /lib64/libcrypto.so.10
#1 0x00007fffe522d50f in SHA1_Update () from /lib64/libcrypto.so.10
#2 0x00007fffe52de537 in ssleay_rand_add.part.0 () from /lib64/libcrypto.so.10
#3 0x00007fffe67a77e0 in ssl3_accept () from /lib64/libssl.so.10
#4 0x00007ffff7fbbee8 in _ltls_context_handshake (L=0x7fffdd828f08) at lualib-src/ltls.c:180
#5 0x0000000000425ce6 in precallC (f=0x7ffff7fbbe70 <_ltls_context_handshake>, nresults=1, func=<optimized out>, L=0x7fffdd828f08) at ldo.c:506
#6 luaD_precall (L=L@entry=0x7fffdd828f08, func=<optimized out>, func@entry=0x7fffc00b4df0, nresults=1) at ldo.c:572
#7 0x0000000000433830 in luaV_execute (L=L@entry=0x7fffdd828f08, ci=<optimized out>, ci@entry=0x7fffec111040) at lvm.c:1638
#8 0x00000000004258ca in unroll (L=0x7fffdd828f08, ud=<optimized out>) at ldo.c:717
#9 0x000000000042502a in luaD_rawrunprotected (L=L@entry=0x7fffdd828f08, f=f@entry=0x425ee0 <resume>, ud=ud@entry=0x7ffff11fcd9c) at ldo.c:144
#10 0x0000000000426185 in lua_resume (L=L@entry=0x7fffdd828f08, from=from@entry=0x7fffc5b70808, nargs=<optimized out>, nargs@entry=4, nresults=nresults@entry=0x7ffff11fcddc) at ldo.c:822
#11 0x00007ffff7fe99ed in lua_resumeX (nresults=0x7ffff11fcddc, nargs=4, from=0x7fffc5b70808, L=0x7fffdd828f08) at service-src/service_snlua.c:90
#12 auxresume (narg=4, co=0x7fffdd828f08, L=0x7fffc5b70808) at service-src/service_snlua.c:146
#13 timing_resume (L=L@entry=0x7fffc5b70808, co_index=co_index@entry=1, n=4) at service-src/service_snlua.c:198
#14 0x00007ffff7fe9ed0 in luaB_coresume (L=0x7fffc5b70808) at service-src/service_snlua.c:217
#15 0x0000000000425a3b in precallC (f=0x7ffff7fe9ea0 <luaB_coresume>, nresults=-1, func=0x7fffb04b5650, L=0x7fffc5b70808) at ldo.c:506
#16 luaD_pretailcall (L=L@entry=0x7fffc5b70808, ci=ci@entry=0x7fffdd8f2ec0, func=<optimized out>, func@entry=0x7fffb04b5650, narg1=<optimized out>, delta=delta@entry=6) at ldo.c:527
#17 0x00000000004337cb in luaV_execute (L=L@entry=0x7fffc5b70808, ci=<optimized out>) at lvm.c:1662
#18 0x0000000000426080 in ccall (inc=65537, nResults=<optimized out>, func=<optimized out>, L=0x7fffc5b70808) at ldo.c:609
#19 luaD_callnoyield (L=0x7fffc5b70808, func=<optimized out>, nResults=<optimized out>) at ldo.c:627
#20 0x000000000042502a in luaD_rawrunprotected (L=L@entry=0x7fffc5b70808, f=f@entry=0x421760 <f_call>, ud=ud@entry=0x7ffff11fd0d0) at ldo.c:144
#21 0x000000000042636e in luaD_pcall (L=L@entry=0x7fffc5b70808, func=func@entry=0x421760 <f_call>, u=u@entry=0x7ffff11fd0d0, old_top=192, ef=<optimized out>) at ldo.c:926
#22 0x0000000000422eb8 in lua_pcallk (L=L@entry=0x7fffc5b70808, nargs=<optimized out>, nresults=nresults@entry=-1, errfunc=errfunc@entry=0, ctx=ctx@entry=0, k=k@entry=0x43d6e0 <finishpcall>) at lapi.c:1069
#23 0x000000000043d780 in luaB_pcall (L=0x7fffc5b70808) at lbaselib.c:477
#24 0x0000000000425ce6 in precallC (f=0x43d730 <luaB_pcall>, nresults=2, func=<optimized out>, L=0x7fffc5b70808) at ldo.c:506
#25 luaD_precall (L=L@entry=0x7fffc5b70808, func=<optimized out>, func@entry=0x7fffb04b54a0, nresults=2) at ldo.c:572
#26 0x0000000000433830 in luaV_execute (L=L@entry=0x7fffc5b70808, ci=<optimized out>) at lvm.c:1638
#27 0x0000000000426080 in ccall (inc=65537, nResults=<optimized out>, func=<optimized out>, L=0x7fffc5b70808) at ldo.c:609
#28 luaD_callnoyield (L=0x7fffc5b70808, func=<optimized out>, nResults=<optimized out>) at ldo.c:627
#29 0x000000000042502a in luaD_rawrunprotected (L=L@entry=0x7fffc5b70808, f=f@entry=0x421760 <f_call>, ud=ud@entry=0x7ffff11fd390) at ldo.c:144
#30 0x000000000042636e in luaD_pcall (L=L@entry=0x7fffc5b70808, func=func@entry=0x421760 <f_call>, u=u@entry=0x7ffff11fd390, old_top=48, ef=<optimized out>) at ldo.c:926
#31 0x0000000000422eb8 in lua_pcallk (L=L@entry=0x7fffc5b70808, nargs=nargs@entry=5, nresults=nresults@entry=0, errfunc=errfunc@entry=1, ctx=ctx@entry=0, k=k@entry=0x0) at lapi.c:1069
#32 0x00007ffff7fcb158 in _cb (context=0x7fffc5aa6000, ud=0x7fffc5b70808, type=6, session=0, source=0, msg=0x7fffdcbc2460, sz=24) at lualib-src/lua-skynet.c:75
#33 0x0000000000419b06 in dispatch_message (ctx=ctx@entry=0x7fffc5aa6000, msg=msg@entry=0x7ffff11fd450) at skynet-src/skynet_server.c:276
#34 0x000000000041a6cc in skynet_context_message_dispatch (sm=sm@entry=0x7ffff6a08120, q=0x7fffc5be5180, weight=weight@entry=0) at skynet-src/skynet_server.c:336
#35 0x000000000041ae9b in thread_worker (p=<optimized out>) at skynet-src/skynet_start.c:163
#36 0x00007ffff7bc6ea5 in start_thread () from /lib64/libpthread.so.0
#37 0x00007ffff6fcbb0d in clone () from /lib64/libc.so.6
(gdb)
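Our code differs slightly from the upstream skynet source. In 涵曦's chat group, a lot of people helped me look for the cause. The main suspect was the SSL handshake call at
ltls.c:180
and someone also pointed me to an official vulnerability advisory.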
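It might even depend on the CPU model?
The function at the top of the backtrace requires CPU support, and my server's CPU happens to be one of those models, so I'll go with that explanation for now, haha.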
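The `_shaext` suffix on `sha1_block_data_order_shaext` marks OpenSSL's code path for the x86 SHA extensions, which Linux lists in /proc/cpuinfo as the `sha_ni` flag. As a rough way to check whether a given machine would take that path, here is a small sketch; the helper names `cpu_flags`, `has_sha_extensions`, and `read_cpuinfo` are made up for illustration, and the check only tells you the CPU advertises the feature, not that the crash came from it.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            # All cores normally report the same flags, so the first entry is enough.
            return set(value.split())
    return set()


def has_sha_extensions(cpuinfo_text):
    """True if the CPU advertises the x86 SHA extensions (flag name: sha_ni)."""
    return "sha_ni" in cpu_flags(cpuinfo_text)


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo file; returns an empty string if it is unavailable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""
```

On the server itself, `has_sha_extensions(read_cpuinfo())` gives the answer directly.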
Solution
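Upgrade OpenSSL and SSH, then point the skynet Makefile at the new OpenSSL version and rebuild.
Since then it has been running stably and has never exited on its own again. Nice.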
P.S. Ubuntu 20 ships with OpenSSL 1.1 out of the box.
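After a rebuild like this, it is worth confirming that the TLS module really links against the new OpenSSL and not the old system copy. A minimal sketch, assuming the module is a shared object you can pass to `ldd`; the path `luaclib/ltls.so` in the usage note is my assumption about the build layout, and the helper names are hypothetical.

```python
import subprocess


def linked_openssl_libs(ldd_output):
    """Map libssl/libcrypto sonames to resolved paths, parsed from `ldd` output."""
    libs = {}
    for line in ldd_output.splitlines():
        parts = line.split()
        # Typical line: "libssl.so.10 => /lib64/libssl.so.10 (0x00007f...)"
        if len(parts) >= 3 and parts[1] == "=>":
            name = parts[0]
            if name.startswith(("libssl.so", "libcrypto.so")):
                # ldd prints "=> not found" when the library cannot be resolved.
                libs[name] = None if parts[2] == "not" else parts[2]
    return libs


def run_ldd(path):
    """Run `ldd` on a binary or shared object and return its text output."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `linked_openssl_libs(run_ldd("luaclib/ltls.so"))` should list the new libssl and libcrypto paths if the Makefile change took effect.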