Up to Main Index Up to Journal for March, 2019 JOURNAL FOR SATURDAY 9TH MARCH, 2019 ______________________________________________________________________________ SUBJECT: Random segmentation faults on Raspberry Pi 3 DATE: Sat 9 Mar 16:44:02 GMT 2019 A few people have asked me if I’ve forgotten about posting some performance testing figures for the new connection quotas on the Raspberry Pi. I haven’t, but there have been some issues… I did some tests and wasn’t happy with the results and so wanted to do some CPU profiling. However on the Raspberry Pi 3 I started seeing random segmentation faults when CPU profiling was turned on. I managed to boil everything down in to a simple test program: package main import ( "log" "net" "os" "runtime/pprof" ) // Test from shell using: // // seq 1000000 | xargs -n1 netcat -zn 127.0.0.1 4001 // func main() { debug := true if debug { f, err := os.Create("/tmp/cpuprof") if err != nil { log.Fatal(err) } if err = pprof.StartCPUProfile(f); err != nil { log.Fatal(err) } } addr, err := net.ResolveTCPAddr("tcp", "127.0.0.1:4001") if err != nil { log.Fatal(err) } l, err := net.ListenTCP("tcp", addr) if err != nil { log.Fatal(err) } log.Print("Listening...") for x := 1; x < 1000001; x++ { conn, err := l.AcceptTCP() if err != nil { log.Fatal(err) } err = conn.Close() if err != nil { log.Fatal(err) } if x%1000 == 0 { log.Printf("%d connections...", x) } } if debug { pprof.StopCPUProfile() } log.Printf("Finished") } This simple program waits for 1,000,000 connections and just closes them when they arrive. On Amd64 I see what I expect to see: 2019/03/09 12:45:26 Listening... 2019/03/09 12:45:30 1000 connections... 2019/03/09 12:45:30 2000 connections... : : : 2019/03/09 12:51:10 999000 connections... 2019/03/09 12:51:11 1000000 connections... 2019/03/09 12:51:11 Finished However on the Raspberry Pi 3 I see: 2019/03/09 12:46:30 Listening... 2019/03/09 12:46:35 1000 connections... 2019/03/09 12:46:38 2000 connections... : : : 2019/03/09 12:47:06 14000 connections... 2019/03/09 12:47:08 15000 connections... Segmentation fault (core dumped) If I set debug to false on the Raspberry Pi 3: 2019/03/09 12:58:06 Listening... 2019/03/09 12:58:13 1000 connections... 2019/03/09 12:58:15 2000 connections... : : : 2019/03/09 13:43:58 999000 connections... 2019/03/09 13:44:00 1000000 connections... 2019/03/09 13:44:00 Finished Interestingly it works fine, but slowly, on a Raspberry Pi Zero W, with CPU profiling turned on. The actual number of connections before it fails is random. However it always fails in the same place. Looking at the core file with gdb I see: Core was generated by `./cpuprofile'. Program terminated with signal SIGSEGV, Segmentation fault. #0 runtime.sigtrampgo (ctx=0x2009d10, info=0x2009c90, sig=<optimized out>) at /home/rolfea/go1.11.1/src/runtime/signal_unix.go:307 307 if sp < g.m.gsignal.stack.lo || sp >= g.m.gsignal.stack.hi { [Current thread is 1 (LWP 16249)] Loading Go Runtime support. 3(gdb) bt #0 runtime.sigtrampgo (ctx=0x2009d10, info=0x2009c90, sig=<optimized out>) at /home/rolfea/go1.11.1/src/runtime/signal_unix.go:307 #1 0x00064904 in runtime.sigtramp () at /home/rolfea/go1.11.1/src/runtime/sys_linux_arm.s:445 #2 0x7ea99408 in ?? () Backtrace stopped: previous frame identical to this frame (corrupt stack?) Running with strace I see: accept4(5, 0xc63de0, [112], SOCK_CLOEXEC|SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable) epoll_wait(4, [{EPOLLIN, {u32=1724452800, u64=1724452800}}], 128, 0) = 1 accept4(5, {sa_family=AF_INET, sin_port=htons(34640), sin_addr=inet_addr("127.0.0.1")}, [112->16], SOCK_CLOEXEC|SOCK_NONBLOCK) = 6 epoll_ctl(4, EPOLL_CTL_ADD, 6, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=1724452672, u64=1724452672}}) = 0 getsockname(6, {sa_family=AF_INET, sin_port=htons(4001), sin_addr=inet_addr("127.0.0.1")}, [112->16]) = 0 setsockopt(6, SOL_TCP, TCP_NODELAY, [1], 4) = 0 epoll_ctl(4, EPOLL_CTL_DEL, 6, 0xc63ec0) = 0 close(6) = 0 accept4(5, 0xc63de0, [112], SOCK_CLOEXEC|SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable) epoll_wait(4, [], 128, 0) = 0 epoll_wait(4, [{EPOLLIN, {u32=1724452800, u64=1724452800}}], 128, -1) = 1 --- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} --- --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xa387933b} --- +++ killed by SIGSEGV +++ Segmentation fault Hrm, that doesn’t look good :( So far on the Raspberry Pi 3 I have tried Go1.11.1, Go1.12 and tip all with the same results. This seems related to #20417 and #10534 maybe[1]? Just posted to golang-nuts to try and get some feedback[2]. -- Diddymus [1] https://github.com/golang/go/issues/20417 https://github.com/golang/go/issues/10534 [2] https://groups.google.com/forum/#!topic/golang-nuts/nMQHPUhB9L8 Up to Main Index Up to Journal for March, 2019