From 67e83543b3ab0b858dfb32f68ea03b3dd6d0b052 Mon Sep 17 00:00:00 2001 From: Stan Hu Date: Tue, 7 Jun 2016 19:40:38 -0700 Subject: [PATCH] Add debugging tips with gdb [ci skip] --- doc/README.md | 1 + doc/administration/troubleshooting/debug.md | 120 +++++++++++++++ .../troubleshooting/gdb-stuck-ruby.txt | 142 ++++++++++++++++++ 3 files changed, 263 insertions(+) create mode 100644 doc/administration/troubleshooting/debug.md create mode 100644 doc/administration/troubleshooting/gdb-stuck-ruby.txt diff --git a/doc/README.md b/doc/README.md index f1283cea0ad..be0d17084c7 100644 --- a/doc/README.md +++ b/doc/README.md @@ -44,6 +44,7 @@ - [Housekeeping](administration/housekeeping.md) Keep your Git repository tidy and fast. - [GitLab Performance Monitoring](monitoring/performance/introduction.md) Configure GitLab and InfluxDB for measuring performance metrics. - [Monitoring uptime](monitoring/health_check.md) Check the server status using the health check endpoint. +- [Debugging Tips](administration/troubleshooting/debug.md) Tips to debug problems when things go wrong - [Sidekiq Troubleshooting](administration/troubleshooting/sidekiq.md) Debug when Sidekiq appears hung and is not processing jobs. - [High Availability](administration/high_availability/README.md) Configure multiple servers for scaling or high availability. - [Container Registry](administration/container_registry.md) Configure Docker Registry with GitLab. diff --git a/doc/administration/troubleshooting/debug.md b/doc/administration/troubleshooting/debug.md new file mode 100644 index 00000000000..e5701b86cf3 --- /dev/null +++ b/doc/administration/troubleshooting/debug.md @@ -0,0 +1,120 @@ +# Debugging Tips + +Sometimes things don't work the way they should. Here are some tips on debugging issues out +in production. + +## The GNU Project Debugger (gdb) + +`gdb` is a must-have tool for debugging issues. To install on Ubuntu/Debian: + +``` +sudo apt-get install gdb +``` + +On CentOS: + +``` +sudo yum install gdb +``` + +## Common Problems + +Many of the tips to diagnose issues below apply to many different situations. We'll use one +concrete example to illustrate what you can do to learn what is going wrong. + +### 502 Gateway Timeout after unicorn spins at 100% CPU + +This error occurs when the Web server times out (default: 60 s) after not +hearing back from the unicorn worker. If the CPU spins to 100% while this in +progress, there may be something taking longer than it should. + +To fix this issue, we first need to figure out what is happening. The +following tips are only recommended if you do NOT mind users being affected by +downtime. Otherwise skip to the next section. + +1. Load the problematic URL +1. Run `sudo gdb -p ` to attach to the unicorn process. +1. In the gdb window, type: + + ``` + call (void) rb_backtrace() + ``` + +1. This forces the process to generate a Ruby backtrace. Check + `/var/log/gitlab/unicorn/unicorn_stderr.log` for the backtace. For example, you may see: + + ```ruby + from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:33:in `block in start' + from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:33:in `loop' + from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:36:in `block (2 levels) in start' + from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:44:in `sample' + from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:68:in `sample_objects' + from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:68:in `each_with_object' + from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:68:in `each' + from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:69:in `block in sample_objects' + from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:69:in `name' + ``` + +1. To see the current threads, run: + + ``` + apply all thread bt + ``` + +1. Once you're done debugging with `gdb`, be sure to detach from the process and exit: + + ``` + detach + exit + ``` + +Note that if the unicorn process terminates before you are able to run these +commands, gdb will report an error. To buy more time, you can always raise the +Unicorn timeout. For omnibus users, you can edit `/etc/gitlab/gitlab.rb` and +increase it from 60 seconds to 300: + +```ruby +unicorn['worker_timeout'] = 300 +``` + +For source installations, edit `config/unicorn.rb`. + +[Reconfigure] GitLab for the changes to take effect. + +[Reconfigure]: ../restart_gitlab.md#omnibus-gitlab-reconfigure + +#### Troubleshooting without affecting other users + +The previous section attached to a running unicorn process, and this may have +undesirable effects for users trying to access GitLab during this time. If you +are concerned about affecting others during a production system, you can run a +separate Rails process to debug the issue: + +1. Log in to your GitLab account. +1. Copy the URL that is causing problems (e.g. https://gitlab.com/ABC). +1. Obtain the private token for your user (Profile Settings -> Account). +1. Bring up the GitLab Rails console. For omnibus users, run: + + ```` + sudo gitlab-rails console + ``` + +1. At the Rails console, run: + + ```ruby + [1] pry(main)> app.get '/private_token?' + ``` + + For example: + + ```ruby + [1] pry(main)> app.get 'https://gitlab.com/gitlab-org/gitlab-ce/issues/1?private_token=123456' + ``` + +1. In a new window, run `top`. It should show this ruby process using 100% CPU. Write down the PID. +1. Follow step 2 from the previous section on using gdb. + +# More information + +* [Debugging Stuck Ruby Processes](https://blog.newrelic.com/2013/04/29/debugging-stuck-ruby-processes-what-to-do-before-you-kill-9/) +* [Cheatsheet of using gdb and ruby processes](gdb-stuck-ruby.txt) diff --git a/doc/administration/troubleshooting/gdb-stuck-ruby.txt b/doc/administration/troubleshooting/gdb-stuck-ruby.txt new file mode 100644 index 00000000000..13d5dfcffa4 --- /dev/null +++ b/doc/administration/troubleshooting/gdb-stuck-ruby.txt @@ -0,0 +1,142 @@ +# Here's the script I'll use to demonstrate - it just loops forever: + +$ cat test.rb +#!/usr/bin/env ruby + +loop do + sleep 1 +end + +# Now, I'll start the script in the background, and redirect stdout and stderr +# to /dev/null: + +$ ruby ./test.rb >/dev/null 2>/dev/null & +[1] 1343 + +# Next, I'll grab the PID of the script (1343): + +$ ps aux | grep test.rb +vagrant 1343 0.0 0.4 3884 1652 pts/0 S 14:42 0:00 ruby ./test.rb +vagrant 1345 0.0 0.2 4624 852 pts/0 S+ 14:42 0:00 grep --color=auto test.rb + +# Now I start gdb. Note that I'm using sudo here. This may or may not be +# necessary in your setup. I'd try without sudo first, and fall back to adding +# it if the next step fails: + +$ sudo gdb +GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04 +Copyright (C) 2012 Free Software Foundation, Inc. +License GPLv3+: GNU GPL version 3 or later +This is free software: you are free to change and redistribute it. +There is NO WARRANTY, to the extent permitted by law. Type "show copying" +and "show warranty" for details. +This GDB was configured as "i686-linux-gnu". +For bug reporting instructions, please see: +. + +# OK, now I'm in gdb, and I want to instruct it to attach to our Ruby process. +# I can do that using the 'attach' command, which takes a PID (the one we +# gathered above): + +(gdb) attach 1343 +Attaching to process 1343 +Reading symbols from /opt/vagrant_ruby/bin/ruby...done. +Reading symbols from /lib/i386-linux-gnu/librt.so.1...(no debugging symbols found)...done. +Loaded symbols for /lib/i386-linux-gnu/librt.so.1 +Reading symbols from /lib/i386-linux-gnu/libdl.so.2...(no debugging symbols found)...done. +Loaded symbols for /lib/i386-linux-gnu/libdl.so.2 +Reading symbols from /lib/i386-linux-gnu/libcrypt.so.1...(no debugging symbols found)...done. +Loaded symbols for /lib/i386-linux-gnu/libcrypt.so.1 +Reading symbols from /lib/i386-linux-gnu/libm.so.6...(no debugging symbols found)...done. +Loaded symbols for /lib/i386-linux-gnu/libm.so.6 +Reading symbols from /lib/i386-linux-gnu/libc.so.6...(no debugging symbols found)...done. +Loaded symbols for /lib/i386-linux-gnu/libc.so.6 +Reading symbols from /lib/i386-linux-gnu/libpthread.so.0...(no debugging symbols found)...done. +[Thread debugging using libthread_db enabled] +Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1". +Loaded symbols for /lib/i386-linux-gnu/libpthread.so.0 +Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done. +Loaded symbols for /lib/ld-linux.so.2 +0xb770c424 in __kernel_vsyscall () + +# Great, now gdb is attached to the target process. If the step above fails, try +# going back and running gdb under sudo. The next thing I want to do is gather +# C-level backtraces from all threads in the process. The following command +# stands for 'thread apply all backtrace': + +(gdb) t a a bt + +Thread 1 (Thread 0xb74d76c0 (LWP 1343)): +#0 0xb770c424 in __kernel_vsyscall () +#1 0xb75d7abd in select () from /lib/i386-linux-gnu/libc.so.6 +#2 0x08069c56 in rb_thread_wait_for (time=...) at eval.c:11376 +#3 0x080a20fd in rb_f_sleep (argc=1, argv=0xbf85f490) at process.c:1633 +#4 0x0805e0e2 in call_cfunc (argv=0xbf85f490, argc=1, len=-1, recv=3075299660, func=0x80a20b0 ) + at eval.c:5778 +#5 rb_call0 (klass=3075304600, recv=3075299660, id=9393, oid=9393, argc=1, argv=0xbf85f490, body=0xb74c85a8, flags=2) + at eval.c:5928 +#6 0x0805e35d in rb_call (klass=3075304600, recv=3075299660, mid=9393, argc=1, argv=0xbf85f490, scope=1, + self=) at eval.c:6176 +#7 0x080651ec in rb_eval (self=3075299660, n=0xb74c4e1c) at eval.c:3521 +#8 0x0805c31c in rb_yield_0 (val=6, self=3075299660, klass=, flags=0, avalue=0) at eval.c:5095 +#9 0x0806a1e5 in loop_i () at eval.c:5227 +#10 0x08058dbd in rb_rescue2 (b_proc=0x806a1c0 , data1=0, r_proc=0, data2=0) at eval.c:5491 +#11 0x08058f28 in rb_f_loop () at eval.c:5252 +#12 0x0805e0c1 in call_cfunc (argv=0x0, argc=0, len=0, recv=3075299660, func=0x8058ef0 ) at eval.c:5781 +#13 rb_call0 (klass=3075304600, recv=3075299660, id=4121, oid=4121, argc=0, argv=0x0, body=0xb74d4dbc, flags=2) + at eval.c:5928 +#14 0x0805e35d in rb_call (klass=3075304600, recv=3075299660, mid=4121, argc=0, argv=0x0, scope=1, self=) + at eval.c:6176 +#15 0x080651ec in rb_eval (self=3075299660, n=0xb74c4dcc) at eval.c:3521 +#16 0x080662c6 in rb_eval (self=3075299660, n=0xb74c4de0) at eval.c:3236 +#17 0x08068ee4 in ruby_exec_internal () at eval.c:1654 +#18 0x08068f24 in ruby_exec () at eval.c:1674 +#19 0x0806b2cd in ruby_run () at eval.c:1684 +#20 0x08053771 in main (argc=2, argv=0xbf860204, envp=0xbf860210) at main.c:48 + +# C backtraces are sometimes sufficient, but often Ruby backtraces are necessary +# for debugging as well. Ruby has a built-in function called rb_backtrace() that +# we can use to dump out a Ruby backtrace, but it prints to stdout or stderr +# (depending on your Ruby version), which might have been redirected to a file +# or to /dev/null (as in our example) when the process started up. +# +# To get aroundt this, we'll do a little trick and redirect the target process's +# stdout and stderr to the current TTY, so that any output from the process +# will appear directly on our screen. + +# First, let's close the existing file descriptors for stdout and stderr +# (FD 1 and 2, respectively): +(gdb) call (void) close(1) +(gdb) call (void) close(2) + +# Next, we need to figure out the device name for the current TTY: +(gdb) shell tty +/dev/pts/0 + +# OK, now we can pass the device name obtained above to open() and attach +# file descriptors 1 and 2 back to the current TTY with these calls: + +(gdb) call (int) open("/dev/pts/0", 2, 0) +$1 = 1 +(gdb) call (int) open("/dev/pts/0", 2, 0) +$2 = 2 + +# Finally, we call rb_backtrace() in order to dump the Ruby backtrace: + +(gdb) call (void) rb_backtrace() + from ./test.rb:4:in `sleep' + from ./test.rb:4 + from ./test.rb:3:in `loop' + from ./test.rb:3 + +# And here's how we get out of gdb. Once you've quit, you'll probably want to +# clean up the stuck process by killing it. + +(gdb) quit +A debugging session is active. + + Inferior 1 [process 1343] will be detached. + +Quit anyway? (y or n) y +Detaching from program: /opt/vagrant_ruby/bin/ruby, process 1343 +$