From 67e83543b3ab0b858dfb32f68ea03b3dd6d0b052 Mon Sep 17 00:00:00 2001
From: Stan Hu <stanhu@gmail.com>
Date: Tue, 7 Jun 2016 19:40:38 -0700
Subject: [PATCH] Add debugging tips with gdb

[ci skip]
---
 doc/README.md                                 |   1 +
 doc/administration/troubleshooting/debug.md   | 120 +++++++++++++++
 .../troubleshooting/gdb-stuck-ruby.txt        | 142 ++++++++++++++++++
 3 files changed, 263 insertions(+)
 create mode 100644 doc/administration/troubleshooting/debug.md
 create mode 100644 doc/administration/troubleshooting/gdb-stuck-ruby.txt
diff --git a/doc/README.md b/doc/README.md
index f1283cea0ad..be0d17084c7 100644
--- a/doc/README.md
+++ b/doc/README.md
@@ -44,6 +44,7 @@
 - [Housekeeping](administration/housekeeping.md) Keep your Git repository tidy and fast.
 - [GitLab Performance Monitoring](monitoring/performance/introduction.md) Configure GitLab and InfluxDB for measuring performance metrics.
 - [Monitoring uptime](monitoring/health_check.md) Check the server status using the health check endpoint.
+- [Debugging Tips](administration/troubleshooting/debug.md) Tips to debug problems when things go wrong
 - [Sidekiq Troubleshooting](administration/troubleshooting/sidekiq.md) Debug when Sidekiq appears hung and is not processing jobs.
 - [High Availability](administration/high_availability/README.md) Configure multiple servers for scaling or high availability.
 - [Container Registry](administration/container_registry.md) Configure Docker Registry with GitLab.
diff --git a/doc/administration/troubleshooting/debug.md b/doc/administration/troubleshooting/debug.md
new file mode 100644
index 00000000000..e5701b86cf3
--- /dev/null
+++ b/doc/administration/troubleshooting/debug.md
@@ -0,0 +1,120 @@
+# Debugging Tips
+
+Sometimes things don't work the way they should. Here are some tips on debugging issues out
+in production.
+
+## The GNU Project Debugger (gdb)
+
+`gdb` is a must-have tool for debugging issues. To install on Ubuntu/Debian:
+
+```
+sudo apt-get install gdb
+```
+
+On CentOS:
+
+```
+sudo yum install gdb
+```
+
+## Common Problems
+
+Many of the tips to diagnose issues below apply to many different situations. We'll use one
+concrete example to illustrate what you can do to learn what is going wrong.
+
+### 502 Gateway Timeout after unicorn spins at 100% CPU
+
+This error occurs when the Web server times out (default: 60 s) after not
+hearing back from the unicorn worker. If the CPU spins to 100% while this in
+progress, there may be something taking longer than it should.
+
+To fix this issue, we first need to figure out what is happening. The
+following tips are only recommended if you do NOT mind users being affected by
+downtime. Otherwise skip to the next section.
+
+1. Load the problematic URL
+1. Run `sudo gdb -p <PID>` to attach to the unicorn process.
+1. In the gdb window, type:
+
+    ```
+    call (void) rb_backtrace()
+    ```
+
+1. This forces the process to generate a Ruby backtrace. Check
+   `/var/log/gitlab/unicorn/unicorn_stderr.log` for the backtace. For example, you may see:
+
+    ```ruby
+    from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:33:in `block in start'
+    from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:33:in `loop'
+    from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:36:in `block (2 levels) in start'
+    from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:44:in `sample'
+    from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:68:in `sample_objects'
+    from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:68:in `each_with_object'
+    from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:68:in `each'
+    from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:69:in `block in sample_objects'
+    from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:69:in `name'
+    ```
+
+1. To see the current threads, run:
+
+    ```
+    apply all thread bt
+    ```
+
+1. Once you're done debugging with `gdb`, be sure to detach from the process and exit:
+
+    ```
+    detach
+    exit
+    ```
+
+Note that if the unicorn process terminates before you are able to run these
+commands, gdb will report an error. To buy more time, you can always raise the
+Unicorn timeout. For omnibus users, you can edit `/etc/gitlab/gitlab.rb` and
+increase it from 60 seconds to 300:
+
+```ruby
+unicorn['worker_timeout'] = 300
+```
+
+For source installations, edit `config/unicorn.rb`.
+
+[Reconfigure] GitLab for the changes to take effect.
+
+[Reconfigure]: ../restart_gitlab.md#omnibus-gitlab-reconfigure
+
+#### Troubleshooting without affecting other users
+
+The previous section attached to a running unicorn process, and this may have
+undesirable effects for users trying to access GitLab during this time. If you
+are concerned about affecting others during a production system, you can run a
+separate Rails process to debug the issue:
+
+1. Log in to your GitLab account.
+1. Copy the URL that is causing problems (e.g. https://gitlab.com/ABC).
+1. Obtain the private token for your user (Profile Settings -> Account).
+1. Bring up the GitLab Rails console. For omnibus users, run:
+
+    ````
+    sudo gitlab-rails console
+    ```
+
+1. At the Rails console, run:
+
+    ```ruby
+    [1] pry(main)> app.get '<URL FROM STEP 1>/private_token?<TOKEN FROM STEP 2>'
+    ```
+
+    For example:
+
+    ```ruby
+    [1] pry(main)> app.get 'https://gitlab.com/gitlab-org/gitlab-ce/issues/1?private_token=123456'
+    ```
+
+1. In a new window, run `top`. It should show this ruby process using 100% CPU. Write down the PID.
+1. Follow step 2 from the previous section on using gdb.
+
+# More information
+
+* [Debugging Stuck Ruby Processes](https://blog.newrelic.com/2013/04/29/debugging-stuck-ruby-processes-what-to-do-before-you-kill-9/)
+* [Cheatsheet of using gdb and ruby processes](gdb-stuck-ruby.txt)
diff --git a/doc/administration/troubleshooting/gdb-stuck-ruby.txt b/doc/administration/troubleshooting/gdb-stuck-ruby.txt
new file mode 100644
index 00000000000..13d5dfcffa4
--- /dev/null
+++ b/doc/administration/troubleshooting/gdb-stuck-ruby.txt
@@ -0,0 +1,142 @@
+# Here's the script I'll use to demonstrate - it just loops forever:
+
+$ cat test.rb 
+#!/usr/bin/env ruby
+
+loop do
+  sleep 1
+end
+
+# Now, I'll start the script in the background, and redirect stdout and stderr
+# to /dev/null:
+
+$ ruby ./test.rb >/dev/null 2>/dev/null &
+[1] 1343
+
+# Next, I'll grab the PID of the script (1343):
+
+$ ps aux | grep test.rb
+vagrant   1343  0.0  0.4   3884  1652 pts/0    S    14:42   0:00 ruby ./test.rb
+vagrant   1345  0.0  0.2   4624   852 pts/0    S+   14:42   0:00 grep --color=auto test.rb
+
+# Now I start gdb. Note that I'm using sudo here. This may or may not be
+# necessary in your setup. I'd try without sudo first, and fall back to adding
+# it if the next step fails:
+
+$ sudo gdb
+GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
+Copyright (C) 2012 Free Software Foundation, Inc.
+License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
+This is free software: you are free to change and redistribute it.
+There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
+and "show warranty" for details.
+This GDB was configured as "i686-linux-gnu".
+For bug reporting instructions, please see:
+<http://bugs.launchpad.net/gdb-linaro/>.
+
+# OK, now I'm in gdb, and I want to instruct it to attach to our Ruby process.
+# I can do that using the 'attach' command, which takes a PID (the one we
+# gathered above):
+
+(gdb) attach 1343
+Attaching to process 1343
+Reading symbols from /opt/vagrant_ruby/bin/ruby...done.
+Reading symbols from /lib/i386-linux-gnu/librt.so.1...(no debugging symbols found)...done.
+Loaded symbols for /lib/i386-linux-gnu/librt.so.1
+Reading symbols from /lib/i386-linux-gnu/libdl.so.2...(no debugging symbols found)...done.
+Loaded symbols for /lib/i386-linux-gnu/libdl.so.2
+Reading symbols from /lib/i386-linux-gnu/libcrypt.so.1...(no debugging symbols found)...done.
+Loaded symbols for /lib/i386-linux-gnu/libcrypt.so.1
+Reading symbols from /lib/i386-linux-gnu/libm.so.6...(no debugging symbols found)...done.
+Loaded symbols for /lib/i386-linux-gnu/libm.so.6
+Reading symbols from /lib/i386-linux-gnu/libc.so.6...(no debugging symbols found)...done.
+Loaded symbols for /lib/i386-linux-gnu/libc.so.6
+Reading symbols from /lib/i386-linux-gnu/libpthread.so.0...(no debugging symbols found)...done.
+[Thread debugging using libthread_db enabled]
+Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
+Loaded symbols for /lib/i386-linux-gnu/libpthread.so.0
+Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
+Loaded symbols for /lib/ld-linux.so.2
+0xb770c424 in __kernel_vsyscall ()
+
+# Great, now gdb is attached to the target process. If the step above fails, try
+# going back and running gdb under sudo. The next thing I want to do is gather
+# C-level backtraces from all threads in the process. The following command
+# stands for 'thread apply all backtrace':
+
+(gdb) t a a bt
+
+Thread 1 (Thread 0xb74d76c0 (LWP 1343)):
+#0  0xb770c424 in __kernel_vsyscall ()
+#1  0xb75d7abd in select () from /lib/i386-linux-gnu/libc.so.6
+#2  0x08069c56 in rb_thread_wait_for (time=...) at eval.c:11376
+#3  0x080a20fd in rb_f_sleep (argc=1, argv=0xbf85f490) at process.c:1633
+#4  0x0805e0e2 in call_cfunc (argv=0xbf85f490, argc=1, len=-1, recv=3075299660, func=0x80a20b0 <rb_f_sleep>)
+    at eval.c:5778
+#5  rb_call0 (klass=3075304600, recv=3075299660, id=9393, oid=9393, argc=1, argv=0xbf85f490, body=0xb74c85a8, flags=2)
+    at eval.c:5928
+#6  0x0805e35d in rb_call (klass=3075304600, recv=3075299660, mid=9393, argc=1, argv=0xbf85f490, scope=1, 
+    self=<optimized out>) at eval.c:6176
+#7  0x080651ec in rb_eval (self=3075299660, n=0xb74c4e1c) at eval.c:3521
+#8  0x0805c31c in rb_yield_0 (val=6, self=3075299660, klass=<optimized out>, flags=0, avalue=0) at eval.c:5095
+#9  0x0806a1e5 in loop_i () at eval.c:5227
+#10 0x08058dbd in rb_rescue2 (b_proc=0x806a1c0 <loop_i>, data1=0, r_proc=0, data2=0) at eval.c:5491
+#11 0x08058f28 in rb_f_loop () at eval.c:5252
+#12 0x0805e0c1 in call_cfunc (argv=0x0, argc=0, len=0, recv=3075299660, func=0x8058ef0 <rb_f_loop>) at eval.c:5781
+#13 rb_call0 (klass=3075304600, recv=3075299660, id=4121, oid=4121, argc=0, argv=0x0, body=0xb74d4dbc, flags=2)
+    at eval.c:5928
+#14 0x0805e35d in rb_call (klass=3075304600, recv=3075299660, mid=4121, argc=0, argv=0x0, scope=1, self=<optimized out>)
+    at eval.c:6176
+#15 0x080651ec in rb_eval (self=3075299660, n=0xb74c4dcc) at eval.c:3521
+#16 0x080662c6 in rb_eval (self=3075299660, n=0xb74c4de0) at eval.c:3236
+#17 0x08068ee4 in ruby_exec_internal () at eval.c:1654
+#18 0x08068f24 in ruby_exec () at eval.c:1674
+#19 0x0806b2cd in ruby_run () at eval.c:1684
+#20 0x08053771 in main (argc=2, argv=0xbf860204, envp=0xbf860210) at main.c:48
+
+# C backtraces are sometimes sufficient, but often Ruby backtraces are necessary
+# for debugging as well. Ruby has a built-in function called rb_backtrace() that
+# we can use to dump out a Ruby backtrace, but it prints to stdout or stderr
+# (depending on your Ruby version), which might have been redirected to a file
+# or to /dev/null (as in our example) when the process started up.
+#
+# To get aroundt this, we'll do a little trick and redirect the target process's
+# stdout and stderr to the current TTY, so that any output from the process
+# will appear directly on our screen.
+
+# First, let's close the existing file descriptors for stdout and stderr
+# (FD 1 and 2, respectively):
+(gdb) call (void) close(1)
+(gdb) call (void) close(2)
+
+# Next, we need to figure out the device name for the current TTY:
+(gdb) shell tty
+/dev/pts/0
+
+# OK, now we can pass the device name obtained above to open() and attach
+# file descriptors 1 and 2 back to the current TTY with these calls:
+
+(gdb) call (int) open("/dev/pts/0", 2, 0)
+$1 = 1
+(gdb) call (int) open("/dev/pts/0", 2, 0)
+$2 = 2
+
+# Finally, we call rb_backtrace() in order to dump the Ruby backtrace:
+
+(gdb) call (void) rb_backtrace()
+  from ./test.rb:4:in `sleep'
+  from ./test.rb:4
+  from ./test.rb:3:in `loop'
+  from ./test.rb:3
+
+# And here's how we get out of gdb. Once you've quit, you'll probably want to
+# clean up the stuck process by killing it.
+
+(gdb) quit
+A debugging session is active.
+
+  Inferior 1 [process 1343] will be detached.
+
+Quit anyway? (y or n) y
+Detaching from program: /opt/vagrant_ruby/bin/ruby, process 1343
+$