jstack fails with "Unable to open socket file..."

When using jstack, you will occasionally run into the exception "Unable to open socket file...".
Below we analyze it based on the OpenJDK 11 source code.

Starting from the error

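The "Unable to open socket file" error is reported by jstack itself, so it is client-side behavior. Under what circumstances does it occur? Searching the source for the error text leads straight to the following code.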

        File socket_file = findSocketFile(pid, ns_pid);
        socket_path = socket_file.getPath();
        if (!socket_file.exists()) {
            File f = createAttachFile(pid, ns_pid);
            try {
                sendQuitTo(pid);

                // give the target VM time to start the attach mechanism
                final int delay_step = 100;
                final long timeout = attachTimeout();
                long time_spend = 0;
                long delay = 0;
                do {
                    // Increase timeout on each attempt to reduce polling
                    delay += delay_step;
                    try {
                        Thread.sleep(delay);
                    } catch (InterruptedException x) { }

                    time_spend += delay;
                    if (time_spend > timeout/2 && !socket_file.exists()) {
                        // Send QUIT again to give target VM the last chance to react
                        sendQuitTo(pid);
                    }
                } while (time_spend <= timeout && !socket_file.exists());
                if (!socket_file.exists()) {
                    throw new AttachNotSupportedException(
                        String.format("Unable to open socket file %s: " +
                          "target process %d doesn't respond within %dms " +
                          "or HotSpot VM not loaded", socket_path, pid,
                                      time_spend));
                }
            } finally {
                f.delete();
            }
        }

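The method is fairly simple. The flow is as follows:
jstack first looks for the socket file. If it does not exist, jstack creates the attach file, sends SIGQUIT to the target process, and then polls for the socket file with a delay that grows by 100 ms on each attempt. Once more than half of the timeout has elapsed without the socket file appearing, it sends SIGQUIT again. If the socket file still does not exist after the timeout, it throws AttachNotSupportedException. In every case the attach file is deleted at the end.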
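The delay arithmetic of the polling loop can be reproduced without sleeping. The following is a minimal sketch, not the JDK's code: the class and method names are hypothetical, and the 10000 ms timeout is an assumed value passed in by the caller (in the real code it comes from attachTimeout()).

```java
import java.util.function.BooleanSupplier;

class AttachPollingSketch {
    /** Outcome of one simulated wait. */
    static final class Result {
        final int attempts;      // number of polls performed
        final long timeSpentMs;  // sum of all simulated sleeps
        final boolean found;     // whether the socket file showed up

        Result(int attempts, long timeSpentMs, boolean found) {
            this.attempts = attempts;
            this.timeSpentMs = timeSpentMs;
            this.found = found;
        }
    }

    /**
     * Mirrors the loop's bookkeeping: the delay grows by delayStepMs on every
     * attempt, and polling stops once the accumulated time exceeds timeoutMs
     * or the (simulated) socket file exists.
     */
    static Result simulate(long timeoutMs, int delayStepMs, BooleanSupplier socketExists) {
        long timeSpent = 0;
        long delay = 0;
        int attempts = 0;
        boolean found;
        do {
            delay += delayStepMs;   // 100, 200, 300, ...
            timeSpent += delay;     // the real loop calls Thread.sleep(delay) here
            attempts++;
            found = socketExists.getAsBoolean();
        } while (timeSpent <= timeoutMs && !found);
        return new Result(attempts, timeSpent, found);
    }

    public static void main(String[] args) {
        // Assumed timeout of 10000 ms; the target never creates the socket file.
        Result r = simulate(10_000, 100, () -> false);
        System.out.println(r.attempts + " polls, " + r.timeSpentMs + " ms");
    }
}
```

Under these assumptions the loop gives up after 14 polls and 10500 ms, because the time spent is only compared against the timeout after each sleep. That is why the number in the error message can be larger than the configured timeout.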

    private File findSocketFile(int pid, int ns_pid) {
        // A process may not exist in the same mount namespace as the caller.
        // Instead, attach relative to the target root filesystem as exposed by
        // procfs regardless of namespaces.
        String root = "/proc/" + pid + "/root/" + tmpdir;
        return new File(root, ".java_pid" + ns_pid);
    }

The socket path is effectively /tmp/.java_pid${ns_pid}.
The leading /proc/pid/root/tmp refers to the /tmp directory as seen by the target process.
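As a small illustration of how the path is put together, here is a standalone sketch of the same string concatenation. The class name and the sample pids are made up for the example, and tmpdir is assumed to be "/tmp" as the text states.

```java
import java.io.File;

class SocketPathSketch {
    // Assumption: the temp directory is "/tmp", as described in the text.
    static final String TMPDIR = "/tmp";

    /** Same concatenation as findSocketFile above, outside of the JDK class. */
    static File findSocketFile(int pid, int nsPid) {
        String root = "/proc/" + pid + "/root/" + TMPDIR;
        return new File(root, ".java_pid" + nsPid);
    }

    public static void main(String[] args) {
        // Hypothetical containerized target: pid 4321 on the host, pid 7 inside
        // its own pid namespace. The directory uses the host pid, the file name
        // uses the namespace pid.
        System.out.println(findSocketFile(4321, 7).getPath());
    }
}
```

For a process that is not in a separate pid namespace, pid and ns_pid are the same number.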

From the flow above, we can guess that the QUIT signal is what makes the JVM take the corresponding action.

JNIEXPORT void JNICALL Java_sun_tools_attach_VirtualMachineImpl_sendQuitTo
  (JNIEnv *env, jclass cls, jint pid)
{
    if (kill((pid_t)pid, SIGQUIT)) {
        JNU_ThrowIOExceptionWithLastError(env, "kill");
    }
}

The signal sent is SIGQUIT.

Starting from the signal

#define SIGBREAK SIGQUIT

The JVM has a macro that makes SIGBREAK an alias for SIGQUIT, so the JVM code refers to SIGQUIT as SIGBREAK.


    switch (sig) {
      case SIGBREAK: {
        if (!DisableAttachMechanism && AttachListener::is_init_trigger()) {
          continue;
        }
        VM_PrintThreads op;
        ...

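When the received signal is SIGQUIT, the handler first checks DisableAttachMechanism. If -XX:+DisableAttachMechanism is set, the condition short-circuits: AttachListener::is_init_trigger, which is where the socket gets initialized, is never called, and the handler goes on to VM_PrintThreads instead.
So with -XX:+DisableAttachMechanism, the socket file that jstack waits for is never created, and jstack always fails with this error.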
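The branch can be written out as a small truth table. This is only a model of the condition quoted above: the enum, the class and the method are hypothetical names, and the second parameter stands in for the return value of AttachListener::is_init_trigger().

```java
class SigquitDecisionSketch {
    enum Action { START_ATTACH_LISTENER, PRINT_THREAD_DUMP }

    /**
     * Models `if (!DisableAttachMechanism && AttachListener::is_init_trigger())`.
     * In the real code the && short-circuits, so is_init_trigger() is not even
     * called when the attach mechanism is disabled; here it is a plain boolean.
     */
    static Action onSigquit(boolean disableAttachMechanism, boolean isInitTrigger) {
        if (!disableAttachMechanism && isInitTrigger) {
            return Action.START_ATTACH_LISTENER;  // the `continue` branch
        }
        return Action.PRINT_THREAD_DUMP;          // falls through to VM_PrintThreads
    }

    public static void main(String[] args) {
        for (boolean disabled : new boolean[] {false, true}) {
            for (boolean trigger : new boolean[] {false, true}) {
                System.out.println("DisableAttachMechanism=" + disabled
                        + ", is_init_trigger=" + trigger
                        + " -> " + onSigquit(disabled, trigger));
            }
        }
    }
}
```

Only one of the four combinations starts the attach listener: the mechanism is enabled and is_init_trigger() returns true.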

bool AttachListener::is_init_trigger() {
  if (init_at_startup() || is_initialized()) {
    return false;               // initialized at startup or already initialized
  }
...
  if (ret == -1) {
    log_trace(attach)("Failed to find attach file: %s, trying alternate", fn);
    snprintf(fn, sizeof(fn), "%s/.attach_pid%d",
             os::get_temp_directory(), os::current_process_id());
    RESTARTABLE(::stat64(fn, &st), ret);
    if (ret == -1) {
      log_debug(attach)("Failed to find attach file: %s", fn);
    }
  }
  
  if (ret == 0) {
    // simple check to avoid starting the attach mechanism when
    // a bogus non-root user creates the file
    if (os::Posix::matches_effective_uid_or_root(st.st_uid)) {
      init();
      log_trace(attach)("Attach triggered by %s", fn);
      return true;
    } else {
      log_debug(attach)("File %s has wrong user id %d (vs %d). Attach is not triggered", fn, st.st_uid, geteuid());
    }
  }
 ...
}

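is_init_trigger first checks whether the attach file exists; the initialization that follows only happens if it does.
The init method starts the Attach Listener thread, which eventually calls AttachListener::pd_init(), which in turn calls LinuxAttachListener::init() to initialize the socket. Finally, AttachListener::set_initialized(); **sets the initialized flag.** This is important: looking back at the start of is_init_trigger, the first thing it does is check that flag, and if the flag is already set it does nothing more. In other words, the signal can trigger initialization only once; it never happens again afterwards.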
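The consequence of the one-time flag is easier to see in a toy model. The sketch below is not HotSpot code: it replaces the attach file and the socket file with booleans, and every name in it is hypothetical. It only mirrors the control flow described above for OpenJDK 11.

```java
class AttachInitOnceSketch {
    private boolean initialized;       // stands in for AttachListener::is_initialized()
    private boolean socketFileExists;  // stands in for /tmp/.java_pid<pid>

    /** Models is_init_trigger(): initializes at most once, and only if the attach file exists. */
    boolean isInitTrigger(boolean attachFileExists) {
        if (initialized) {
            return false;              // already initialized: nothing happens
        }
        if (!attachFileExists) {
            return false;              // no attach file: not an attach request
        }
        socketFileExists = true;       // LinuxAttachListener::init() creates the socket
        initialized = true;            // AttachListener::set_initialized()
        return true;
    }

    /** Models someone removing the socket file from /tmp. */
    void deleteSocketFile() {
        socketFileExists = false;
    }

    boolean socketFileExists() {
        return socketFileExists;
    }

    public static void main(String[] args) {
        AttachInitOnceSketch jvm = new AttachInitOnceSketch();
        System.out.println("first trigger: " + jvm.isInitTrigger(true));
        jvm.deleteSocketFile();
        System.out.println("second trigger: " + jvm.isInitTrigger(true));
        System.out.println("socket file exists: " + jvm.socketFileExists());
    }
}
```

In this model the second trigger returns false and the socket file stays missing, which is the situation a later jstack run would then time out on.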

int LinuxAttachListener::init() {
  char path[UNIX_PATH_MAX];          // socket file
  char initial_path[UNIX_PATH_MAX];  // socket file during setup
  int listener;                      // listener socket (file descriptor)

  // register function to cleanup
  ::atexit(listener_cleanup);

  int n = snprintf(path, UNIX_PATH_MAX, "%s/.java_pid%d",
                   os::get_temp_directory(), os::current_process_id());
  if (n < (int)UNIX_PATH_MAX) {
    n = snprintf(initial_path, UNIX_PATH_MAX, "%s.tmp", path);
  }
  if (n >= (int)UNIX_PATH_MAX) {
    return -1;
  }

  // create the listener socket
  listener = ::socket(PF_UNIX, SOCK_STREAM, 0);
  if (listener == -1) {
    return -1;
  }
  ...

This is where the socket file is created.
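The socket "file" here is a Unix domain socket bound to a filesystem path. As a rough Java-side analogue (not the JVM's C++ code), the sketch below binds a listener with the standard UnixDomainSocketAddress API, which requires Java 16 or later, and shows that the file appears on bind and is not removed just by closing the listener. The class name, method name and file name are made up for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.file.Files;
import java.nio.file.Path;

class UnixSocketFileSketch {
    /** Binds a listener at dir/name, reports whether the socket file appeared, then cleans up. */
    static boolean bindCreatesSocketFile(Path dir, String name) {
        Path socketPath = dir.resolve(name);
        try (ServerSocketChannel listener =
                 ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
            // bind() is the step that makes the socket file show up on disk
            listener.bind(UnixDomainSocketAddress.of(socketPath));
            return Files.exists(socketPath);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            // Closing the channel does not delete the file, so remove it explicitly.
            try {
                Files.deleteIfExists(socketPath);
            } catch (IOException ignored) {
                // best-effort cleanup in a sketch
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("attach-sketch");
        System.out.println("socket file created: " + bindCreatesSocketFile(dir, ".java_pid_demo"));
        Files.delete(dir);
    }
}
```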

Summary

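Having gone through the flow above, we can identify the following situations in which the error occurs.

-XX:+DisableAttachMechanism is enabled.
The socket file under /tmp was deleted after initialization.
Problems in the program (resource exhaustion, a hang, and so on) prevent the relevant JVM code from running.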
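When the error shows up, a first check is whether the socket file is there at all. The helper below is a hypothetical sketch of that check: it takes the directory as a parameter so that it can be pointed at /proc/<pid>/root/tmp or at /tmp directly, and it does not attempt to tell the three causes apart.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class SocketFileCheckSketch {
    /** Returns true if tmpDir contains the attach socket file for the given pid. */
    static boolean socketFileExists(Path tmpDir, long pid) {
        return Files.exists(tmpDir.resolve(".java_pid" + pid));
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java SocketFileCheckSketch.java <pid>");
            return;
        }
        long pid = Long.parseLong(args[0]);
        // Same location jstack looks at, assuming pid and ns_pid are identical.
        Path tmpDir = Paths.get("/proc", Long.toString(pid), "root", "tmp");
        if (socketFileExists(tmpDir, pid)) {
            System.out.println("socket file found for pid " + pid);
        } else {
            System.out.println("no socket file for pid " + pid
                    + ": check for -XX:+DisableAttachMechanism, a deleted socket file,"
                    + " or a target process that cannot react to SIGQUIT");
        }
    }
}
```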