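App crashes are most commonly caused by either a Java crash or a native crash. Based on the Android P source code (the latest release at the time of writing), the following walks through how each of the two is handled: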
1. Java Crash
When an exception thrown in Java code is not caught by any try/catch, the virtual machine calls Thread#dispatchUncaughtException to handle it:
```java
// libcore/ojluni/src/main/java/java/lang/Thread.java
public final void dispatchUncaughtException(Throwable e) {
    Thread.UncaughtExceptionHandler initialUeh =
            Thread.getUncaughtExceptionPreHandler();
    if (initialUeh != null) {
        try {
            initialUeh.uncaughtException(this, e);
        } catch (RuntimeException | Error ignored) {
            // Throwables thrown by the initial handler are ignored
        }
    }
    getUncaughtExceptionHandler().uncaughtException(this, e);
}
```
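Two UncaughtExceptionHandlers take part in this flow: the pre-handler and the handler. The actual work is done by each one's uncaughtException implementation. The pre-handler always runs first, and anything it throws is ignored. getUncaughtExceptionHandler() returns the thread's own handler if one was set. Otherwise it returns the thread's ThreadGroup, which by default forwards the exception to the process-wide default handler.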
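These ordering rules fit in a few lines of plain Java. The following is a minimal sketch, not libcore code. `TwoStageDispatcher` is a hypothetical class that takes both handlers as parameters instead of reading them from Thread's static fields, and it leaves out the per-thread and ThreadGroup lookup behind getUncaughtExceptionHandler().

```java
/**
 * Hypothetical model of the two-stage dispatch in Thread#dispatchUncaughtException.
 * Both handlers are passed in explicitly; libcore reads them from Thread itself.
 */
final class TwoStageDispatcher {
    private TwoStageDispatcher() {}

    static void dispatch(Thread t, Throwable e,
                         Thread.UncaughtExceptionHandler preHandler,
                         Thread.UncaughtExceptionHandler handler) {
        if (preHandler != null) {
            try {
                preHandler.uncaughtException(t, e);
            } catch (RuntimeException | Error ignored) {
                // Like libcore: a failing pre-handler must not keep the main handler from running.
            }
        }
        // The main handler always runs (on Android this path ends in KillApplicationHandler by default).
        handler.uncaughtException(t, e);
    }
}
```

If the pre-handler throws, the exception is swallowed and the main handler still receives the original Throwable.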
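Android ships default implementations of both. App processes are forked from the Zygote process. When a new process is started this way, ZygoteInit.zygoteInit runs in that process and calls RuntimeInit.commonInit: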
```java
// frameworks/base/core/java/com/android/internal/os/ZygoteInit.java
/**
 * The main function called when started through the zygote process...
 */
public static final Runnable zygoteInit(int targetSdkVersion, String[] argv,
        ClassLoader classLoader) {
    // ...
    RuntimeInit.commonInit();
    ZygoteInit.nativeZygoteInit();
    return RuntimeInit.applicationInit(targetSdkVersion, argv, classLoader);
}
```
RuntimeInit.commonInit installs the default UncaughtExceptionHandlers:
```java
// frameworks/base/core/java/com/android/internal/os/RuntimeInit.java
protected static final void commonInit() {
    // ...
    /*
     * set handlers; these apply to all threads in the VM. Apps can replace
     * the default handler, but not the pre handler.
     */
    LoggingHandler loggingHandler = new LoggingHandler();
    Thread.setUncaughtExceptionPreHandler(loggingHandler);
    Thread.setDefaultUncaughtExceptionHandler(new KillApplicationHandler(loggingHandler));
    // ...
}
```
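Two objects are created, a LoggingHandler and a KillApplicationHandler. Both implement Thread.UncaughtExceptionHandler and override uncaughtException:
LoggingHandler prints the exception information, including the process name, the pid, and the Java stack trace.
For system processes, the log entry starts with "*** FATAL EXCEPTION IN SYSTEM PROCESS: "
For app processes, the log entry starts with "FATAL EXCEPTION: "
KillApplicationHandler notifies AMS and kills the process: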
```java
@Override
public void uncaughtException(Thread t, Throwable e) {
    try {
        // 1. Make sure LoggingHandler has printed the information (new in Android 9.0)
        ensureLogging(t, e);
        // 2. Notify AMS so it can handle the crash, e.g. show the crash dialog
        ActivityManager.getService().handleApplicationCrash(
                mApplicationObject, new ApplicationErrorReport.ParcelableCrashInfo(e));
    } catch (Throwable t2) {
        // ...
    } finally {
        // 3. Make sure the process dies
        Process.killProcess(Process.myPid()); // essentially sends signal 9 (SIGKILL) to itself
        System.exit(10); // standard Java way to terminate: shuts down the Java VM
    }
}
```
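The key property of KillApplicationHandler is its try/finally: the process is terminated no matter what happens while the crash is being reported. Below is a minimal sketch of that shape that runs on a plain JVM. `ReportThenTerminateHandler` and its `CrashReporter` hook are hypothetical names. The injected `IntConsumer` stands in for Process.killProcess plus System.exit(10).

```java
import java.util.function.IntConsumer;

/**
 * Hypothetical handler with the same try/finally shape as KillApplicationHandler:
 * report the crash on a best-effort basis, then always terminate.
 */
final class ReportThenTerminateHandler implements Thread.UncaughtExceptionHandler {
    /** Hypothetical reporting hook; stands in for the call into AMS. */
    interface CrashReporter {
        void report(Thread t, Throwable e) throws Exception;
    }

    private final CrashReporter reporter;
    private final IntConsumer terminator;

    ReportThenTerminateHandler(CrashReporter reporter, IntConsumer terminator) {
        this.reporter = reporter;
        this.terminator = terminator;
    }

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        try {
            reporter.report(t, e);
        } catch (Throwable reportFailure) {
            // Reporting is best effort: a failure here must not prevent termination.
            System.err.println("Crash report failed: " + reportFailure);
        } finally {
            // Always reached; 10 mirrors the System.exit(10) in the framework code.
            terminator.accept(10);
        }
    }
}
```

In a real process the terminator could simply be `System::exit`. It is injected here only so that the sketch can be exercised without ending the JVM.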
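Note 1:
Thread#setDefaultUncaughtExceptionHandler is a public API. By calling it, an app can install its own UncaughtExceptionHandler in place of KillApplicationHandler, handle the exception with custom logic, and so avoid the crash. The thread that threw still terminates. The default handler also applies only to threads that have no handler of their own.
Thread#setUncaughtExceptionPreHandler is a hidden API. Apps cannot call it, so they cannot replace LoggingHandler.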
```java
/**
 * ......
 * @hide only for use by the Android framework (RuntimeInit) b/29624607
 */
public static void setUncaughtExceptionPreHandler(UncaughtExceptionHandler eh) {
    uncaughtExceptionPreHandler = eh;
}
// ....
public static void setDefaultUncaughtExceptionHandler(UncaughtExceptionHandler eh) {
    defaultUncaughtExceptionHandler = eh;
}
```
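This leads to a common situation:
After the app throws an uncaught exception, LoggingHandler still prints the "FATAL EXCEPTION" message to the log. Because the app has replaced KillApplicationHandler, however, the process does not exit and AMS is never notified. The app keeps running normally.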
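An app that installs its own default handler commonly records the crash and then forwards it to the previously installed handler, so that the default crash handling still runs. Below is a sketch that uses only the public Thread APIs. `ChainingCrashHandler` is a hypothetical name, and what to do with the recorded Throwable is app-specific.

```java
/**
 * Hypothetical app-side handler: records the crash, then forwards it to the previously
 * installed default handler (KillApplicationHandler on Android) so the process still dies.
 */
final class ChainingCrashHandler implements Thread.UncaughtExceptionHandler {
    private final Thread.UncaughtExceptionHandler previous;
    private volatile Throwable lastCrash;

    private ChainingCrashHandler(Thread.UncaughtExceptionHandler previous) {
        this.previous = previous;
    }

    /** Installs the handler in front of whatever default handler is currently set. */
    static ChainingCrashHandler install() {
        ChainingCrashHandler handler =
                new ChainingCrashHandler(Thread.getDefaultUncaughtExceptionHandler());
        Thread.setDefaultUncaughtExceptionHandler(handler);
        return handler;
    }

    Throwable lastCrash() {
        return lastCrash;
    }

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        lastCrash = e; // e.g. persist a crash report here (app-specific)
        if (previous != null) {
            // Delegating keeps the default behavior; dropping this call is what
            // keeps the process alive in the situation described above.
            previous.uncaughtException(t, e);
        }
    }
}
```

Call `ChainingCrashHandler.install()` early, for example in Application#onCreate. Removing the `previous.uncaughtException` call reproduces the situation above: the log shows FATAL EXCEPTION, but the process survives.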
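Note 2:
By default, after an uncaught exception, KillApplicationHandler calls System.exit(10), which shuts down the process's Java VM. If other code in the process tries to start a new thread during this shutdown, the attempt fails with the error "Thread starting during runtime shutdown", for example: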
```
java.lang.InternalError: Thread starting during runtime shutdown
    at java.lang.Thread.nativeCreate(Native Method)
    at java.lang.Thread.start(Thread.java:733)
```
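If you see this error in a log, look first for the real cause of the abnormal process exit. This error is only a side effect of that exit.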
2. Native Crash
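When a native exception occurs, the CPU raises an exception interrupt, which starts the exception-handling flow. The Linux kernel turns these interrupts into signals, and an app process can register handlers to receive them. Not every such signal starts as a CPU exception: SIGABRT, for instance, is normally raised by the process itself through abort().
In Android P, the default signal handlers are registered in bionic/linker/linker_main.cpp, the dynamic linker that runs at the start of every dynamically linked process, which calls debuggerd_init. The relevant code in linker_main.cpp: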
```cpp
// bionic/linker/linker_main.cpp
/*
 * This code is called after the linker has linked itself and
 * fixed it's own GOT. It is safe to make references to externs
 * and other non-local data at this point.
 */
static ElfW(Addr) __linker_init_post_relocation(KernelArgumentBlock& args) {
    // ...
    debuggerd_init(&callbacks);
}
```
debuggerd_init registers the signal handler:
```cpp
// system/core/debuggerd/handler/debuggerd_handler.cpp
void debuggerd_init(debuggerd_callbacks_t* callbacks) {
    // ...
    struct sigaction action;
    memset(&action, 0, sizeof(action));
    sigfillset(&action.sa_mask);
    action.sa_sigaction = debuggerd_signal_handler;
    action.sa_flags = SA_RESTART | SA_SIGINFO;

    // Use the alternate signal stack if available so we can catch stack overflows.
    action.sa_flags |= SA_ONSTACK;
    debuggerd_register_handlers(&action);
}
```
So the default signal handler is debuggerd_signal_handler. Which signals is it registered for? That is decided in debuggerd_register_handlers:
```cpp
// system/core/debuggerd/include/debuggerd/handler.h
static void __attribute__((__unused__)) debuggerd_register_handlers(struct sigaction* action) {
    sigaction(SIGABRT, action, nullptr);
    sigaction(SIGBUS, action, nullptr);
    sigaction(SIGFPE, action, nullptr);
    sigaction(SIGILL, action, nullptr);
    sigaction(SIGSEGV, action, nullptr);
#if defined(SIGSTKFLT)
    sigaction(SIGSTKFLT, action, nullptr);
#endif
    sigaction(SIGSYS, action, nullptr);
    sigaction(SIGTRAP, action, nullptr);
    sigaction(DEBUGGER_SIGNAL, action, nullptr);
}
```
Through sigaction, handlers are registered for 9 signals in total: SIGABRT, SIGBUS, SIGFPE, SIGILL, SIGSEGV, SIGSTKFLT (only where the platform defines it), SIGSYS, SIGTRAP, and DEBUGGER_SIGNAL.
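When reading "Fatal signal N" log lines, it helps to map these names to numbers. The hypothetical Java enum below does that. The numbers are the Linux values for ARM, ARM64, x86 and x86_64; some architectures, such as MIPS, number a few of them differently. DEBUGGER_SIGNAL is assumed to be bionic's __SIGRTMIN + 3, i.e. 35, so check it against your source tree.

```java
/**
 * Hypothetical table of the nine signals debuggerd_register_handlers installs a handler for.
 * Numbers: Linux on ARM/ARM64/x86/x86_64 (assumption: other architectures may differ).
 */
enum DebuggerdSignal {
    SIGILL(4), SIGTRAP(5), SIGABRT(6), SIGBUS(7), SIGFPE(8),
    SIGSEGV(11), SIGSTKFLT(16), SIGSYS(31),
    DEBUGGER_SIGNAL(35); // assumed: __SIGRTMIN (32) + 3

    final int number;

    DebuggerdSignal(int number) {
        this.number = number;
    }

    /** Returns the matching signal, or null if debuggerd installs no handler for this number. */
    static DebuggerdSignal fromNumber(int number) {
        for (DebuggerdSignal s : values()) {
            if (s.number == number) {
                return s;
            }
        }
        return null;
    }
}
```

For example, `DebuggerdSignal.fromNumber(11)` yields `SIGSEGV`, the signal in the log example below, while 9 (SIGKILL, which cannot be caught) yields null.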
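When a native exception occurs, it is then handled as follows:
The app's default signal handler, debuggerd_signal_handler, is invoked. Its main job is to clone a pseudothread for the target process that runs debuggerd_dispatch_pseudothread; the pseudothread exits once that function returns. Because of CLONE_THREAD | CLONE_VM, this pseudothread is a thread that shares the crashing process's address space, not a separate child process: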
```cpp
// system/core/debuggerd/handler/debuggerd_handler.cpp
// Handler that does crash dumping by forking and doing the processing in the child.
// Do this by ptracing the relevant thread, and then execing debuggerd to do the actual dump.
static void debuggerd_signal_handler(int signal_number, siginfo_t* info, void* context) {
    // ...
    // 1. Log a "Fatal signal" line with the basic crash information
    log_signal_summary(info);
    // 2. Clone the pseudothread
    pid_t child_pid =
        clone(debuggerd_dispatch_pseudothread, pseudothread_stack,
              CLONE_THREAD | CLONE_SIGHAND | CLONE_VM | CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID,
              &thread_info, nullptr, nullptr, &thread_info.pseudothread_tid);
    // ...
}
```
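log_signal_summary writes a "Fatal signal" line to the log. The comments in the code indicate why: even if the later steps fail, at least one line with the basic native crash information is preserved. For example:
```
12-16 14:30:17.067 10177 4780 4780 F libc : Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x74 in tid 4780 (com.kevin.test), pid 4780 (com.kevin.test)
```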
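When triaging many logs, it can help to extract these fields programmatically. The following is a minimal sketch. `FatalSignalLine` is hypothetical, and its regular expression is derived only from the single example line above. It is therefore an assumption that other Android versions or signals print the same shape. Lines without a fault addr, for instance, will not match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical parser for the "Fatal signal" line written by log_signal_summary.
 * The pattern is modeled on one example line and is an assumption, not a spec.
 */
final class FatalSignalLine {
    private static final Pattern PATTERN = Pattern.compile(
            "Fatal signal (\\d+) \\((\\w+)\\), code (-?\\d+) \\((\\w+)\\)"
                    + ", fault addr (0x[0-9a-fA-F]+)"
                    + " in tid (\\d+) \\((.*?)\\), pid (\\d+) \\((.*)\\)");

    final int signal;
    final String signalName;
    final int code;
    final String codeName;
    final long faultAddr;
    final int tid;
    final String threadName;
    final int pid;
    final String processName;

    private FatalSignalLine(Matcher m) {
        signal = Integer.parseInt(m.group(1));
        signalName = m.group(2);
        code = Integer.parseInt(m.group(3));
        codeName = m.group(4);
        // Addresses can use all 64 bits, so parse them as unsigned.
        faultAddr = Long.parseUnsignedLong(m.group(5).substring(2), 16);
        tid = Integer.parseInt(m.group(6));
        threadName = m.group(7);
        pid = Integer.parseInt(m.group(8));
        processName = m.group(9);
    }

    /** Returns the parsed fields, or null if the line contains no matching "Fatal signal" text. */
    static FatalSignalLine parse(String logLine) {
        Matcher m = PATTERN.matcher(logLine);
        return m.find() ? new FatalSignalLine(m) : null;
    }
}
```

`find()` is used rather than `matches()` so that the logcat prefix (timestamp, pids, tag) can stay in front of the message.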
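Once cloned, the pseudothread runs debuggerd_dispatch_pseudothread. Its main job is to run /system/bin/crash_dump32 or /system/bin/crash_dump64 via the execle function, passing these arguments:
main_tid: the id of the thread (in the target process) where the native crash occurred
pseudothread_tid: from a first look at the code, related to obtaining the backtrace; this needs further investigation
debuggerd_dump_type: there are 4 dump types; for a native crash the type is kDebuggerdTombstone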
```cpp
static int debuggerd_dispatch_pseudothread(void* arg) {
    // ...
    execle(CRASH_DUMP_PATH, CRASH_DUMP_NAME, main_tid, pseudothread_tid, debuggerd_dump_type,
           nullptr, nullptr);
    // ...
}
```
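The argument vector passed to execle can be modeled as follows. This is a hypothetical sketch. Several details are assumptions based on AOSP rather than on the excerpt above: the 64-bit/32-bit binary choice, argv[0] being the binary name, the decimal encoding of the three values, and the declaration order of the four dump types (which determines the number passed for kDebuggerdTombstone). Of the two trailing nullptr arguments to execle, the first terminates argv and the second is the (null) envp.

```java
import java.util.List;

/**
 * Hypothetical model of the argv that debuggerd_dispatch_pseudothread passes to execle.
 * Binary names, argument encoding, and enum order are assumptions based on AOSP.
 */
final class CrashDumpArgs {
    /** Names mirror the C++ enum; the ordinal is assumed to be the number that is passed. */
    enum DumpType {
        kDebuggerdNativeBacktrace,
        kDebuggerdTombstone,
        kDebuggerdJavaBacktrace,
        kDebuggerdAnyIntercept
    }

    private CrashDumpArgs() {}

    /** Stands in for CRASH_DUMP_NAME, chosen by the pointer width of the crashing process. */
    static String name(boolean is64Bit) {
        return is64Bit ? "crash_dump64" : "crash_dump32";
    }

    /** Stands in for CRASH_DUMP_PATH, the binary that execle loads. */
    static String path(boolean is64Bit) {
        return "/system/bin/" + name(is64Bit);
    }

    /** argv[0..3]; execle additionally receives a terminating null and a null envp. */
    static List<String> argv(boolean is64Bit, int mainTid, int pseudothreadTid, DumpType type) {
        return List.of(name(is64Bit),
                Integer.toString(mainTid),
                Integer.toString(pseudothreadTid),
                Integer.toString(type.ordinal()));
    }
}
```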
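Note: running crash_dump32 or crash_dump64 through execle does not itself create a new process. On Linux, the exec family replaces the program image of the calling process with the new program and runs its main, and the pid stays the same. The header comment of debuggerd_signal_handler mentions forking; in the code elided by "// ..." above, the pseudothread first forks a child, and that child is the one that calls execle.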
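Next, the main function of crash_dump (system/core/debuggerd/crash_dump.cpp) runs. This is arguably the core of native crash handling. Its main jobs are:
Attach to the app with ptrace. The source loops over every thread of the app and calls ReadCrashInfo for the thread that crashed. It then reads the registers and other state and finally assembles all the crash information (device and build version, ABI, signal, registers, backtrace, and so on) and writes it to the log.
Connect to the tombstoned process over a socket to obtain an output fd for a /data/tombstones/tombstone_xx file, into which the full crash information is written.
Notify the system_server process over a socket (its NativeCrashListener thread listens on that socket). This eventually reaches AMS#handleApplicationCrashInner, and from this point on the handling is the same as for a Java crash.
The main code for this logic: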
```cpp
// system/core/debuggerd/crash_dump.cpp
int main(int argc, char** argv) {
    // ...
    // 1. Attach to the app with ptrace and collect the crash information
    ATRACE_NAME("ptrace");
    for (pid_t thread : threads) {
        // ...
        ThreadInfo info;
        info.pid = target_process;
        info.tid = thread;
        info.process_name = process_name;
        info.thread_name = get_thread_name(thread);

        if (!ptrace_interrupt(thread, &info.signo)) {
            PLOG(WARNING) << "failed to ptrace interrupt thread " << thread;
            ptrace(PTRACE_DETACH, thread, 0, 0);
            continue;
        }

        if (thread == g_target_thread) {
            // Read the thread's registers along with the rest of the crash info out of the pipe.
            ReadCrashInfo(input_pipe, &siginfo, &info.registers, &abort_address);
            info.siginfo = &siginfo;
            info.signo = info.siginfo->si_signo;
        } else {
            info.registers.reset(Regs::RemoteGet(thread));
            if (!info.registers) {
                PLOG(WARNING) << "failed to fetch registers for thread " << thread;
                ptrace(PTRACE_DETACH, thread, 0, 0);
                continue;
            }
        }
        // ...
    }
    // ...
    // 2. Connect to tombstoned over a socket to obtain the fd of the
    //    /data/tombstones/tombstone_xx file that the report is written to
    {
        ATRACE_NAME("tombstoned_connect");
        LOG(INFO) << "obtaining output fd from tombstoned, type: " << dump_type;
        g_tombstoned_connected =
            tombstoned_connect(g_target_thread, &g_tombstoned_socket, &g_output_fd, dump_type);
    }
    // ...
    // 3. Notify the system_server process over a socket
    activity_manager_notify(target_process, signo, amfd_data);
    // ...
}
```
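Finally, the AMS side. In the system_server process, AMS calls startObservingNativeCrashes at startup. This starts a new thread, NativeCrashListener, which loops listening on a UNIX domain socket (socket path: /data/system/ndebugsocket) and receives native crash information from the debuggerd side. As analyzed above, the peer is the process running crash_dump. The main code: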
```java
// frameworks/base/services/core/java/com/android/server/am/NativeCrashListener.java
final class NativeCrashListener extends Thread {
    // ...
    @Override
    public void run() {
        // ...
        try {
            FileDescriptor serverFd = Os.socket(AF_UNIX, SOCK_STREAM, 0);
            final UnixSocketAddress sockAddr = UnixSocketAddress.createFileSystem(
                    DEBUGGERD_SOCKET_PATH);
            Os.bind(serverFd, sockAddr);
            Os.listen(serverFd, 1);
            Os.chmod(DEBUGGERD_SOCKET_PATH, 0777);

            while (true) {
                FileDescriptor peerFd = null;
                try {
                    if (MORE_DEBUG) Slog.v(TAG, "Waiting for debuggerd connection");
                    peerFd = Os.accept(serverFd, null /* peerAddress */);
                    if (MORE_DEBUG) Slog.v(TAG, "Got debuggerd socket " + peerFd);
                    if (peerFd != null) {
                        // ...
                        consumeNativeCrashData(peerFd);
                    }
                    // ...
    }
```
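The body of consumeNativeCrashData is elided above. In AOSP, crash_dump's activity_manager_notify writes the pid and the signal number as 32-bit network-byte-order integers, followed by the report text and a terminating zero byte. Assuming that framing, a hypothetical parser for one received message could look like this. `NativeCrashMessage` is not a framework class.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical parser for one message read by NativeCrashListener, assuming the framing:
 * [pid: 4-byte big-endian][signal: 4-byte big-endian][report text][0x00].
 */
final class NativeCrashMessage {
    final int pid;
    final int signal;
    final String report;

    private NativeCrashMessage(int pid, int signal, String report) {
        this.pid = pid;
        this.signal = signal;
        this.report = report;
    }

    static NativeCrashMessage parse(byte[] data) {
        if (data.length < 8) {
            throw new IllegalArgumentException("need an 8-byte header, got " + data.length + " bytes");
        }
        ByteBuffer header = ByteBuffer.wrap(data, 0, 8); // ByteBuffer is big-endian by default
        int pid = header.getInt();
        int signal = header.getInt();
        // The report runs until the zero byte that marks end-of-data (or until the data ends).
        int end = 8;
        while (end < data.length && data[end] != 0) {
            end++;
        }
        String report = new String(data, 8, end - 8, StandardCharsets.UTF_8);
        return new NativeCrashMessage(pid, signal, report);
    }
}
```

On a device the bytes would be read from the accepted peerFd until the zero byte or EOF; that I/O is left out here so that the framing logic can be checked on its own.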
Author: kevinsong0810
Link: https://www.jianshu.com/p/f39e9265ea66