充满矛盾的类——SparseArray-原创手记-慕课网

虽然类名叫“稀疏数组”，但它其实非常“紧实”。这一篇将会通过分析SparseArray的源码来展现这个类的矛盾之处。

(ps: 下文中的粗斜体字表示引导源码阅读的内心戏)

还记得分析RecyclerView缓存机制中提到的SparseArray吗？当时只知道它是一个存放键值对的容器。RecyclerView的回收池使用它按ViewType分类存放被回收的ViewHolder。 HashMap也存放键值对，为啥RecyclerView不用 HashMap？

带着这个疑问，点开SparseArray.java，一探究竟：

/**
 * SparseArrays map integers to Objects.  Unlike a normal array of Objects,
 * there can be gaps in the indices.  It is intended to be more memory efficient
 * than using a HashMap to map Integers to Objects, both because it avoids
 * auto-boxing keys and its data structure doesn‘t rely on an extra entry object
 * for each mapping.
 * SparseArrays建立int值和Object之间的关系
 * 不同于一般的数组对象，SparseArray数组对象间不会存在空隙
 * 相对于HashMap<Integer,Object>更加节省内存，因为避免了键自动装箱以及创建Entry实体 
 *
 * <p>Note that this container keeps its mappings in an array data structure,
 * using a binary search to find keys.  The implementation is not intended to be appropriate for
 * data structures
 * that may contain large numbers of items.  It is generally slower than a traditional
 * HashMap, since lookups require a binary search and adds and removes require inserting
 * and deleting entries in the array.  For containers holding up to hundreds of items,
 * the performance difference is not significant, less than 50%.</p>
 * SparseArrays将键值对保存在数组中并通过二分查找键值。所以在数据量很大的时候速度较慢。
 */public class SparseArray<E> implements Cloneable {
    ...
}

读完这段注释好像已经回答了刚才的问题：因为SparseArray相较于HashMap有更好的空间性能。而且RecyclerView回收池中每种viewType默认最多存放5个ViewHolder。少量数据也正好可以忽略SparseArray在性能上的劣势。

取值

好奇 SparseArray是如何做到用时间换空间的？容器类必然会提供存和取的操作，就以取操作为切入点，一探究竟：

public class SparseArray<E> implements Cloneable {    //存储键的数组
    private int[] mKeys;    //存储值的数组
    private Object[] mValues;    
    /**
     * Gets the Object mapped from the specified key, or <code>null</code>
     * if no such mapping has been made.
     * 根据给定的key取对应value，如果没有取到则返回null
     */
    public E get(int key) {        return get(key, null);
    }    /**
     * Gets the Object mapped from the specified key, or the specified Object
     * if no such mapping has been made.
     * 根据给定的key取对应value
     */
    @SuppressWarnings("unchecked")    public E get(int key, E valueIfKeyNotFound) {        //用二分查找获取值的索引
        int i = ContainerHelpers.binarySearch(mKeys, mSize, key);        if (i < 0 || mValues[i] == DELETED) {            return valueIfKeyNotFound;
        } else {            //成功获取值
            return (E) mValues[i];
        }
    }
}class ContainerHelpers {    // This is Arrays.binarySearch(), but doesn’t do any argument validation.
    //二分查找
    static int binarySearch(int[] array, int size, int value) {        int lo = 0;        int hi = size - 1;        while (lo <= hi) {            final int mid = (lo + hi) >>> 1;            final int midVal = array[mid];            //递增序列
            if (midVal < value) {
                lo = mid + 1;
            } else if (midVal > value) {
                hi = mid - 1;
            } else {                return mid;  // value found
            }
        }        //若查找失败，则将索引置反（变为负数）
        return ~lo;  // value not present
    }
}

看到这里可以做一个阶段性总结：

SparseArray用两个长度相同的数组分别存储键和值，键是int类型的，值是Object类型的。而且一对键值在各自数组中的索引相同。
取值算法如下：按键取值时，会二分查找键数组。若找到，则用相同的索引去值数组取出值。
不由得产生一个疑问：键数组可以使用二分法查找，难道它是有序的吗？看下存值逻辑是否能找到更多线索。

存值

public class SparseArray<E> implements Cloneable {
    /**
     * Adds a mapping from the specified key to the specified value,
     * replacing the previous mapping from the specified key if there
     * was one.
     * 存值，相同键的新值会覆盖旧值
     */
    public void put(int key, E value) {        //用二分查找获得键索引（键在键数组中的位置）
        int i = ContainerHelpers.binarySearch(mKeys, mSize, key);        //1.若找到键索引（表示该键已存在），则用新值覆盖旧值
        if (i >= 0) {
            mValues[i] = value;
        }        //若未找到存储索引，则存储新值
        else {            //将负值索引恢复成正值（二分查找失败时会取反）
            i = ~i;            //2.若目标插入位置是待删除位，则复用待删除位
            if (i < mSize && mValues[i] == DELETED) {
                mKeys[i] = key;
                mValues[i] = value;                return;
            }            //当键数组满了且存在待删除位时，压缩数组
            if (mGarbage && mSize >= mKeys.length) {
                gc();                // Search again because indices may have changed.
                //压缩数组产生了数组挪动，索引会改变，所以重新查找一遍以获新目标插入位置
                i = ~ContainerHelpers.binarySearch(mKeys, mSize, key);
            }            //3.插入到新的位置
            mKeys = GrowingArrayUtils.insert(mKeys, mSize, i, key);
            mValues = GrowingArrayUtils.insert(mValues, mSize, i, value);            //有效键值对+1
            mSize++;
        }
    }
}

存值时通过二分查找确定存入位置，如果每次存入的键都是新的，那每次查找必然失败，但查找失败返回的索引值（取反后）正是遵守了原有序列的顺序产生的应该被插入新键的位置。所以，刚才的疑问解决了：键数组是有序的，且是递增的。

在获得目标插入位置后，分为三种情况：

1. 新值覆盖旧值

在键数组中二分查找给定键值，如果返回的索引大于0，则表示找到相同的键。直接将新值赋值给值数组的对应位置，覆盖掉旧值。

2. 复用待删除位

如果二分查找失败，表示给定键是一个新的键。如果新键对应的索引位是一个待删除位，则将新值存入该位置，以复用该待删除位。待删除位是啥？，看一下删除操作 SparseArray.delete()：

public class SparseArray<E> implements Cloneable {
    ...    //用于标记某一位置的键值对已经被删除（假删）
    private static final Object DELETED = new Object();    
    /**
     * Removes the mapping from the specified key, if there was any.
     * 删除键值对
     */
    public void delete(int key) {        //获得待删除位置
        int i = ContainerHelpers.binarySearch(mKeys, mSize, key);        if (i >= 0) {            //将值数组中对应位置标记为待删除
            if (mValues[i] != DELETED) {
                mValues[i] = DELETED;                //打开回收开关
                mGarbage = true;
            }
        }
    }
}

删除键值对时并没有真正的删除，而是标记为删除（此处并没有将mSize--），但标记的后果是数组会变得“稀疏”，即有效键值对不是一个挨着一个，他们会被待删除键值对分割。

想想也对，数组的删除操作是费时的，因为需要整体挪动数组覆盖待删除位。想必在未来的某个时刻应该会执行一次真正的删除。

所以存值的第二种情况是：当待删除位还没来得及真正删除时被复用了。

3. 插入新位置

当给定键是一个新的键且其对应的索引位已经被其他键值所占用了，即发生了冲突（HashMap的解决方案是“拉链法”，相同键的值被组织在一个链表结构中），SparseArray通过挪动数组来解决冲突：

public final class GrowingArrayUtils {
    /**
     * Primitive int version of {@link #insert(Object[], int, int, Object)}.
     * 数组插入操作
     * @param array       数组
     * @param currentSize 当前有效元素个数
     * @param index 待插入位置
     * @param element 带插入元素
     * @return
     */
    public static int[] insert(int[] array, int currentSize, int index, int element) {
        assert currentSize <= array.length;        //1.如果array数组可以容纳下一个新元素，则将给定元素插入到指定位置
        //并将原本该位置以后的元素整体往后平移一个位置
        if (currentSize + 1 <= array.length) {
            System.arraycopy(array, index, array, index + 1, currentSize - index);            array[index] = element;            return array;
        }        //2.如果array容纳不下，则将数组扩容
        //构造比老数组更大的新数组
        int[] newArray = ArrayUtils.newUnpaddedIntArray(growSize(currentSize));        //将老数组中带插入位置前的所有数据复制到新数组
        System.arraycopy(array, 0, newArray, 0, index);        //将新元素插入到待插入位置
        newArray[index] = element;        //将老数组中带插入位置后的所有元素复制到新数组
        System.arraycopy(array, index, newArray, index + 1, array.length - index);        return newArray;
    }    
    //扩容算法
    public static int growSize(int currentSize) {        return currentSize <= 4 ? 8 : currentSize * 2;
    }
}

其实除了发生冲突，还有另一种情况（注释中第2点）：原本的键数组满了。这种情况下只能通过数组扩容才能存取新的值。

紧致

刚才跳过了一个步骤（它是“紧致”的关键），即将新的键值插入到数组之前，还有一个回收操作SparseArray.gc()，点进去看看：

public class SparseArray<E> implements Cloneable {    //压缩键数组和值数组，将待删除索引位覆盖(将数组中的空隙填满)
    private void gc() {        // Log.e("SparseArray", "gc start with " + mSize);

        int n = mSize;        //待删除位索引
        int o = 0;        int[] keys = mKeys;
        Object[] values = mValues;        //遍历值数组
        for (int i = 0; i < n; i++) {
            Object val = values[i];            if (val != DELETED) {                if (i != o) {                    //将后面的有效键值对往前挪覆盖待删除位
                    keys[o] = keys[i];
                    values[o] = val;
                    values[i] = null;
                }

                o++;
            }
        }        //压缩完之后 标记不需要再被压缩 直到下次键值对被删除后出现了空位
        mGarbage = false;        //有效键值对个数再一次符合事实（当发生删除操作后，mSize会偏大）
        mSize = o;
    }
}

一图胜千言：

假设当前SparseArray中有4个键值对，此时mSize=4，如下所示：

	0	1	2	3
mKeys	55	57	58	60
mValues	A	B	D	F

删除第二个后，此时mSize=4：

	0	1	2	3
mKeys	55	57	58	60
mValues	A	DELETE	D	F

执行SparseArray.gc()后，此时mSize=3：

	0	1	2	3
mKeys	55	58	60	60
mValues	A	D	F	null

虽然经过SparseArray.gc()后，数组末尾会出现两个相同的键，但这不会影响下一次二分查找，因为二分查找的终点是mSize-1。

“紧致”是相对于“稀疏”来说的，当从SparseArray删除键值对，SparseArray中存储键值对的两个数组就变得“稀疏”，“紧致”就是覆盖掉待删除键值对，让有效键值对重新挨个排在一起，这样二叉查找效率更高。

SparseArray.gc()中运用了两个索引，分别是i和o。如果把遍历值数组比作跑步，那这两个索引就是两个选手，它们同时从起点出发。不同的是，每当遇到待删除坑位，o就会停下来，而i总是一往直前，所以i一直领先于o，当i找到有效键值对时，会将其覆盖到o的位置，然后o才向前进一步。通过这样的算法，就实现了将数组后面的内容往前挪覆盖掉待删位。

所以，明明很紧致，却偏偏要取一个稀疏的名字，这是一个充满矛盾的类。

总结

SparseArray用于存放键值对，键是int，值是Object。它比HashMap更节省空间，但速度更慢。
SparseArray用两个长度相等的数组分别存储键和值，同一个键值对所在两个数组中的索引相等。
SparseArray中存放键的数组是递增序列
SparseArray删除元素时并不会真正删除，而是标记为待删除元素，在合适的时候会将后面的元素往前挪覆盖掉待删除元素。待删除元素在没有被覆盖前有可能被复用。

作者：唐子玄
链接：https://www.jianshu.com/p/bbddc7f13330