介绍

Caffe2 core code中与tensor相关的可见于以下几个文件。

$ ls tensor
tensor.cc       tensor.h        tensor_int8.cc  tensor_int8.h

Tensor是Caffe2中的连续内存区域抽象表示。真正的caffe2 code中Tensor主要作为一个内存抽象APIs集合来供外部如Operator等对象来使用，其内部的大多数功能都实际依靠TensorImpl这个类来完成。TensorImpl的主要成员有两个，一个为dims_，它包含了当下内存的维度层次表示，另外一个则为storage_，它是一个Storage对象，亦是一个wrapper，实际起作用的为StorageImple，里面包含了此内存的实际地址，包含元素的类型(TypeMeta)等，同Tensor与TensorImpl的结构类似。

新一版Caffe2里面使用了许多Aten（有名的Tensor操作库，之前主要由Pytorch来使用）里的元素。像TensorImpl与Storage都是c10::intrusive_ptr_target的子类，内部自带了引用计数的操作。自然Tensor里在调用它们时都是通过使用类型为c10::intrusive_ptr的成员来做的。

下面我们将分别介绍下StorageImpl，Storage，TensorImpl及Tensor的内容。至于c10::intrusive_ptr_target与c10::intrusive_ptr多属于Aten的内容，在这里暂不作过多说明。

StorageImpl

新的Tensor实现里不再以Template的形式来支持不同类型的Context，而是通过将Tensor实现所需的DeviceType在构造时传入，并再转而构造一个Device specific的Storage对象来实现不同Device context支持。Storage里面的大多数功能都通过StorageImpl来实现。

下面的StorageImpl构造函数可以看出这一作法。

explicit StorageImpl(DeviceType device_type) : device_type_(device_type) {}  StorageImpl(DeviceType device_type, TypeMeta data_type)
      : data_type_(data_type), device_type_(device_type) {}

当然更常用的是我们同时指出管理内存的实际地址、包含空间大小、上面元素的类型及对其进行删除的方法。如下所示：

template <typename Deleter = MemoryDeleter>
  StorageImpl(
      DeviceType device_type,
      TypeMeta data_type,      void* src,      size_t capacity,
      Deleter d = nullptr)
      : data_type_(data_type), device_type_(device_type) {
    CAFFE_ENFORCE_WITH_CALLER(
        data_type_.id() != TypeIdentifier::uninitialized(),        "To create storage with a raw external pointer you need to pass in an "
        "initialized data_type(TypeMeta).");    // Check if the deleter is a MemoryDeleter and is a simple nullptr.
    if (std::is_same<MemoryDeleter, Deleter>::value &&        reinterpret_cast<MemoryDeleter*>(static_cast<void*>(&d))[0] ==            nullptr) {      // Use aliasing constructor trick to avoid calling the destructor.
      data_ptr_ = std::shared_ptr<void>(std::shared_ptr<void>(), src);
    } else {
      data_ptr_.reset(src, d);
    }
    capacity_ = capacity;
  }

明白了StorageImpl里面所含有的基本成员，我们就清楚了它所能提供的对这些成员进行操作的一些方法，举例如下：

...........
........... void reset() {
    data_ptr_.reset();
    capacity_ = 0;
  }  template <typename T>  inline bool IsType() const {    return data_type_.Match<T>();
  }  void* data() const {    return data_ptr_.get();
  }  void* data() {    return data_ptr_.get();
  }  DataPtr& data_ptr() {    return data_ptr_;
  }
...........
...........

Storage

Storage对象里面包含一个类型为c10::instrusitive_ptr的StorageImpl成员。然后它以外包的形式，向外开放StorageImpl所提供的一些功能支持。

如下为Storage的一个较全的构造函数。

  template <typename Deleter = MemoryDeleter>
  Storage(      void* src,
      DeviceType device_type,
      TypeMeta data_type,      size_t capacity,
      Deleter d = nullptr)
      : storage_impl_(c10::make_intrusive<StorageImpl>(
            device_type,
            data_type,
            src,
            capacity,
            d)) {}

然后通过外包，向外提供一些基本的内存单元属性读取或设置功能。

...............
...............  void reset() {
    storage_impl_->reset();
  }  template <typename T>  inline bool IsType() const {    return storage_impl_->IsType<T>();
  }  void* data() const {    return storage_impl_->data();
  }  void* data() {    return storage_impl_->data();
  }  DataPtr& data_ptr() {    return storage_impl_->data_ptr();
  }  const DataPtr& data_ptr() const {    return storage_impl_->data_ptr();
  }
...............
...............  inline long use_count() const {    return storage_impl_.use_count();
  }  inline bool unique() const {    return storage_impl_.unique();
  }  template <typename Deleter = MemoryDeleter>  void UniqueStorageShareExternalPointer(      void* src,      const DataType& data_type,      size_t capacity,
      Deleter d = nullptr) {
    CAFFE_ENFORCE_WITH_CALLER(
        storage_impl_.unique(),        "UniqueStorageShareExternalPointer can only be called when \
        use_count == 1");
    storage_impl_->UniqueStorageShareExternalPointer<Deleter>(
        src, data_type, capacity, d);
  } protected:
  c10::intrusive_ptr<StorageImpl> storage_impl_;
};

TensorImpl

TensorImpl对内存的管理通过两个成员完成，一个为dims，另一个则为storage_。其中storage_为上面讲过的一个Storage对象，里面有所管理内存的实际地址，空间大小，类型(TypeMeta)等。

class CAFFE2_API TensorImpl : public c10::intrusive_ptr_target {
 public:
  TensorImpl() = delete;  explicit TensorImpl(DeviceType device_type) : storage_(device_type) {}  /**
   * @brief Creates a tensor of the given dimension.
   *
   * Note that the actual data allocation is not going to be carried out until
   * the first time mutable_data() is called.
   */
  // TODO: here, we create a Storage
  // and immediately discard it in Resize() since
  // reset_tensor will be true and FreeMemory will be called,
  // we might want to avoid creating Storage twice?
  explicit TensorImpl(const vector<TIndex>& dims, at::DeviceType device_type)
      : storage_(device_type) {
    Resize(dims);
  }

我们可以move copy或assign tensorImpl对象，但却不可以以复制copy的形式进行操作。

  /**
   * @brief Delete the copy constructor and use Clone explicitly
   */
  TensorImpl(const TensorImpl& src) = delete;

  TensorImpl(TensorImpl&& src) noexcept {
    swap(src);
  }

  TensorImpl& operator=(TensorImpl&&) = default;  // Note(jiayq): possibly a rule-of-three violation, but we explicitly
  // discourage the use of = for Tensors.
  TensorImpl& operator=(const TensorImpl& src) = delete;

因为Tensor去掉了Context的模板参数，因此将它作为一个static的成员放在了类里面，指向DeviceType所对应的Context（DeviceType则存在Storage对象成员里面如上节所讲。）

  /*
   * Since we removed template from tensor, we now store a static
   * context pointer in tensor, which indicates the type of the tensor.
   */
  BaseStaticContext* GetStaticContext() const {    return get_static_context(GetDeviceType());
  }  /* @brief
   * Create a context that has the same device_type
   * as the tensor.
   * Note that this doesn't support passing in argument
   * TODO(jerryzh): move this to a global registry
   * that can create context for us
   */
  std::unique_ptr<BaseContext> CreateContext() const {    return GetStaticContext()->CreateContext();
  }

如下为一个Tensor扩展其内存空间的方法。

  /**
   * @brief Extends the outer-most dimension of this tensor by num elements,
   * preserving the existing data.
   *
   * The underlying data may be reallocated in order to accommodate the new
   * elements, in which case this tensors' capacity is grown at a factor of
   * growthPct. This ensures that Extend runs on an amortized O(1) time
   * complexity.
   */
  void Extend(TIndex num, float growthPct, BaseContext* context) {
    CAFFE_ENFORCE_WITH_CALLER(
        is_contiguous_, "Tensor must be contiguous in order to call Extend.");
    CAFFE_ENFORCE_GE_WITH_CALLER(dims_.size(), 1);
    CAFFE_ENFORCE_GE_WITH_CALLER(
        num, 0, "`num` must be non-negative for Extend");    auto newDims = dims_;
    newDims[0] += num;    if (!storage_.data()) {
      Resize(newDims);      return;
    }    auto newNumel = std::accumulate(
        newDims.begin(),
        newDims.end(),        static_cast<TIndex>(1),        std::multiplies<TIndex>());    if (newNumel * storage_.itemsize() <= storage_.capacity()) {
      dims_ = newDims;
      numel_ = newNumel;      return;
    }    auto newCapacity = dims_;
    newCapacity[0] = std::max<size_t>(
        newDims[0], std::ceil(dims_[0] * (growthPct + 100) / 100));    auto oldData = std::move(storage_.data_ptr());    auto oldSize = numel_;    auto oldDims = dims_;
    Resize(newCapacity);    auto* newData = raw_mutable_data(storage_.dtype());
    CAFFE_ENFORCE(
        context != nullptr, "Context must be provided to Extend the tensor");
    context->CopyItemsSameDevice(
        storage_.dtype(), oldSize, oldData.get(), newData);
    reserved_ = true;
    dims_ = newDims;
    numel_ = newNumel;
  }

以下为对Tensor管理空间进行shrink时所做的事。注意我们不可对共享的storage单元进行shrink操作。

  /**
   * @brief Shrinks the outer-most dimension to given size, keeping the data.
   *
   * This method guarantees that no re-allocations are carried out, which means
   * that the extra capacity after the end of the shurnk tensor is maintained.
   */
  void ShrinkTo(TIndex outer_dim) {
    CAFFE_ENFORCE_WITH_CALLER(
        is_contiguous_, "Tensor must be contiguous in order to call ShrinkTo.");
    CAFFE_ENFORCE_WITH_CALLER(dims_.size() >= 1, "Tensor must be at least 1D");
    CAFFE_ENFORCE_WITH_CALLER(
        outer_dim <= dims_[0],        "New outer dimension must be smaller than current.");
    CAFFE_ENFORCE(
        storage_.unique(),        "Can't call ShrinkTo on shared storage, please call Resize instead.");
    dims_[0] = outer_dim;
    numel_ = std::accumulate(
        dims_.begin(),
        dims_.end(),        static_cast<TIndex>(1),        std::multiplies<TIndex>());
  }

Tensor上的Resize操作，则主要是更改dims；只有某些情况下，一些flags存在，加上实时条件满足则会真正地释放掉旧的内存，下一次mutable_data调用时则重新分配内存。

  /**
   * @brief Resizes a tensor.
   *
   * Resize takes in a vector of ints specifying the dimensions of the tensor.
   * You can pass in an empty vector to specify that it is a scalar (i.e.
   * containing one single item).
   *
   * The underlying storage may be deleted after calling Resize: if the new
   * shape leads to a different number of items in the tensor, the old memory
   * is deleted and new memory will be allocated next time you call
   * mutable_data(). However, if the shape is different but the total number of
   * items is the same, the underlying storage is kept.
   */
  template <typename... Ts>  void Resize(Ts... dim_source) {    bool is_init = numel_ == -1;    bool size_changed = SetDims(dim_source...);    if (size_changed) {      // If needed, we will free the data. the next mutable_data() call
      // will create the data storage.
      bool reset_tensor = false;      if (reserved_) {        // If tensor is reserved then don't claim its memeory unless capacity()
        // is smaller than new size
        reset_tensor = storage_.capacity() < numel_ * storage_.itemsize();
      } else {
        reset_tensor = storage_.capacity() < numel_ * storage_.itemsize() ||
            !FLAGS_caffe2_keep_on_shrink ||
            storage_.capacity() - numel_ * storage_.itemsize() >
                FLAGS_caffe2_max_keep_on_shrink_memory;
      }      if (reset_tensor && !is_init) {
        FreeMemory();
      }
    }
  }

Reshape则只是需要保证新的dims与旧的dims总的size相同，然后直接改变dims，而对底下的storage对象则并不改动。

  /**
   * Resizes the tensor without touching underlying storage.
   * This requires the total size of the tensor to remains constant.
   */
  inline void Reshape(const vector<TIndex>& dims) {
    CAFFE_ENFORCE_WITH_CALLER(
        is_contiguous_, "Tensor must be contiguous in order to call Reshape.");
    TIndex new_size = 1;    for (auto d : dims) {
      CAFFE_ENFORCE_GE_WITH_CALLER(d, 0);
      new_size *= d;
    }
    CAFFE_ENFORCE_WITH_CALLER(
        new_size == numel_,        "New size and old size are not equal. You cannot use Reshape, "
        "but should use Resize."
        // TODO(jiayq): remove the following warning after pending diffs
        // stabilize.
        " The old caffe2 mixes Reshape and Resize but this behavior has "
        "been changed. If you find this error, most likely you will need "
        "to change corresponding code from Reshape to Resize.");
    dims_ = dims;
  }

ShareData则是为了在多个Tensor之间共享底部的Storage，而它们可以有着完全不同的dims，只需要保证其size相同即可。

  /**
   * @brief Shares the data with another tensor.
   *
   * To share data between two tensors, the sizes of the two tensors must be
   * equal already. The reason we do not implicitly do a Resize to make the two
   * tensors have the same shape is that we want to allow tensors of different
   * shapes but the same number of items to still be able to share data. This
   * allows one to e.g. have a n-dimensional Tensor and a flattened version
   * sharing the same underlying storage.
   *
   * The source tensor should already have its data allocated.
   */
  void ShareData(const TensorImpl& src) {    // Right now, we are assuming the device_type are the same, since it is
    // inherently the same in the non-templatized code. We should probably add
    // an ENFORCE here which might affect perf a little bit.
    CAFFE_ENFORCE_EQ_WITH_CALLER(
        src.numel_,
        numel_,        "Size mismatch - did you call reshape before sharing the data?");    // It is possible that the source tensor hasn't called mutable_data() yet,
    // in which case ShareData() doesn't make much sense since we don't really
    // know what to share yet.
    CAFFE_ENFORCE_WITH_CALLER(
        src.storage_.data() || src.numel_ == 0,        "Source tensor has no content and has size > 0");    // Finally, do sharing.
    /* Since we create new Storage whenever we need to change data_type/capacity
     * this still keeps the original semantics
     */
    storage_ = src.storage();
  }

以下为返回只读的raw data或者带类型（Typename T）的data的方法。

  /**
   * Returns a const raw void* pointer of the underlying storage. mutable_data()
   * or raw_mutable_data() must have been called prior to this function call.
   */
  inline const void* raw_data() const {
    CAFFE_ENFORCE_WITH_CALLER(
        is_contiguous_,        "Tensor must be contiguous in order to call raw_data()");
    CAFFE_ENFORCE_WITH_CALLER(storage_.data() || numel_ == 0);    return storage_.data();
  }  /**
   * Returns a typed pointer of the underlying storage. mutable_data() or
   * raw_mutable_data() must have been called prior to this function call, and
   * the data type must be of the correct type. If you want to get a void*
   * pointer instead, use raw_data().
   */
  template <typename T>  inline const T* data() const {
    CAFFE_ENFORCE_WITH_CALLER(
        is_contiguous_, "Tensor must be contiguous in order to call data()");
    CAFFE_ENFORCE_WITH_CALLER(
        storage_.data() || numel_ == 0,        "The tensor is of non-zero shape, but its data is not allocated yet. "
        "Caffe2 uses a lazy allocation, so you will need to call "
        "mutable_data() or raw_mutable_data() to actually allocate memory.");
    CAFFE_ENFORCE_WITH_CALLER(
        IsType<T>(),        "Tensor type mismatch, caller expects elements to be ",
        TypeMeta::TypeName<T>(),        " while tensor contains ",
        storage_.dtype().name());    return static_cast<T*>(storage_.data());
  }

以下为有意思的mutable_data的实现方法。其实里面有一个delay执行分配内存的优化做法。

  /**
   * Returns a mutable raw pointer of the underlying storage. Since we will need
   * to know the type of the data for allocation, a TypeMeta object is passed in
   * to specify the necessary information. This is conceptually equivalent of
   * calling mutable_data<T>() where the TypeMeta parameter meta is derived from
   * the type T. This function differs from mutable_data<T>() in the sense that
   * the type T can be specified during runtime via the TypeMeta object.
   *
   * If the existing data does not match the desired type, it will be deleted
   * and a new storage will be created.
   */
  inline void* raw_mutable_data(const TypeMeta& meta) {
    CAFFE_ENFORCE_WITH_CALLER(
        is_contiguous_,        "Tensor must be contiguous in order to call raw_mutable_data()");    // For 0-size tensors it's fine to return any pointer (including nullptr)
    if (storage_.dtype() == meta && (storage_.data() || numel_ == 0)) {      return storage_.data();
    } else {      bool had_special_dtor = storage_.dtype().dtor() != nullptr;      if (storage_.unique()) {
        storage_.set_dtype(meta);        // TODO: recalcuate numel when we store numel instead of capacity in
        // Storage
      } else {        if (storage_.dtype() != meta) {
          storage_ = Storage(storage_.device_type(), meta);
        }
      }
      CAFFE_ENFORCE_WITH_CALLER(
          numel_ >= 0,          "Tensor is not initialized. You probably need to call Resize() "
          "before calling mutable_data()");      // We can reuse the existing buffer if the current data does not have
      // a special destructor and the new data doesn't have a special
      // constructor.
      if (numel_ == 0 ||
          (meta.ctor() == nullptr && !had_special_dtor &&
           storage_.capacity() >= numel_ * storage_.itemsize())) {        return storage_.data();
      }      if (meta.ctor()) {        // For types that need placement new, we will call it, as well as
        // making sure that when the data is freed, it calls the right
        // destruction procedure.
        auto size = numel_;        auto dtor = storage_.dtype().dtor();        auto ptr_and_deleter =
            GetStaticContext()->New(numel_ * storage_.itemsize());        auto deleter = ptr_and_deleter.second;
        storage_.data_ptr().reset(
            ptr_and_deleter.first, [size, dtor, deleter](void* ptr) -> void {
              dtor(ptr, size);
              deleter(ptr);
            });
        storage_.dtype().ctor()(storage_.data(), numel_);
      } else {        // For fundamental type, new and delete is easier.
        auto ptr_and_deleter =
            GetStaticContext()->New(numel_ * storage_.itemsize());
        storage_.data_ptr().reset(
            ptr_and_deleter.first, ptr_and_deleter.second);
      }
      storage_.set_numel(numel_);      return storage_.data();
    }
  }

下面为它的一些protected成员。

 protected:
  DimVector dims_; // sizes_
  DimVector strides_;
  TIndex numel_ = -1; // numel_
  bool is_contiguous_ = true;  // we decide to keep reserved_ and it will
  // live in Tensor after the split
  // The logic is that if Extend() or ReserveSpace() were ever called,
  // then subsequent Resize()s will not free up Storage.
  bool reserved_ = false;
  Storage storage_;  // int64_t storage_offset_;

我们还有一个static的UndefinedTensor使用。

class CAFFE2_API UndefinedTensorImpl final : public TensorImpl {
  UndefinedTensorImpl() : TensorImpl(CPU){}; public: // Without this, we get:
 //  error: identifier "at::UndefinedTensor::_singleton" is undefined in device code
 // (ostensibly because the constexpr tricks MSVC into trying to compile this
 // function for device as well).#ifdef _WIN32
 static inline TensorImpl * singleton() {#else
 static constexpr inline TensorImpl * singleton() {#endif
    return &singleton_;
  } private:  static UndefinedTensorImpl singleton_;
};

Tensor

最后我们来看下真正外部程序所见的对象，Tensor。

以下为它的构造函数及基本类成员，从此易知它的大部分操作都是借助TensorImpl来完成的。在它的多个构造函数当中还有一个模板构造函数。

/**
 * @brief Tensor class holds a shared pointer to the implementation TensorImpl,
 * redirects API calls to TensorImpl;
 * Copying of Tensor results in sharing the same underlying implementation
 * object
 */class CAFFE2_API Tensor final { protected:
  using TensorImplPtr = c10::intrusive_ptr<TensorImpl, UndefinedTensorImpl>;
  TensorImplPtr impl_; public:
  Tensor() : impl_() {}

  operator bool() const {    return impl_.defined();
  }

  explicit Tensor(const vector<TIndex>& dims, DeviceType type)
      : impl_(
            c10::make_intrusive<TensorImpl, UndefinedTensorImpl>(dims, type)) {}

  template <
      typename T,
      typename = typename std::enable_if<std::is_scalar<T>::value>::type>
  Tensor(const T& value, BaseContext* context)
      : impl_(c10::make_intrusive<TensorImpl, UndefinedTensorImpl>(
            value,
            context)) {}

以下为Tensor借助TensorImpl实现一些操作的提现。

  inline int ndim() const {    return impl_.get()->ndim();
  }  inline TIndex size() const {    return impl_.get()->size();
  }  inline size_t itemsize() const {    return impl_.get()->itemsize();
  }  inline size_t nbytes() const {    return impl_.get()->nbytes();
  }  inline size_t capacity_nbytes() const {    return impl_.get()->capacity_nbytes();
  }

作者：manofmountain
链接：https://www.jianshu.com/p/21c74798062b

Caffe2核心代码解析系列之三：Tensor

介绍

StorageImpl

Storage

TensorImpl

Tensor