Introduction
The tensor-related code in the Caffe2 core lives in the following files.
$ ls tensor*
tensor.cc  tensor.h  tensor_int8.cc  tensor_int8.h
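Tensor is Caffe2's abstraction of a contiguous memory region. In the actual Caffe2 code, Tensor mainly serves as a set of memory-abstraction APIs for external objects such as Operators; most of its functionality is implemented by the TensorImpl class. TensorImpl has two main members. One is dims_, which holds the dimensions (shape) of the memory it currently manages. The other is storage_, a Storage object. Storage is itself only a wrapper: the real work is done by StorageImpl, which holds the actual address of the memory, the type of the elements it contains (TypeMeta), and so on. The relationship between Storage and StorageImpl mirrors the one between Tensor and TensorImpl.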
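The newer version of Caffe2 borrows many pieces from ATen (the well-known tensor library, previously used mainly by PyTorch). For example, TensorImpl and StorageImpl are both subclasses of c10::intrusive_ptr_target, so they carry their own reference count. Accordingly, Tensor and Storage access them through members of type c10::intrusive_ptr.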
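To make "the object carries its own reference count" concrete, here is a minimal sketch of intrusive reference counting. It is not the c10 implementation: RefCounted, IntrusivePtr and Payload are hypothetical stand-ins, and the counter is a plain integer rather than an atomic, so the sketch is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

template <typename T>
class IntrusivePtr;  // hypothetical stand-in for c10::intrusive_ptr

// Hypothetical stand-in for c10::intrusive_ptr_target: the count lives
// inside the object itself, not in a separate control block.
class RefCounted {
 public:
  virtual ~RefCounted() = default;

 private:
  template <typename T>
  friend class IntrusivePtr;
  std::size_t refcount_ = 0;  // assumption: single-threaded use only
};

template <typename T>
class IntrusivePtr {
 public:
  IntrusivePtr() = default;
  explicit IntrusivePtr(T* p) : ptr_(p) { retain(); }
  IntrusivePtr(const IntrusivePtr& other) : ptr_(other.ptr_) { retain(); }
  IntrusivePtr(IntrusivePtr&& other) noexcept : ptr_(other.ptr_) {
    other.ptr_ = nullptr;
  }
  // Copy-and-swap covers both copy and move assignment.
  IntrusivePtr& operator=(IntrusivePtr other) noexcept {
    std::swap(ptr_, other.ptr_);
    return *this;
  }
  ~IntrusivePtr() { release(); }

  T* get() const { return ptr_; }
  T* operator->() const { return ptr_; }
  std::size_t use_count() const { return ptr_ ? ptr_->refcount_ : 0; }

 private:
  void retain() {
    if (ptr_) ++ptr_->refcount_;
  }
  void release() {
    // The last owner destroys the object.
    if (ptr_ && --ptr_->refcount_ == 0) delete ptr_;
    ptr_ = nullptr;
  }
  T* ptr_ = nullptr;
};

// Example target type, playing the role that StorageImpl or TensorImpl
// plays in the text.
struct Payload : RefCounted {
  int value = 0;
};
```

Because the count is stored in the object, a handle is just one raw pointer wide, and copying a handle only increments the counter.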
Below we go through StorageImpl, Storage, TensorImpl and Tensor in turn. c10::intrusive_ptr_target and c10::intrusive_ptr belong mostly to ATen, so they are not covered in detail here.
StorageImpl
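The new Tensor implementation no longer supports different Context types through a template parameter. Instead, the DeviceType the tensor needs is passed in at construction time and is then used to construct a device-specific Storage object; this is how different device contexts are supported. Most of Storage's functionality is implemented by StorageImpl.
The StorageImpl constructors below show this approach.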
explicit StorageImpl(DeviceType device_type) : device_type_(device_type) {}
StorageImpl(DeviceType device_type, TypeMeta data_type)
    : data_type_(data_type), device_type_(device_type) {}
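More commonly, of course, we also specify the actual address of the managed memory, its capacity, the type of the elements stored in it, and the deleter used to free it, as shown below: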
template <typename Deleter = MemoryDeleter>
StorageImpl(
    DeviceType device_type,
    TypeMeta data_type,
    void* src,
    size_t capacity,
    Deleter d = nullptr)
    : data_type_(data_type), device_type_(device_type) {
  CAFFE_ENFORCE_WITH_CALLER(
      data_type_.id() != TypeIdentifier::uninitialized(),
      "To create storage with a raw external pointer you need to pass in an "
      "initialized data_type(TypeMeta).");
  // Check if the deleter is a MemoryDeleter and is a simple nullptr.
  if (std::is_same<MemoryDeleter, Deleter>::value &&
      reinterpret_cast<MemoryDeleter*>(static_cast<void*>(&d))[0] == nullptr) {
    // Use aliasing constructor trick to avoid calling the destructor.
    data_ptr_ = std::shared_ptr<void>(std::shared_ptr<void>(), src);
  } else {
    data_ptr_.reset(src, d);
  }
  capacity_ = capacity;
}
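The "aliasing constructor trick" in the comment above can be tried in isolation. The sketch below uses only std::shared_ptr; wrap_non_owning, wrap_owning, counting_free and g_free_calls are hypothetical helper names introduced for illustration and are not part of Caffe2.

```cpp
#include <cassert>
#include <cstdlib>
#include <memory>

// Wraps an external buffer WITHOUT taking ownership: the aliasing
// constructor is given an empty owner, so no deleter ever runs on src.
inline std::shared_ptr<void> wrap_non_owning(void* src) {
  return std::shared_ptr<void>(std::shared_ptr<void>(), src);
}

// Wraps a buffer and takes ownership; d runs when the last owner goes away.
template <typename Deleter>
std::shared_ptr<void> wrap_owning(void* src, Deleter d) {
  return std::shared_ptr<void>(src, d);
}

// Counts how often the owning deleter was invoked (illustration only).
static int g_free_calls = 0;
inline void counting_free(void* p) {
  ++g_free_calls;
  std::free(p);
}
```

With an empty owner, the resulting pointer stores the address but shares no ownership, which is what a storage object wants when the caller passes a raw pointer and a null deleter.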
Once the basic members of StorageImpl are clear, so are the methods it provides for operating on them. Some examples:
...........
void reset() {
  data_ptr_.reset();
  capacity_ = 0;
}

template <typename T>
inline bool IsType() const {
  return data_type_.Match<T>();
}

void* data() const {
  return data_ptr_.get();
}

void* data() {
  return data_ptr_.get();
}

DataPtr& data_ptr() {
  return data_ptr_;
}
...........
Storage
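A Storage object holds a StorageImpl through a member of type c10::intrusive_ptr<StorageImpl>. It then acts as a thin wrapper that exposes some of the functionality StorageImpl provides.
Below is one of Storage's more complete constructors.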
template <typename Deleter = MemoryDeleter>
Storage(
    void* src,
    DeviceType device_type,
    TypeMeta data_type,
    size_t capacity,
    Deleter d = nullptr)
    : storage_impl_(c10::make_intrusive<StorageImpl>(
          device_type,
          data_type,
          src,
          capacity,
          d)) {}
Through this delegation it then exposes basic functions for reading or setting the attributes of the memory block.
...............
  void reset() {
    storage_impl_->reset();
  }

  template <typename T>
  inline bool IsType() const {
    return storage_impl_->IsType<T>();
  }

  void* data() const {
    return storage_impl_->data();
  }

  void* data() {
    return storage_impl_->data();
  }

  DataPtr& data_ptr() {
    return storage_impl_->data_ptr();
  }

  const DataPtr& data_ptr() const {
    return storage_impl_->data_ptr();
  }
...............
  inline long use_count() const {
    return storage_impl_.use_count();
  }

  inline bool unique() const {
    return storage_impl_.unique();
  }

  template <typename Deleter = MemoryDeleter>
  void UniqueStorageShareExternalPointer(
      void* src,
      const DataType& data_type,
      size_t capacity,
      Deleter d = nullptr) {
    CAFFE_ENFORCE_WITH_CALLER(
        storage_impl_.unique(),
        "UniqueStorageShareExternalPointer can only be called when \
        use_count == 1");
    storage_impl_->UniqueStorageShareExternalPointer<Deleter>(
        src, data_type, capacity, d);
  }

 protected:
  c10::intrusive_ptr<StorageImpl> storage_impl_;
};
TensorImpl
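TensorImpl manages memory through two members: dims_ and storage_. storage_ is the Storage object described above, which holds the actual address of the managed memory, its capacity, its element type (TypeMeta), and so on.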
class CAFFE2_API TensorImpl : public c10::intrusive_ptr_target {
 public:
  TensorImpl() = delete;
  explicit TensorImpl(DeviceType device_type) : storage_(device_type) {}

  /**
   * @brief Creates a tensor of the given dimension.
   *
   * Note that the actual data allocation is not going to be carried out until
   * the first time mutable_data() is called.
   */
  // TODO: here, we create a Storage
  // and immediately discard it in Resize() since
  // reset_tensor will be true and FreeMemory will be called,
  // we might want to avoid creating Storage twice?
  explicit TensorImpl(const vector<TIndex>& dims, at::DeviceType device_type)
      : storage_(device_type) {
    Resize(dims);
  }
A TensorImpl object can be move-constructed or move-assigned, but it cannot be copy-constructed or copy-assigned.
  /**
   * @brief Delete the copy constructor and use Clone explicitly
   */
  TensorImpl(const TensorImpl& src) = delete;

  TensorImpl(TensorImpl&& src) noexcept {
    swap(src);
  }

  TensorImpl& operator=(TensorImpl&&) = default;
  // Note(jiayq): possibly a rule-of-three violation, but we explicitly
  // discourage the use of = for Tensors.
  TensorImpl& operator=(const TensorImpl& src) = delete;
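The same "movable but not copyable, clone explicitly" pattern can be sketched on a small standalone class. MoveOnlyBuffer and its members are hypothetical names chosen for illustration; only the shape of the special member functions mirrors the excerpt above.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

class MoveOnlyBuffer {
 public:
  explicit MoveOnlyBuffer(std::vector<int> data) : data_(std::move(data)) {}

  // Copying is forbidden, so an accidental deep copy cannot happen.
  MoveOnlyBuffer(const MoveOnlyBuffer&) = delete;
  MoveOnlyBuffer& operator=(const MoveOnlyBuffer&) = delete;

  // Moving is implemented with swap: the new object starts empty and
  // takes over the contents of the source.
  MoveOnlyBuffer(MoveOnlyBuffer&& other) noexcept { swap(other); }
  MoveOnlyBuffer& operator=(MoveOnlyBuffer&& other) noexcept {
    swap(other);
    return *this;
  }

  // A deep copy must be requested by name.
  MoveOnlyBuffer Clone() const { return MoveOnlyBuffer(data_); }

  void swap(MoveOnlyBuffer& other) noexcept { data_.swap(other.data_); }
  std::size_t size() const { return data_.size(); }

 private:
  std::vector<int> data_;
};

static_assert(
    !std::is_copy_constructible<MoveOnlyBuffer>::value,
    "copy construction is intentionally disabled");
```

Making the expensive operation explicit (Clone) while keeping the cheap one implicit (move) is a common choice for objects that own large buffers.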
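Because Tensor dropped the Context template parameter, the tensor instead looks up a static context object: the one that corresponds to its DeviceType. (The DeviceType itself is stored in the Storage member, as described in the previous section.)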
  /*
   * Since we removed template from tensor, we now store a static
   * context pointer in tensor, which indicates the type of the tensor.
   */
  BaseStaticContext* GetStaticContext() const {
    return get_static_context(GetDeviceType());
  }

  /* @brief
   * Create a context that has the same device_type
   * as the tensor.
   * Note that this doesn't support passing in argument
   * TODO(jerryzh): move this to a global registry
   * that can create context for us
   */
  std::unique_ptr<BaseContext> CreateContext() const {
    return GetStaticContext()->CreateContext();
  }
The following method extends the memory managed by a Tensor.
  /**
   * @brief Extends the outer-most dimension of this tensor by num elements,
   * preserving the existing data.
   *
   * The underlying data may be reallocated in order to accommodate the new
   * elements, in which case this tensors' capacity is grown at a factor of
   * growthPct. This ensures that Extend runs on an amortized O(1) time
   * complexity.
   */
  void Extend(TIndex num, float growthPct, BaseContext* context) {
    CAFFE_ENFORCE_WITH_CALLER(
        is_contiguous_,
        "Tensor must be contiguous in order to call Extend.");
    CAFFE_ENFORCE_GE_WITH_CALLER(dims_.size(), 1);
    CAFFE_ENFORCE_GE_WITH_CALLER(
        num, 0, "`num` must be non-negative for Extend");
    auto newDims = dims_;
    newDims[0] += num;
    if (!storage_.data()) {
      Resize(newDims);
      return;
    }
    auto newNumel = std::accumulate(
        newDims.begin(),
        newDims.end(),
        static_cast<TIndex>(1),
        std::multiplies<TIndex>());
    if (newNumel * storage_.itemsize() <= storage_.capacity()) {
      dims_ = newDims;
      numel_ = newNumel;
      return;
    }
    auto newCapacity = dims_;
    newCapacity[0] = std::max<size_t>(
        newDims[0], std::ceil(dims_[0] * (growthPct + 100) / 100));
    auto oldData = std::move(storage_.data_ptr());
    auto oldSize = numel_;
    auto oldDims = dims_;
    Resize(newCapacity);
    auto* newData = raw_mutable_data(storage_.dtype());
    CAFFE_ENFORCE(
        context != nullptr, "Context must be provided to Extend the tensor");
    context->CopyItemsSameDevice(
        storage_.dtype(), oldSize, oldData.get(), newData);
    reserved_ = true;
    dims_ = newDims;
    numel_ = newNumel;
  }
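The growth rule in Extend can be isolated as plain arithmetic. The sketch below reproduces only the computation of the new outer-dimension capacity; grown_outer_dim is a hypothetical helper name, and std::int64_t is assumed here as a stand-in for TIndex.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Returns the outer-dimension capacity to allocate when a tensor with
// old_outer rows must hold old_outer + num rows. The capacity is the larger
// of what is strictly needed and the old size grown by growth_pct percent.
inline std::int64_t grown_outer_dim(
    std::int64_t old_outer,
    std::int64_t num,
    float growth_pct) {
  const std::int64_t needed = old_outer + num;
  const auto grown = static_cast<std::int64_t>(
      std::ceil(old_outer * (growth_pct + 100.0f) / 100.0f));
  return std::max(needed, grown);
}
```

For example, extending 100 rows by 1 with growthPct = 50 reserves 150 rows, so the next 49 single-row extensions fit in the existing capacity without reallocating. This over-allocation is what gives Extend its amortized O(1) cost.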
The following shows what happens when the space managed by a Tensor is shrunk. Note that a shared storage cannot be shrunk.
  /**
   * @brief Shrinks the outer-most dimension to given size, keeping the data.
   *
   * This method guarantees that no re-allocations are carried out, which means
   * that the extra capacity after the end of the shrunk tensor is maintained.
   */
  void ShrinkTo(TIndex outer_dim) {
    CAFFE_ENFORCE_WITH_CALLER(
        is_contiguous_,
        "Tensor must be contiguous in order to call ShrinkTo.");
    CAFFE_ENFORCE_WITH_CALLER(dims_.size() >= 1, "Tensor must be at least 1D");
    CAFFE_ENFORCE_WITH_CALLER(
        outer_dim <= dims_[0],
        "New outer dimension must be smaller than current.");
    CAFFE_ENFORCE(
        storage_.unique(),
        "Can't call ShrinkTo on shared storage, please call Resize instead.");
    dims_[0] = outer_dim;
    numel_ = std::accumulate(
        dims_.begin(),
        dims_.end(),
        static_cast<TIndex>(1),
        std::multiplies<TIndex>());
  }
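Resize on a Tensor mainly changes dims_. The old memory is actually freed only in certain cases, depending on a few flags and on the runtime state (the current capacity compared with the newly required size); new memory is then allocated on the next mutable_data call.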
  /**
   * @brief Resizes a tensor.
   *
   * Resize takes in a vector of ints specifying the dimensions of the tensor.
   * You can pass in an empty vector to specify that it is a scalar (i.e.
   * containing one single item).
   *
   * The underlying storage may be deleted after calling Resize: if the new
   * shape leads to a different number of items in the tensor, the old memory
   * is deleted and new memory will be allocated next time you call
   * mutable_data(). However, if the shape is different but the total number of
   * items is the same, the underlying storage is kept.
   */
  template <typename... Ts>
  void Resize(Ts... dim_source) {
    bool is_init = numel_ == -1;
    bool size_changed = SetDims(dim_source...);
    if (size_changed) {
      // If needed, we will free the data. the next mutable_data() call
      // will create the data storage.
      bool reset_tensor = false;
      if (reserved_) {
        // If tensor is reserved then don't claim its memory unless capacity()
        // is smaller than new size
        reset_tensor = storage_.capacity() < numel_ * storage_.itemsize();
      } else {
        reset_tensor = storage_.capacity() < numel_ * storage_.itemsize() ||
            !FLAGS_caffe2_keep_on_shrink ||
            storage_.capacity() - numel_ * storage_.itemsize() >
                FLAGS_caffe2_max_keep_on_shrink_memory;
      }

      if (reset_tensor && !is_init) {
        FreeMemory();
      }
    }
  }
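The decision of whether Resize frees the old buffer can be written as a pure function, which makes the individual cases easier to check. In this sketch ResizePolicy and should_free_on_resize are hypothetical names; the two policy fields stand in for FLAGS_caffe2_keep_on_shrink and FLAGS_caffe2_max_keep_on_shrink_memory, and the is_init check from the original is left out.

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for the two command-line flags consulted by Resize.
struct ResizePolicy {
  bool keep_on_shrink;                   // keep the buffer when shrinking?
  std::size_t max_keep_on_shrink_bytes;  // largest slack worth keeping
};

// Returns true if the existing buffer should be released, so that the next
// mutable_data() call allocates a fresh one.
inline bool should_free_on_resize(
    std::size_t capacity_bytes,
    std::size_t new_bytes,
    bool reserved,
    const ResizePolicy& policy) {
  if (reserved) {
    // Reserved tensors give up memory only when it is too small.
    return capacity_bytes < new_bytes;
  }
  // Short-circuiting guarantees the subtraction is evaluated only when
  // capacity_bytes >= new_bytes, so it cannot wrap around.
  return capacity_bytes < new_bytes || !policy.keep_on_shrink ||
      capacity_bytes - new_bytes > policy.max_keep_on_shrink_bytes;
}
```

Read this way, a non-reserved tensor keeps its buffer only when the buffer is large enough, keeping on shrink is enabled, and the unused slack stays within the configured limit.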
Reshape only requires that the new dims have the same total size as the old dims. It then changes dims_ directly and leaves the underlying storage object untouched.
  /**
   * Resizes the tensor without touching underlying storage.
   * This requires the total size of the tensor to remains constant.
   */
  inline void Reshape(const vector<TIndex>& dims) {
    CAFFE_ENFORCE_WITH_CALLER(
        is_contiguous_,
        "Tensor must be contiguous in order to call Reshape.");
    TIndex new_size = 1;
    for (auto d : dims) {
      CAFFE_ENFORCE_GE_WITH_CALLER(d, 0);
      new_size *= d;
    }
    CAFFE_ENFORCE_WITH_CALLER(
        new_size == numel_,
        "New size and old size are not equal. You cannot use Reshape, "
        "but should use Resize."
        // TODO(jiayq): remove the following warning after pending diffs
        // stabilize.
        " The old caffe2 mixes Reshape and Resize but this behavior has "
        "been changed. If you find this error, most likely you will need "
        "to change corresponding code from Reshape to Resize.");
    dims_ = dims;
  }
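The precondition that Reshape enforces is simply that the element count does not change. A minimal sketch of that check follows; numel_of and can_reshape are hypothetical helper names, and std::int64_t is assumed as a stand-in for TIndex.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

using Dims = std::vector<std::int64_t>;

// Product of all dimensions; an empty shape describes a scalar, so it
// yields 1.
inline std::int64_t numel_of(const Dims& dims) {
  return std::accumulate(
      dims.begin(),
      dims.end(),
      std::int64_t{1},
      std::multiplies<std::int64_t>());
}

// True if new_dims is a legal reshape of old_dims: every dimension is
// non-negative and the total number of elements is unchanged.
inline bool can_reshape(const Dims& old_dims, const Dims& new_dims) {
  for (std::int64_t d : new_dims) {
    if (d < 0) return false;
  }
  return numel_of(old_dims) == numel_of(new_dims);
}
```

A 2x3x4 tensor can thus be reshaped to 6x4 or to a flat 24, but changing it to 5x5 needs Resize, because the storage requirement differs.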
ShareData lets several Tensors share the same underlying Storage. The tensors may have completely different dims, as long as their total sizes are equal.
  /**
   * @brief Shares the data with another tensor.
   *
   * To share data between two tensors, the sizes of the two tensors must be
   * equal already. The reason we do not implicitly do a Resize to make the two
   * tensors have the same shape is that we want to allow tensors of different
   * shapes but the same number of items to still be able to share data. This
   * allows one to e.g. have a n-dimensional Tensor and a flattened version
   * sharing the same underlying storage.
   *
   * The source tensor should already have its data allocated.
   */
  void ShareData(const TensorImpl& src) {
    // Right now, we are assuming the device_type are the same, since it is
    // inherently the same in the non-templatized code. We should probably add
    // an ENFORCE here which might affect perf a little bit.
    CAFFE_ENFORCE_EQ_WITH_CALLER(
        src.numel_,
        numel_,
        "Size mismatch - did you call reshape before sharing the data?");
    // It is possible that the source tensor hasn't called mutable_data() yet,
    // in which case ShareData() doesn't make much sense since we don't really
    // know what to share yet.
    CAFFE_ENFORCE_WITH_CALLER(
        src.storage_.data() || src.numel_ == 0,
        "Source tensor has no content and has size > 0");
    // Finally, do sharing.
    /* Since we create new Storage whenever we need to change data_type/capacity
     * this still keeps the original semantics
     */
    storage_ = src.storage();
  }
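To illustrate two shapes over one buffer, here is a small sketch in which std::shared_ptr plays the role of the reference-counted Storage. TensorView is a hypothetical type; it returns false where the original code would raise an enforce failure.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

struct TensorView {
  std::vector<std::int64_t> dims;
  std::shared_ptr<std::vector<float>> storage;  // stand-in for Storage

  std::int64_t numel() const {
    std::int64_t n = 1;
    for (std::int64_t d : dims) n *= d;
    return n;
  }

  // Adopts src's buffer. The shapes may differ, but the element counts must
  // match, and src must already hold data unless it is empty.
  bool ShareData(const TensorView& src) {
    if (src.numel() != numel()) return false;
    if (!src.storage && src.numel() != 0) return false;
    storage = src.storage;  // only the handle is copied, not the data
    return true;
  }
};
```

After a successful ShareData, a write through either view is visible through the other, since both handles point at the same buffer.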
The following methods return read-only data, either as a raw pointer or as a typed pointer (typename T).
  /**
   * Returns a const raw void* pointer of the underlying storage. mutable_data()
   * or raw_mutable_data() must have been called prior to this function call.
   */
  inline const void* raw_data() const {
    CAFFE_ENFORCE_WITH_CALLER(
        is_contiguous_,
        "Tensor must be contiguous in order to call raw_data()");
    CAFFE_ENFORCE_WITH_CALLER(storage_.data() || numel_ == 0);
    return storage_.data();
  }

  /**
   * Returns a typed pointer of the underlying storage. mutable_data() or
   * raw_mutable_data() must have been called prior to this function call, and
   * the data type must be of the correct type. If you want to get a void*
   * pointer instead, use raw_data().
   */
  template <typename T>
  inline const T* data() const {
    CAFFE_ENFORCE_WITH_CALLER(
        is_contiguous_,
        "Tensor must be contiguous in order to call data()");
    CAFFE_ENFORCE_WITH_CALLER(
        storage_.data() || numel_ == 0,
        "The tensor is of non-zero shape, but its data is not allocated yet. "
        "Caffe2 uses a lazy allocation, so you will need to call "
        "mutable_data() or raw_mutable_data() to actually allocate memory.");
    CAFFE_ENFORCE_WITH_CALLER(
        IsType<T>(),
        "Tensor type mismatch, caller expects elements to be ",
        TypeMeta::TypeName<T>(),
        " while tensor contains ",
        storage_.dtype().name());
    return static_cast<T*>(storage_.data());
  }
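Below is the more interesting implementation of mutable data access (raw_mutable_data). It contains an optimization: the actual memory allocation is deferred until the data is first requested.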
  /**
   * Returns a mutable raw pointer of the underlying storage. Since we will need
   * to know the type of the data for allocation, a TypeMeta object is passed in
   * to specify the necessary information. This is conceptually equivalent of
   * calling mutable_data<T>() where the TypeMeta parameter meta is derived from
   * the type T. This function differs from mutable_data<T>() in the sense that
   * the type T can be specified during runtime via the TypeMeta object.
   *
   * If the existing data does not match the desired type, it will be deleted
   * and a new storage will be created.
   */
  inline void* raw_mutable_data(const TypeMeta& meta) {
    CAFFE_ENFORCE_WITH_CALLER(
        is_contiguous_,
        "Tensor must be contiguous in order to call raw_mutable_data()");
    // For 0-size tensors it's fine to return any pointer (including nullptr)
    if (storage_.dtype() == meta && (storage_.data() || numel_ == 0)) {
      return storage_.data();
    } else {
      bool had_special_dtor = storage_.dtype().dtor() != nullptr;
      if (storage_.unique()) {
        storage_.set_dtype(meta);
        // TODO: recalculate numel when we store numel instead of capacity in
        // Storage
      } else {
        if (storage_.dtype() != meta) {
          storage_ = Storage(storage_.device_type(), meta);
        }
      }
      CAFFE_ENFORCE_WITH_CALLER(
          numel_ >= 0,
          "Tensor is not initialized. You probably need to call Resize() "
          "before calling mutable_data()");

      // We can reuse the existing buffer if the current data does not have
      // a special destructor and the new data doesn't have a special
      // constructor.
      if (numel_ == 0 ||
          (meta.ctor() == nullptr && !had_special_dtor &&
           storage_.capacity() >= numel_ * storage_.itemsize())) {
        return storage_.data();
      }
      if (meta.ctor()) {
        // For types that need placement new, we will call it, as well as
        // making sure that when the data is freed, it calls the right
        // destruction procedure.
        auto size = numel_;
        auto dtor = storage_.dtype().dtor();
        auto ptr_and_deleter =
            GetStaticContext()->New(numel_ * storage_.itemsize());
        auto deleter = ptr_and_deleter.second;
        storage_.data_ptr().reset(
            ptr_and_deleter.first, [size, dtor, deleter](void* ptr) -> void {
              dtor(ptr, size);
              deleter(ptr);
            });
        storage_.dtype().ctor()(storage_.data(), numel_);
      } else {
        // For fundamental type, new and delete is easier.
        auto ptr_and_deleter =
            GetStaticContext()->New(numel_ * storage_.itemsize());
        storage_.data_ptr().reset(
            ptr_and_deleter.first, ptr_and_deleter.second);
      }
      storage_.set_numel(numel_);
      return storage_.data();
    }
  }
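The deferred-allocation idea, together with the custom deleter that runs element destructors, can be sketched in a much smaller form. LazyBuffer is a hypothetical class: it tracks the element type with std::type_index instead of TypeMeta, always constructs elements with placement new, and does not attempt the buffer-reuse optimization or the exception safety of the real code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <string>
#include <typeindex>
#include <typeinfo>

class LazyBuffer {
 public:
  // Only records the new size; a size change drops the old buffer.
  void Resize(std::size_t numel) {
    if (numel != numel_) {
      data_.reset();
      numel_ = numel;
    }
  }

  bool allocated() const { return data_ != nullptr; }

  // Allocates on first use, or again if the element type changed.
  template <typename T>
  T* mutable_data() {
    const std::type_index wanted(typeid(T));
    if (type_ == wanted && (data_ || numel_ == 0)) {
      return static_cast<T*>(data_.get());
    }
    type_ = wanted;
    if (numel_ == 0) {
      data_.reset();
      return nullptr;
    }
    const std::size_t n = numel_;
    void* raw = ::operator new(n * sizeof(T));
    T* typed = static_cast<T*>(raw);
    for (std::size_t i = 0; i < n; ++i) {
      new (typed + i) T();  // placement new: construct in the raw buffer
    }
    // The deleter remembers how many elements to destroy before the raw
    // memory is released; reset() also frees any previous buffer.
    data_.reset(raw, [n](void* p) {
      T* t = static_cast<T*>(p);
      for (std::size_t i = 0; i < n; ++i) {
        t[i].~T();
      }
      ::operator delete(p);
    });
    return typed;
  }

 private:
  std::size_t numel_ = 0;
  std::type_index type_{typeid(void)};
  std::shared_ptr<void> data_;
};
```

The point of the design is that Resize stays cheap: shape bookkeeping and allocation are decoupled, and memory is committed only once the element type is known.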
Below are some of its protected members.
 protected:
  DimVector dims_; // sizes_
  DimVector strides_;
  TIndex numel_ = -1; // numel_
  bool is_contiguous_ = true;
  // we decide to keep reserved_ and it will
  // live in Tensor after the split
  // The logic is that if Extend() or ReserveSpace() were ever called,
  // then subsequent Resize()s will not free up Storage.
  bool reserved_ = false;
  Storage storage_;
  // int64_t storage_offset_;
There is also a statically allocated UndefinedTensorImpl singleton.
class CAFFE2_API UndefinedTensorImpl final : public TensorImpl {
  UndefinedTensorImpl() : TensorImpl(CPU){};

 public:
  // Without this, we get:
  //  error: identifier "at::UndefinedTensor::_singleton" is undefined in device code
  // (ostensibly because the constexpr tricks MSVC into trying to compile this
  // function for device as well).
#ifdef _WIN32
  static inline TensorImpl * singleton() {
#else
  static constexpr inline TensorImpl * singleton() {
#endif
    return &singleton_;
  }

 private:
  static UndefinedTensorImpl singleton_;
};
Tensor
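Finally, let us look at Tensor, the object that external code actually sees.
Below are its constructors and its basic members. It is easy to see from them that most of its operations are carried out by TensorImpl. Among its several constructors there is also a template constructor.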
/**
 * @brief Tensor class holds a shared pointer to the implementation TensorImpl,
 * redirects API calls to TensorImpl;
 * Copying of Tensor results in sharing the same underlying implementation
 * object
 */
class CAFFE2_API Tensor final {
 protected:
  using TensorImplPtr = c10::intrusive_ptr<TensorImpl, UndefinedTensorImpl>;
  TensorImplPtr impl_;

 public:
  Tensor() : impl_() {}

  operator bool() const {
    return impl_.defined();
  }

  explicit Tensor(const vector<TIndex>& dims, DeviceType type)
      : impl_(
            c10::make_intrusive<TensorImpl, UndefinedTensorImpl>(dims, type)) {}

  template <
      typename T,
      typename = typename std::enable_if<std::is_scalar<T>::value>::type>
  Tensor(const T& value, BaseContext* context)
      : impl_(c10::make_intrusive<TensorImpl, UndefinedTensorImpl>(
            value,
            context)) {}
The following shows how Tensor implements some of its operations by forwarding to TensorImpl.
  inline int ndim() const {
    return impl_.get()->ndim();
  }

  inline TIndex size() const {
    return impl_.get()->size();
  }

  inline size_t itemsize() const {
    return impl_.get()->itemsize();
  }

  inline size_t nbytes() const {
    return impl_.get()->nbytes();
  }

  inline size_t capacity_nbytes() const {
    return impl_.get()->capacity_nbytes();
  }
Author: manofmountain
Link: https://www.jianshu.com/p/21c74798062b