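I have a C# Azure Function that reads a file's content from a Blob and writes it to an Azure Data Lake destination. The code works for large files (~8 MB and above), but for small files the destination file is written with 0 bytes. I tried reducing the chunk size and setting the degree of parallelism to 1, but the behavior stays the same. I am running the code locally in Visual Studio 2017.
The code snippet I am using is below. I have read the documentation on Parallel.ForEach pitfalls, but did not find anything specific to a file-size issue. (https://docs.microsoft.com/en-us/dotnet/standard/parallel-programming/potential-pitfalls-in-data-and-task-parallelism)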
int bufferLength = 1 * 1024 * 1024; // 1 MB chunk
long blobRemainingLength = blob.Properties.Length;
var outPutStream = new MemoryStream();
Queue<KeyValuePair<long, long>> queues = new Queue<KeyValuePair<long, long>>();
long offset = 0;
while (blobRemainingLength > 0)
{
    long chunkLength = (long)Math.Min(bufferLength, blobRemainingLength);
    queues.Enqueue(new KeyValuePair<long, long>(offset, chunkLength));
    offset += chunkLength;
    blobRemainingLength -= chunkLength;
}
Console.WriteLine("Number of Queues: " + queues.Count);
Parallel.ForEach(queues,
    new ParallelOptions()
    {
        // Gets or sets the maximum number of concurrent tasks
        MaxDegreeOfParallelism = 10
    }, (queue) =>
    {
        using (var ms = new MemoryStream())
        {
            blob.DownloadRangeToStreamAsync(ms, queue.Key, queue.Value).GetAwaiter().GetResult();
            lock (mystream)
            {
                var bytes = ms.ToArray();
                Console.WriteLine("Processing on thread {0}", Thread.CurrentThread.ManagedThreadId);
                mystream.Write(bytes, 0, bytes.Length);
            }
        }
    });
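One thing worth checking in the snippet above: Parallel.ForEach does not guarantee the order in which loop bodies finish, so chunks written to a shared stream under a lock land in completion order, not offset order. The following is a minimal sketch of an order-preserving variant, not a confirmed fix for the 0-byte problem. The names ChunkedCopy, PlanChunks and CopyInOrder are hypothetical, a byte array stands in for the blob so the example runs without Azure, and the final Flush() reflects an assumption that the real destination stream buffers small writes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper, not part of any Azure SDK.
public static class ChunkedCopy
{
    // Split [0, totalLength) into (offset, length) ranges of at most chunkSize bytes.
    public static List<KeyValuePair<long, long>> PlanChunks(long totalLength, int chunkSize)
    {
        var chunks = new List<KeyValuePair<long, long>>();
        long offset = 0;
        while (offset < totalLength)
        {
            long length = Math.Min(chunkSize, totalLength - offset);
            chunks.Add(new KeyValuePair<long, long>(offset, length));
            offset += length;
        }
        return chunks;
    }

    // Fetch chunks in parallel, then write them to the destination in offset order.
    public static void CopyInOrder(byte[] source, Stream destination, int chunkSize, int maxParallelism)
    {
        var chunks = PlanChunks(source.LongLength, chunkSize);
        var buffers = new byte[chunks.Count][];

        Parallel.For(0, chunks.Count,
            new ParallelOptions { MaxDegreeOfParallelism = maxParallelism },
            i =>
            {
                // Stand-in for blob.DownloadRangeToStreamAsync(ms, offset, length).
                var buffer = new byte[chunks[i].Value];
                Array.Copy(source, chunks[i].Key, buffer, 0, chunks[i].Value);
                // Each iteration writes only its own slot, so no lock is needed.
                buffers[i] = buffer;
            });

        // Sequential write keeps the chunks in their original order.
        foreach (var buffer in buffers)
        {
            destination.Write(buffer, 0, buffer.Length);
        }

        // Assumption: the destination may buffer data until flushed or disposed.
        destination.Flush();
    }
}
```

Calling ChunkedCopy.CopyInOrder(data, stream, 1024 * 1024, 10) mirrors the 1 MB chunk size and parallelism of 10 used in the question. Like the original, this sketch holds every chunk in memory before writing, so it trades memory for simplicity.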
慕神8447489