Reading a large table quickly

My CSV file is structured like this:


    1,0,2.2,0,0,0,0,1.2,0
    0,1,2,4,0,1,0.2,0.1,0
    0,0,2,3,0,0,0,1.2,2.1
    0,0,0,1,2,1,0,0.2,0.1
    0,0,1,0,2.1,0.1,0,1.2
    0,0,2,3,0,1.1,0.1,1.2
    0,0.2,0,1.2,2,0,3.2,0
    0,0,1.2,0,2.2,0,0,1.1

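But it has 10k columns and 10k rows. I want to read it so that the result is a dictionary in which the key is the row index and the value is a float array containing every value in that row. Right now my code looks like this: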


    var lines = File.ReadAllLines(filePath).ToList();
    var result = lines.AsParallel().AsOrdered().Select((line, index) =>
    {
        var values = line?.Split(',').Where(v => !string.IsNullOrEmpty(v))
            .Select(f => f.Replace('.', ','))
            .Select(float.Parse).ToArray();
        return (index, values);
    }).ToDictionary(d => d.Item1, d => d.Item2);

But it takes up to 30 seconds to finish, which is slow, and I would like to optimize it to make it somewhat faster.


慕工程0101907
3 answers

一只斗牛犬

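While there are many small optimizations you can make, what is really killing you is the garbage collector, because of all the allocations.

Your code takes 12 seconds to run on my machine. Reading the file accounts for 2 of those 12 seconds.

By applying all the optimizations mentioned in the comments (using File.ReadLines, StringSplitOptions.RemoveEmptyEntries, and float.Parse(f, CultureInfo.InvariantCulture) instead of calling string.Replace), we get down to 9 seconds. A lot of allocations are still being made, especially by File.ReadLines. Can we do better?

Just activate server GC in app.config:

    <runtime>
        <gcServer enabled="true" />
    </runtime>

With that, the execution time drops to 6 seconds with your code, and to 3 seconds with the optimizations above. At that point, file I/O takes more than 60% of the execution time, so it is not worth optimizing further.

Final version of the code:

    // Requires: using System.Globalization; using System.IO; using System.Linq;
    var lines = File.ReadLines(filePath);
    var separator = new[] {','};
    var result = lines.AsParallel().AsOrdered().Select((line, index) =>
    {
        // Split on commas, skip empty fields, parse with '.' as the decimal separator.
        var values = line?.Split(separator, StringSplitOptions.RemoveEmptyEntries)
            .Select(f => float.Parse(f, CultureInfo.InvariantCulture)).ToArray();
        return (index, values);
    }).ToDictionary(d => d.Item1, d => d.Item2);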
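A note on where the server GC setting in this answer goes: in a .NET Framework app.config, the runtime element sits inside the root configuration element. A minimal sketch of the whole file, assuming the project has no other configuration settings:

```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <runtime>
    <!-- Switch from workstation GC to server GC -->
    <gcServer enabled="true" />
  </runtime>
</configuration>
```

Whether the setting took effect can be checked at run time through System.Runtime.GCSettings.IsServerGC. For SDK-style projects on .NET Core and later, the corresponding switch is the ServerGarbageCollector property in the project file; the timings quoted in this answer were not measured for that case.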

料青山看我应如是

Replacing Split and Replace with manual parsing, using InvariantInfo to accept the period as the decimal separator, and then dropping the wasteful ReadAllLines().ToList() so that AsParallel() reads from the file while parsing, sped things up about four times on my PC.

    // Requires: using System.Collections.Generic; using System.Globalization;
    //           using System.IO; using System.Linq;
    var lines = File.ReadLines(filepath);
    var result = lines.AsParallel().AsOrdered().Select((line, index) => {
        var values = new List<float>(10000);
        var pos = 0;
        while (pos < line.Length) {
            // Find the end of the current field (next comma, or end of line).
            var commapos = line.IndexOf(',', pos);
            commapos = commapos < 0 ? line.Length : commapos;
            var fs = line.Substring(pos, commapos - pos);
            if (fs != String.Empty) // remove if no value is ever missing
                values.Add(float.Parse(fs, NumberFormatInfo.InvariantInfo));
            pos = commapos + 1;
        }
        return values;
    }).ToList();

I also used a List for values instead of ToArray, since that is generally faster (ToList is preferable to ToArray).

哆啦的时光机

    using System.IO;
    using Microsoft.VisualBasic.FileIO;

    protected void CSVImport(string importFilePath)
    {
        string csvData = System.IO.File.ReadAllText(importFilePath, System.Text.Encoding.GetEncoding("WINDOWS-1250"));
        foreach (string row in csvData.Split('\n'))
        {
            // Parse one row at a time with TextFieldParser.
            var parser = new TextFieldParser(new StringReader(row));
            parser.HasFieldsEnclosedInQuotes = true;
            parser.SetDelimiters(",");
            string[] fields;
            fields = parser.ReadFields();
            // do what you need with data in array
        }
    }