
Filtering a CSV file using function arguments

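I'm writing a function that filters a CSV file according to the function's arguments and then finds the average of one column of the filtered rows. I'm only allowed to use import csv (no pandas), and I can't use lambda or any other "advanced" Python shortcuts. I think I can handle the averaging easily, but I'm having trouble filtering on the arguments under these constraints. I would normally use pandas, which makes this much easier, but I can't here.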


Here is my code:


def calc_avg(self, specific, filter, logic, threshold):

    with open(self.load_data, 'r') as avg_file:
        for row in csv.DictReader(avg_file, delimiter= ','):
            specific = row[specific]
            filter = int(row[filter])
            logic = logic
            threshold = 0

            if logic == 'lt':
                filter < threshold

            elif logic == 'gt':
                filter > threshold

            elif logic == 'lte':
                filter <= threshold

            elif logic == 'gte':
                filter >= threshold


It should work with this call:


print(csv_data.calc_avg("Length_of_stay", filter="SOFA", logic="lt", threshold="15"))

This is the format of the data and the column headers. Sample data:


RecordID  SAPS-I  SOFA  Length_of_stay
132539    6       1     5
132540    16      8     8
132541    21      11    19
132545    17      2     4
132547    14      11    6
132548    14      4     9
132551    19      8     6
132554    11      0     17


慕运维8079593
2 Answers

狐的传说

Update

This option evaluates logic once and returns a function, compare, that can be used while iterating over the rows. It is faster when the data has many rows.

import csv
import io

# written as a function because you don't share the definition of load_data
# but the main idea can be translated to a class
# (data is the CSV string defined in the initial answer below)
def calc_avg(self, specific, filter, logic, threshold):
    if isinstance(threshold, str):
        threshold = float(threshold)

    def lt(a, b): return a < b
    def gt(a, b): return a > b
    def lte(a, b): return a <= b
    def gte(a, b): return a >= b

    if logic == 'lt': compare = lt
    elif logic == 'gt': compare = gt
    elif logic == 'lte': compare = lte
    elif logic == 'gte': compare = gte
    else: raise ValueError('unknown logic: ' + str(logic))

    with io.StringIO(self) as avg_file: # change to open an actual file
        running_sum = running_count = 0
        for row in csv.DictReader(avg_file, delimiter=','):
            if compare(int(row[filter]), threshold):
                running_sum += int(row[specific])
                # or float(row[specific])
                running_count += 1

    if running_count == 0:
        # not even one row passed the filter
        return 0
    else:
        return running_sum / running_count

print(calc_avg(data, 'Length_of_stay', 'SOFA', 'lt', '15'))
print(calc_avg(data, 'Length_of_stay', 'SOFA', 'lt', '2'))
print(calc_avg(data, 'Length_of_stay', 'SOFA', 'lt', '0'))

Output

9.25
11.0
0

Initial answer

To filter the rows, you have to actually perform the comparison once you have determined which kind of inequality applies. The code here stores the result in the boolean include. You can then keep two variables, running_sum and running_count, and divide one by the other at the end to return the average.

import io
import csv

# written as a function because you don't share the definition of load_data
# but the main idea can be translated to a class
def calc_avg(self, specific, filter, logic, threshold):
    if isinstance(threshold, str):
        threshold = float(threshold)

    with io.StringIO(self) as avg_file: # change to open an actual file
        running_sum = running_count = 0
        for row in csv.DictReader(avg_file, delimiter=','):
            # your code has: filter = int(row[filter])
            value = int(row[filter]) # avoid overwriting parameters

            if logic == 'lt' and value < threshold:
                include = True
            elif logic == 'gt' and value > threshold:
                include = True
            elif logic == 'lte' and value <= threshold: # should it be 'le'
                include = True
            elif logic == 'gte' and value >= threshold: # should it be 'ge'
                include = True
            # note: ast.literal_eval cannot collapse these branches into one
            # line; it parses literals only, not comparison expressions
            else:
                include = False

            if include:
                running_sum += int(row[specific])
                # or float(row[specific])
                running_count += 1

    # raises ZeroDivisionError if no row passes the filter
    return running_sum / running_count

data = """RecordID,SAPS-I,SOFA,Length_of_stay
132539,6,1,5
132540,16,8,8
132541,21,11,19
132545,17,2,4
132547,14,11,6
132548,14,4,9
132551,19,8,6
132554,11,0,17
"""

print(calc_avg(data, 'Length_of_stay', 'SOFA', 'lt', '15'))
print(calc_avg(data, 'Length_of_stay', 'SOFA', 'lt', '2'))

Output

9.25
11.0
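The answer above notes that the main idea can be translated to a class. As a rough sketch of what that might look like, the block below assumes a hypothetical class named CsvData whose load_data attribute holds the path of the CSV file; neither the class name nor the meaning of load_data is confirmed by the question. It writes the sample rows to a temporary file only so that it can run on its own.

```python
import csv
import os
import tempfile


class CsvData:
    """Hypothetical wrapper; load_data is assumed to be the CSV file path."""

    def __init__(self, load_data):
        self.load_data = load_data

    def calc_avg(self, specific, filter, logic, threshold):
        threshold = float(threshold)  # accepts "15" as well as 15

        # Accumulate the sum and count of the rows that pass the filter.
        running_sum = 0.0
        running_count = 0
        with open(self.load_data, 'r', newline='') as avg_file:
            for row in csv.DictReader(avg_file, delimiter=','):
                value = float(row[filter])  # new name, parameter stays intact
                if logic == 'lt':
                    include = value < threshold
                elif logic == 'gt':
                    include = value > threshold
                elif logic == 'lte':
                    include = value <= threshold
                elif logic == 'gte':
                    include = value >= threshold
                else:
                    raise ValueError('unknown logic: ' + str(logic))
                if include:
                    running_sum += float(row[specific])
                    running_count += 1

        if running_count == 0:
            return 0  # no row passed the filter
        return running_sum / running_count


# Write the sample rows to a temporary file so the sketch is self-contained.
sample = (
    "RecordID,SAPS-I,SOFA,Length_of_stay\n"
    "132539,6,1,5\n"
    "132540,16,8,8\n"
    "132541,21,11,19\n"
    "132545,17,2,4\n"
    "132547,14,11,6\n"
    "132548,14,4,9\n"
    "132551,19,8,6\n"
    "132554,11,0,17\n"
)
handle = tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='')
handle.write(sample)
handle.close()

csv_data = CsvData(handle.name)
result_all = csv_data.calc_avg("Length_of_stay", filter="SOFA", logic="lt", threshold="15")
result_low = csv_data.calc_avg("Length_of_stay", filter="SOFA", logic="lt", threshold="2")
print(result_all)
print(result_low)
os.remove(handle.name)
```

The call mirrors the one in the question, including the threshold passed as a string, which is why the method converts it with float before comparing.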

陪伴而非守候

You aren't doing anything with the results of the comparisons. You need to use them in if statements so that the specific value is included in the average calculation.

import csv

def calc_avg(self, specific, filter, logic, threshold):
    threshold = float(threshold)  # the call in the question passes a string
    with open(self.load_data, 'r') as avg_file:
        values = []
        for row in csv.DictReader(avg_file, delimiter=','):
            # use new names so the parameters are not overwritten after the first row
            value = float(row[specific])
            filter_value = int(row[filter])
            if logic == 'lt' and filter_value < threshold:
                values.append(value)
            elif logic == 'gt' and filter_value > threshold:
                values.append(value)
            elif logic == 'lte' and filter_value <= threshold:
                values.append(value)
            elif logic == 'gte' and filter_value >= threshold:
                values.append(value)
        if len(values) > 0:
            return sum(values) / len(values)
        else:
            return 0
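To illustrate the point about unused comparison results, here is a small sketch with two hypothetical helper functions (the names are made up for this example). The first mimics the loop in the question, where the comparison is evaluated and then thrown away; the second uses it in an if statement.

```python
def count_matches_discarded(values, threshold):
    """Mimics the question's loop: the comparison result is never used."""
    count = 0
    for value in values:
        value < threshold  # evaluated, then thrown away
        count += 1         # so every value is counted
    return count


def count_matches(values, threshold):
    """Uses the comparison in an if statement, so it actually filters."""
    count = 0
    for value in values:
        if value < threshold:
            count += 1
    return count


sofa = [1, 8, 11, 2, 11, 4, 8, 0]  # SOFA column from the sample data
print(count_matches_discarded(sofa, 5))  # prints 8: nothing was filtered
print(count_matches(sofa, 5))            # prints 4: only 1, 2, 4 and 0 pass
```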