我有一个数据框(game_df)和一个Postgres表(team_stats_1970_2017)。
game_df 由几千行包含这样的数据...
season_yr home_team visitor_team home_team_runs visitor_team_runs
0 2017 ARI SFG 6 5
1 2017 ARI SFG 4 8
2 2017 ARI SFG 8 6
3 2017 ARI SFG 9 3
4 2017 ARI CLE 7 3
5 2017 ARI CLE 11 2
6 2017 ATL LAD 2 3
team_stats_1970_2017 会有这个对应的数据
team season_yr r_per_g pa ab b_r b_h b2 b3 b_hr
0 ARI 2017 5.01 6224.0 5525 812 1405 314 39 220
1 ATL 2017 4.52 6216.0 5584 732 1467 289 26 165
2 CLE 2017 5.05 6234.0 5511 818 1449 333 29 212
3 LAD 2017 4.75 6191.0 5408 770 1347 312 20 221
4 SFG 2017 3.94 6137.0 5551 639 1382 290 28 128
例如,对于的第1行game_df,代码从team_stats_1970_2017Postgres中选择“ ARI”和“ SFG”数据,并由此创建特征。然后,对其中的其余行重复此操作game_df。
我目前正在使用df.iterrows,但是我注意到它的速度很慢,因为我仅测试了我的一小部分数据,并且仍然需要一段时间。有人会为此提供更好/更快的替代方法吗?
features = []
results = []
for index,row in game_df.iterrows():
import psycopg2 as pg2
connect = pg2.connect(login)
cursor=connect.cursor()
year, t1, t2, p1, p2 = row
p1, p2 = map(int, [p1, p2])
feature1 = cursor.execute("SELECT * FROM team_stats_1970_2017 WHERE team = (%s) and season_yr = (%s)",(t1,year))
feature1 = list(cursor.fetchone()[2::])
feature2 = cursor.execute("SELECT * FROM team_stats_1970_2017 WHERE team = (%s) and season_yr = (%s)",(t2,year))
feature2 = list(cursor.fetchone()[2::])
feature = np.array(feature2) - np.array(feature1)
芜湖不芜
相关分类