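I am trying to write a mapper/reducer program that computes the maximum/minimum temperature from a dataset. I tried to modify the code myself, but it does not work. The mapper runs fine after the changes I made to it, but the reducer does not run.
My sample code: mapper.py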
import re
import sys
for line in sys.stdin:
    val = line.strip()
    (year, temp, q) = (val[14:18], val[25:30], val[31:32])
    if (temp != "9999" and re.match("[01459]", q)):
        print "%s\t%s" % (year, temp)
reducer.py
import sys
(last_key, max_val) = (None, -sys.maxint)
for line in sys.stdin:
    (key, val) = line.strip().split("\t")
    if last_key and last_key != key:
        print "%s\t%s" % (last_key, max_val)
        (last_key, max_val) = (key, int(val))
    else:
        (last_key, max_val) = (key, max(max_val, int(val)))
if last_key:
    print "%s\t%s" % (last_key, max_val)
A sample line from the file:
690190,13910,**2012**0101,**42.9**,18,29.4,18,1033.3,18,968.7,18,10.0,18,8.7,18,15.0,999.9,52.5,31.6*,0.00I,999.9,000000,
I need the values shown in bold (the year and the mean temperature). Any ideas?
If I run the mapper as a plain script, this is my output:
root@ubuntu:/home/hduser/files# python maxtemp-map.py
2012 42.9
2012 50.0
2012 47.0
2012 52.0
2012 43.4
2012 52.6
2012 51.1
2012 50.9
2012 57.8
2012 50.7
2012 44.6
2012 46.7
2012 52.1
2012 48.4
2012 47.1
2012 51.8
2012 50.6
2012 53.4
2012 62.9
2012 62.6
The file contains data for different years. I have to compute the minimum, maximum, and average temperature for each year.
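One likely reason the reducer above fails on this data is that the mapper emits values such as 42.9, and `int("42.9")` raises a `ValueError`. The following is a minimal sketch of a reducer that parses the values as floats and tracks the minimum, maximum, and mean per year. It is written for Python 3 (the code above is Python 2), the names `parse` and `reduce_stats` are hypothetical, and it assumes the input is tab-separated `year<TAB>temp` lines already sorted by year, as Hadoop Streaming normally delivers them to a reducer.

```python
from itertools import groupby


def parse(lines):
    """Yield (year, temp) pairs from tab-separated lines, skipping blank ones."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        year, temp = line.split("\t")
        yield year, float(temp)  # float, not int: values look like 42.9


def reduce_stats(lines):
    """Yield (year, min, max, mean) per year; input must be sorted by year."""
    for year, group in groupby(parse(lines), key=lambda pair: pair[0]):
        temps = [t for _, t in group]
        yield year, min(temps), max(temps), sum(temps) / len(temps)


# First three lines of the mapper output shown above.
sample = ["2012\t42.9", "2012\t50.0", "2012\t47.0"]
for year, lo, hi, mean in reduce_stats(sample):
    print("%s\t%.1f\t%.1f\t%.1f" % (year, lo, hi, mean))  # 2012  42.9  50.0  46.6
```

In an actual streaming job the same function could be fed `sys.stdin` instead of the sample list. Collecting each year's values in a list keeps the sketch short; a running count and sum would use less memory for large groups.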
FIELD    POSITION  TYPE  DESCRIPTION
STN---   1-6       Int.  Station number (WMO/DATSAV3 number) for the location.
WBAN     8-12      Int.  WBAN number where applicable--this is the historical
YEAR     15-18     Int.  The year.
MODA     19-22     Int.  The month and day.
TEMP     25-30     Real  Mean temperature. Missing = 9999.9
Count    32-33     Int.  Number of observations in mean temperature
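The positions in this field description refer to a fixed-width layout, whereas the sample line in the question is comma-separated, so slicing by character position may not pick out the intended fields. Below is a minimal Python 3 sketch of a mapper that splits on commas instead. It assumes that, with the bold markers removed, the third field holds the year followed by month and day and the fourth field holds the mean temperature; the helper name `map_line` is hypothetical.

```python
MISSING_TEMP = "9999.9"  # missing-value marker from the field description


def map_line(line):
    """Return (year, temp) for one comma-separated record, or None to skip it."""
    fields = line.strip().split(",")
    if len(fields) < 4:
        return None  # blank or malformed line
    yearmoda, temp = fields[2].strip(), fields[3].strip()
    if len(yearmoda) < 4 or not yearmoda[:4].isdigit():
        return None  # e.g. a header row
    if temp == MISSING_TEMP:
        return None  # no temperature recorded
    try:
        float(temp)  # only checks that the value is numeric
    except ValueError:
        return None
    return yearmoda[:4], temp


# Sample line from the question, assuming the bold markers are stripped.
sample = ("690190,13910,20120101,42.9,18,29.4,18,1033.3,18,968.7,18,"
          "10.0,18,8.7,18,15.0,999.9,52.5,31.6*,0.00I,999.9,000000,")
result = map_line(sample)
if result is not None:
    print("%s\t%s" % result)  # 2012	42.9
```

In a streaming job, the same function would be applied to each line of `sys.stdin`, printing only the non-`None` results.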