拉风的咖菲猫
The data comes from this call:

    POST http://alta.registries.gov.ab.ca/SpinII/mapserver.aspx

The content is encoded in a custom format before it is consumed by the OpenLayers library. All of the decoding is located in a single JS file. If you beautify it, you can look for WayTo.Wtb.Format.WTB, the OpenLayers.Class that does the decoding. The binary is decoded byte by byte, as in this excerpt from the JS:

    switch(elementType){
        case 1:
            var lineColor = new WayTo.Wtb.Element.LineColor();
            byteOffset = lineColor.parse(dataReader, byteOffset);
            outputElement = lineColor;
            break;
        case 2:
            var lineStyle = new WayTo.Wtb.Element.LineStyle();
            byteOffset = lineStyle.parse(dataReader, byteOffset);
            outputElement = lineStyle;
            break;
        case 3:
            var ellipse = new WayTo.Wtb.Element.Ellipse();
            byteOffset = ellipse.parse(dataReader, byteOffset);
            outputElement = ellipse;
            break;
        ........
    }

We have to reproduce this decoding algorithm to get the raw data. We don't need to decode every object; we only need to get the offsets right and extract the strings correctly. Here is a Python script for the decoding part, which decodes the data from a file (the output of curl):

    with open("wtb.bin", mode='rb') as file:
        encodedData = file.read()

    offset = 0
    objects = []

    while offset < len(encodedData):
        # every element starts with a 1-byte size and a 1-byte type
        elementSize = encodedData[offset]
        offset += 1
        elementType = encodedData[offset]
        offset += 1
        if elementType == 0:
            break
        curElemSize = elementSize
        curElemType = elementType
        if elementType == 114:
            # type 114 announces a large element: 4-byte size, 2-byte type
            largeElementSize = int.from_bytes(encodedData[offset:offset + 4], "big")
            offset += 4
            largeElementType = int.from_bytes(encodedData[offset:offset + 2], "little")
            offset += 2
            curElemSize = largeElementSize
            curElemType = largeElementType
        print(f"type {curElemType} | size {curElemSize}")
        offsetInit = offset
        if curElemType == 1:
            offset += 4
        elif curElemType == 2:
            offset += 2
        elif curElemType == 3:
            offset += 20
        elif curElemType == 4:
            offset += 28
        elif curElemType == 5:
            offset += 12
        elif curElemType == 6:
            # text is read as 2 bytes per character; NUL bytes are stripped after decoding
            textLength = curElemSize - 3
            objects.append({
                "type": "Text",
                "x_position": int.from_bytes(encodedData[offset:offset + 2], "little"),
                "y_position": int.from_bytes(encodedData[offset + 2:offset + 4], "little"),
                "rotation": int.from_bytes(encodedData[offset + 4:offset + 6], "little"),
                "text": encodedData[offset + 6:offset + 6 + (textLength * 2)].decode("utf-8").replace('\x00', '')
            })
            offset += 6 + (textLength * 2)
        elif curElemType == 7:
            numPoint = int(curElemSize / 2)
            offset += 4 * numPoint
        elif curElemType == 27:
            numPoint = int(curElemSize / 4)
            offset += 8 * numPoint
        elif curElemType == 8:
            numPoint = int(curElemSize / 2)
            offset += 4 * numPoint
        elif curElemType == 28:
            numPoint = int(curElemSize / 4)
            offset += 8 * numPoint
        elif curElemType == 13:
            offset += 4
        elif curElemType == 14:
            offset += 2
        elif curElemType == 15:
            offset += 2
        elif curElemType == 100:
            pass
        elif curElemType == 101:
            offset += 20
        elif curElemType == 102:
            offset += 2
        elif curElemType == 103:
            pass
        elif curElemType == 104:
            highShort = int.from_bytes(encodedData[offset + 2:offset + 4], "little")
            lowShort = int.from_bytes(encodedData[offset + 4:offset + 6], "little")
            objects.append({
                "type": "StartNumericCell",
                "entity": int.from_bytes(encodedData[offset:offset + 2], "little"),
                "occurrence": (highShort << 16) + lowShort
            })
            offset += 6
        elif curElemType == 105:
            # end cell
            pass
        elif curElemType == 109:
            textLength = curElemSize - 1
            objects.append({
                "type": "StartAlphanumericCell",
                "entity": int.from_bytes(encodedData[offset:offset + 2], "little"),
                "occurrence": encodedData[offset + 2:offset + 2 + (textLength * 2)].decode("utf-8").replace('\x00', '')
            })
            offset += 2 + (textLength * 2)
        elif curElemType == 111:
            offset += 40
        elif curElemType == 112:
            objects.append({
                "type": "CoordinatePlane",
                "projection_code": encodedData[offset + 48:offset + 52].decode("utf-8").replace('\x00', '')
            })
            offset += 52
        elif curElemType == 113:
            offset += 24
        elif curElemType == 256:
            nameLength = int.from_bytes(encodedData[offset + 14:offset + 16], "little")
            objects.append({
                "type": "LargePolygon",
                "name": encodedData[offset + 16:offset + 16 + nameLength].decode("utf-8").replace('\x00', ''),
                "occurence": int.from_bytes(encodedData[offset + 2:offset + 6], "little")
            })
            if nameLength > 0:
                offset += 16 + nameLength
                # skip the padding byte that may follow the name
                if encodedData[offset] == 0:
                    offset += 1
            else:
                offset += 16
            numberOfPoints = int.from_bytes(encodedData[offset:offset + 2], "little")
            offset += 2
            offset += numberOfPoints * 8
        elif curElemType == 257:
            pass
        else:
            offset += curElemSize * 2
        print(f"offset diff {offset - offsetInit}")
        print("--------------------------------")

    print(objects)
    print(len(encodedData))
    print(offset)

(Side note: the element size is big endian; all other values are little endian.)

You can run this on repl.it to see how it decodes the file.

From there we can build the steps to scrape the data. For clarity I will describe all of the steps, even the ones you have already figured out.

Login

Log in to the website using:

    GET https://alta.registries.gov.ab.ca/spinii/logon.aspx

Scrape the input names/values, add uctrlLogon:cmdLogonGuest.x and uctrlLogon:cmdLogonGuest.y, then call:

    POST https://alta.registries.gov.ab.ca/spinii/logon.aspx

Legal notice

The legal notice call is not required to get the map values, but it is required to get the item information (the last step in this post).

    GET https://alta.registries.gov.ab.ca/spinii/legalnotice.aspx

Scrape the input tag names/values, set cmdYES.x and cmdYES.y, then call:

    POST https://alta.registries.gov.ab.ca/spinii/legalnotice.aspx

Map data

Call the server map API:

    POST http://alta.registries.gov.ab.ca/SpinII/mapserver.aspx

with the following data:

    {
        "mt":"titleresults",
        "qt":"lincNo",
        "LINCNumber": lincNumber,
        "rights": "B", #not required
        "cx": 1920, #screen definition
        "cy": 1080,
    }

cx/cy are the canvas dimensions.

Decode the encoded data using the method above. You will get:

    [{'type': 'LargePolygon', 'name': '0010495134 8722524;1;162', 'entity': 23, 'occurence': 628079167, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180},
     {'type': 'LargePolygon', 'name': '0012170859 8022146;8;99', 'entity': 23, 'occurence': 628048595, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180},
     {'type': 'LargePolygon', 'name': '0010691822 8722524;1;163', 'entity': 23, 'occurence': 628222354, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180},
     {'type': 'LargePolygon', 'name': '0012169736 8022146;8;89', 'entity': 23, 'occurence': 628021327, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180},
     {'type': 'LargePolygon', 'name': '0010694454 8722524;1;179', 'entity': 23, 'occurence': 628191678, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180},
     {'type': 'LargePolygon', 'name': '0010694362 8722524;1;178', 'entity': 23, 'occurence': 628307403, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180},
     {'type': 'LargePolygon', 'name': '0010433381 8722524;1;177', 'entity': 23, 'occurence': 628209696, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180},
     {'type': 'LargePolygon', 'name': '0012169710 8022146;8;88A', 'entity': 23, 'occurence': 628021328, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180},
     {'type': 'LargePolygon', 'name': '0010694355 8722524;1;176', 'entity': 23, 'occurence': 628315826, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180},
     {'type': 'LargePolygon', 'name': '0012170866 8022146;8;100', 'entity': 23, 'occurence': 628163431, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180},
     {'type': 'LargePolygon', 'name': '0010694347 8722524;1;175', 'entity': 23, 'occurence': 628132810, 'line_color_green': 0, 'line_color_red': 129,

Extract info

If you want to target a specific lincNumber, you need to look at the style of the polygon, because for "multiple" values (for example, values with several items) the response does not mention the lincNumber id, only a link reference. The following picks the selected item:

    selectedZone = [
        t
        for t in objects
        if t.get("fill_color_green", 255) < 255 and t.get("line_color_red") == 255
    ][0]
    print(selectedZone)

Call the URL you mentioned in your post to get the data and extract the table:

    GET https://alta.registries.gov.ab.ca/SpinII/popupTitleSearch.aspx?title={selectedZone["occurence"]}

Full code:

    import requests
    from bs4 import BeautifulSoup
    import pandas as pd

    lincNumber = "0030278592"
    #lincNumber = "0010661156"

    s = requests.Session()

    # 1) login
    r = s.get("https://alta.registries.gov.ab.ca/spinii/logon.aspx")
    soup = BeautifulSoup(r.text, "html.parser")
    payload = dict([
        (t["name"], t.get("value", ""))
        for t in soup.find_all("input")
    ])
    payload["uctrlLogon:cmdLogonGuest.x"] = 76
    payload["uctrlLogon:cmdLogonGuest.y"] = 25
    s.post("https://alta.registries.gov.ab.ca/spinii/logon.aspx", data=payload)

    # 2) legal notice
    r = s.get("https://alta.registries.gov.ab.ca/spinii/legalnotice.aspx")
    soup = BeautifulSoup(r.text, "html.parser")
    payload = dict([
        (t["name"], t.get("value", ""))
        for t in soup.find_all("input")
    ])
    payload["cmdYES.x"] = 82
    payload["cmdYES.y"] = 3
    s.post("https://alta.registries.gov.ab.ca/spinii/legalnotice.aspx", data=payload)

    # 3) map data
    r = s.post("http://alta.registries.gov.ab.ca/SpinII/mapserver.aspx",
        data={
            "mt":"titleresults",
            "qt":"lincNo",
            "LINCNumber": lincNumber,
            "rights": "B", #not required
            "cx": 1920, #screen definition
            "cy": 1080,
        })

    def decodeWtb(encodedData):
        offset = 0
        objects = []
        while offset < len(encodedData):
            # every element starts with a 1-byte size and a 1-byte type
            elementSize = encodedData[offset]
            offset += 1
            elementType = encodedData[offset]
            offset += 1
            if elementType == 0:
                break
            curElemSize = elementSize
            curElemType = elementType
            if elementType == 114:
                # type 114 announces a large element: 4-byte size, 2-byte type
                largeElementSize = int.from_bytes(encodedData[offset:offset + 4], "big")
                offset += 4
                largeElementType = int.from_bytes(encodedData[offset:offset + 2], "little")
                offset += 2
                curElemSize = largeElementSize
                curElemType = largeElementType
            offsetInit = offset
            if curElemType == 1:
                offset += 4
            elif curElemType == 2:
                offset += 2
            elif curElemType == 3:
                offset += 20
            elif curElemType == 4:
                offset += 28
            elif curElemType == 5:
                offset += 12
            elif curElemType == 6:
                textLength = curElemSize - 3
                offset += 6 + (textLength * 2)
            elif curElemType == 7:
                numPoint = int(curElemSize / 2)
                offset += 4 * numPoint
            elif curElemType == 27:
                numPoint = int(curElemSize / 4)
                offset += 8 * numPoint
            elif curElemType == 8:
                numPoint = int(curElemSize / 2)
                offset += 4 * numPoint
            elif curElemType == 28:
                numPoint = int(curElemSize / 4)
                offset += 8 * numPoint
            elif curElemType == 13:
                offset += 4
            elif curElemType == 14:
                offset += 2
            elif curElemType == 15:
                offset += 2
            elif curElemType == 100:
                pass
            elif curElemType == 101:
                offset += 20
            elif curElemType == 102:
                offset += 2
            elif curElemType == 103:
                pass
            elif curElemType == 104:
                offset += 6
            elif curElemType == 105:
                pass
            elif curElemType == 109:
                textLength = curElemSize - 1
                offset += 2 + (textLength * 2)
            elif curElemType == 111:
                offset += 40
            elif curElemType == 112:
                offset += 52
            elif curElemType == 113:
                offset += 24
            elif curElemType == 256:
                # large polygon: keep its name, ids and line/fill colors
                nameLength = int.from_bytes(encodedData[offset + 14:offset + 16], "little")
                objects.append({
                    "type": "LargePolygon",
                    "name": encodedData[offset + 16:offset + 16 + nameLength].decode("utf-8").replace('\x00', ''),
                    "entity": int.from_bytes(encodedData[offset:offset + 2], "little"),
                    "occurence": int.from_bytes(encodedData[offset + 2:offset + 6], "little"),
                    "line_color_green": encodedData[offset + 8],
                    "line_color_red": encodedData[offset + 7],
                    "line_color_blue": encodedData[offset + 9],
                    "fill_color_green": encodedData[offset + 10],
                    "fill_color_red": encodedData[offset + 11],
                    "fill_color_blue": encodedData[offset + 13]
                })
                if nameLength > 0:
                    offset += 16 + nameLength
                    # skip the padding byte that may follow the name
                    if encodedData[offset] == 0:
                        offset += 1
                else:
                    offset += 16
                numberOfPoints = int.from_bytes(encodedData[offset:offset + 2], "little")
                offset += 2
                offset += numberOfPoints * 8
            elif curElemType == 257:
                pass
            else:
                offset += curElemSize * 2
        return objects

    # 4) decode custom format
    objects = decodeWtb(r.content)

    # 5) get the selected area
    selectedZone = [
        t
        for t in objects
        if t.get("fill_color_green", 255) < 255 and t.get("line_color_red") == 255
    ][0]
    print(selectedZone)

    # 6) get the info about item
    r = s.get(f'https://alta.registries.gov.ab.ca/SpinII/popupTitleSearch.aspx?title={selectedZone["occurence"]}')
    df = pd.read_html(r.content, attrs = {'class': 'bodyText'}, header = 0)[0]
    del df['Add to Cart']
    del df['View']
    print(df[:-1])

You can run this on repl.it.

Output:

      Title Number           Type LINC Number Short Legal   Rights Registration Date Change/Cancel Date
    0    052400228  Current Title  0030278592  0420091;16  Surface        19/09/2005         13/11/2019
    1    072294084  Current Title  0030278551  0420091;12  Surface        22/05/2007         21/08/2007
    2    072400529  Current Title  0030278469   0420091;3  Surface        05/07/2007         28/08/2007
    3    072498228  Current Title  0030278501   0420091;7  Surface        18/08/2007         08/02/2008
    4    072508699  Current Title  0030278535  0420091;10  Surface        23/08/2007         13/12/2007
    5    072559500  Current Title  0030278477   0420091;4  Surface        17/09/2007         19/11/2007
    6    072559508  Current Title  0030278576  0420091;14  Surface        17/09/2007         09/01/2009
    7    072559521  Current Title  0030278519   0420091;8  Surface        17/09/2007         07/11/2007
    8    072559530  Current Title  0030278493   0420091;6  Surface        17/09/2007         25/08/2008
    9    072559605  Current Title  0030278485   0420091;5  Surface        17/09/2007         23/12/2008

If you want more entries, you can look at the objects field. If you want more information about the items, such as their coordinates, you can improve the decoder. It is also possible to match the other lincNumbers surrounding the target by looking at the name field, which contains the lincNumber, except when it holds a "multiple" name.
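The name-matching idea in the closing paragraph could be sketched roughly as follows. This is only an illustration, not part of the code above: `index_by_linc` is a hypothetical helper, and it assumes that a LINC number is always the first space-separated token of `name` and is exactly 10 digits long, as in the sample output. The real format of a "multiple" name is not shown in the post, so the `"MULTIPLE"` entry below is a made-up placeholder that simply fails that check.

```python
def index_by_linc(objects):
    """Map LINC number -> decoded polygon, using the first token of 'name'."""
    by_linc = {}
    for obj in objects:
        name = obj.get("name", "")
        # "0010495134 8722524;1;162" -> ("0010495134", " ", "8722524;1;162")
        linc, _, short_legal = name.partition(" ")
        # Assumption: a usable LINC number is exactly 10 digits; anything
        # else (for example a "multiple" entry) is skipped.
        if len(linc) == 10 and linc.isdigit():
            by_linc[linc] = {**obj, "short_legal": short_legal}
    return by_linc


# Two entries copied from the decoded output, plus one hypothetical
# placeholder standing in for a "multiple" name.
sample = [
    {"type": "LargePolygon", "name": "0010495134 8722524;1;162", "occurence": 628079167},
    {"type": "LargePolygon", "name": "0012170859 8022146;8;99", "occurence": 628048595},
    {"type": "LargePolygon", "name": "MULTIPLE", "occurence": 1},
]

neighbours = index_by_linc(sample)
print(sorted(neighbours))
```

With the real decoder you would pass the list returned by `decodeWtb(r.content)` instead of `sample`, then use each entry's `occurence` value in the popupTitleSearch.aspx call.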