Can I extract some data from an HTML file I downloaded? It contains some JSON

Here is a link to the HTML file I downloaded:

https://drive.google.com/open?id=1z7A9U0qZSVtLMQDbsVtPyZVz9Zm73-ZQ

Near the end of the file you can see data like this:

<div data-react-class="packs/v9/phone/containers/AreaCodeListing" data-react-props="{"areaCodes":[{"phone_prefix":"(202) 200","details":["Sprint"],"location":"Washington, DC","href":"/202-200"},{"phone_prefix":"(202) 201","details":["Verizon"],"location":"Washington, DC","href":"/202-201"},{"phone_prefix":"(202) 202","details":["General Service Carrier"],"location":"Washington, DC","href":"/202-202"},{"phone_prefix":"(202) 203","details":["T-Mobile"],"location":"Washington, DC","href":"/202-203"},{"phone_prefix":"(202) 204","details":["XO Communications"],"location":"Washington, DC","href":"/202-204"}

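How can I extract the href values from this page? I think parsing the JSON would do the job, but I am stuck on how to get to the point where I actually have that JSON.

Or is there a better way to get the href values from this downloaded HTML page?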


交互式爱情

泛舟湖上清波郎朗

Both methods below need these namespaces (JavaScriptSerializer lives in the System.Web.Extensions assembly):

    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;
    using System.Web.Script.Serialization;

First method

If you want the whole AreaCode objects, try this method first.

    public List<AreaCode> GetAllAreaCodes(string htmlString)
    {
        List<AreaCode> areaCodes = new List<AreaCode>();
        // Match every data-react-props attribute whose value is a JSON object.
        Regex rgxAttr = new Regex(@"data-react-props=""{(.*?)}""");
        // Pull the quoted {...} value out of each attribute match.
        Regex rgxValue = new Regex(@"""{(.*?)}""");
        var attrResult = rgxAttr.Matches(htmlString);
        List<string> attrValues = new List<string>();
        foreach (Match match in attrResult)
        {
            var val = rgxValue.Match(match.Value);
            // Strip the surrounding quotes so only the JSON object remains.
            attrValues.Add(val.Value.Replace("\"{", "{").Replace("}\"", "}"));
        }
        foreach (var item in attrValues)
        {
            JavaScriptSerializer js = new JavaScriptSerializer();
            var dn = js.Deserialize<dynamic>(item) as Dictionary<string, object>;
            if (dn != null && dn.ContainsKey("areaCodes"))
            {
                // Drop the outer braces and the "areaCodes": key to leave the bare array.
                // This only works when areaCodes is the object's sole property.
                var arrayJson = item.Remove(item.Length - 1, 1).Remove(0, 1).Replace(@"""areaCodes"":", "");
                areaCodes = js.Deserialize<List<AreaCode>>(arrayJson);
            }
        }
        return areaCodes;
    }

    public class AreaCode
    {
        public string phone_prefix { get; set; }
        public string location { get; set; }
        public string href { get; set; }
        public string[] details { get; set; }
    }

Second method

If you only need the href values, use the second method.

    public List<string> GetAllHref(string htmlString)
    {
        List<string> hrefList = new List<string>();
        Regex rgxAttr = new Regex(@"data-react-props=""{(.*?)}""");
        Regex rgxValue = new Regex(@"""{(.*?)}""");
        var attrResult = rgxAttr.Matches(htmlString);
        List<string> attrValues = new List<string>();
        foreach (Match match in attrResult)
        {
            var val = rgxValue.Match(match.Value);
            attrValues.Add(val.Value.Replace("\"{", "{").Replace("}\"", "}"));
        }
        dynamic ob = null;
        foreach (var item in attrValues)
        {
            JavaScriptSerializer js = new JavaScriptSerializer();
            var dn = js.Deserialize<dynamic>(item) as Dictionary<string, object>;
            if (dn != null && dn.ContainsKey("areaCodes"))
                ob = dn["areaCodes"];
        }
        var s = ob as Array;
        // Nothing matched, or areaCodes was not an array: return the empty list.
        if (s == null)
            return hrefList;
        foreach (Dictionary<string, object> item in s)
            hrefList.Add(item["href"].ToString());
        return hrefList;
    }

狐的传说

You can use a library such as HtmlAgilityPack to parse the HTML document and then extract the JSON as needed.

30秒到达战场

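The file you downloaded is not valid HTML because it is a React view, so tools such as HtmlAgilityPack will not help you much. You could try a headless browser such as WebKit.NET and see whether you have any luck; you may be able to hook in somewhere while the final HTML is being built. Beyond that, the only option I can think of is to use a regular expression to pull the data you need out of the file. For example:

    // The lookbehind anchors on the attribute name. The lookahead expects '<'
    // right after the closing brace, so adjust it to however the attribute
    // actually ends in your file.
    var regex = new Regex(@"(?<=data-react-props=""){.*}(?=<)");
    var match = regex.Match(pageContents);
    if (match.Success)
    {
        // Print every captured group (group 0 is the whole match).
        foreach (var gr in match.Groups)
        {
            Console.WriteLine(gr);
        }
    }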
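The regex idea above stops at printing the raw match. As a rough sketch of how it could be carried through to the href values, the code below captures the attribute value, HTML-decodes it, and walks the JSON with System.Text.Json. It rests on assumptions that are not confirmed by the question: the class and method names (ReactPropsExtractor, ExtractHrefs) are made up for illustration, and the attribute value is assumed to be HTML-escaped in the real file (&quot; instead of bare quotes), which is how a double-quoted attribute normally stores JSON. If the file really contains bare quotes inside the attribute, the pattern would need to change.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.Json;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any answer above.
public static class ReactPropsExtractor
{
    // Captures the raw value of a double-quoted data-react-props attribute.
    // Assumption: the value is HTML-escaped, so it contains no bare double quote.
    private static readonly Regex PropsAttr =
        new Regex("data-react-props=\"([^\"]*)\"", RegexOptions.Compiled);

    public static List<string> ExtractHrefs(string html)
    {
        var hrefs = new List<string>();
        foreach (Match m in PropsAttr.Matches(html))
        {
            // Turn &quot; and similar entities back into plain JSON text.
            string json = WebUtility.HtmlDecode(m.Groups[1].Value);

            JsonDocument doc;
            try { doc = JsonDocument.Parse(json); }
            catch (JsonException) { continue; } // skip attributes that are not valid JSON

            using (doc)
            {
                JsonElement root = doc.RootElement;
                if (root.ValueKind != JsonValueKind.Object) continue;
                if (!root.TryGetProperty("areaCodes", out JsonElement codes)) continue;
                if (codes.ValueKind != JsonValueKind.Array) continue;

                foreach (JsonElement code in codes.EnumerateArray())
                {
                    // Collect only entries that carry a string href.
                    if (code.ValueKind == JsonValueKind.Object
                        && code.TryGetProperty("href", out JsonElement href)
                        && href.ValueKind == JsonValueKind.String)
                    {
                        hrefs.Add(href.GetString());
                    }
                }
            }
        }
        return hrefs;
    }
}
```

A call would look something like `var hrefs = ReactPropsExtractor.ExtractHrefs(File.ReadAllText(path));`, where `path` points at the downloaded file. Walking the parsed JSON avoids the string surgery needed to strip the outer braces, at the cost of requiring System.Text.Json (.NET Core 3.0 or later, or the NuGet package on .NET Framework).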