[VB] How to extract data from a web page
Visual Basic code that uses XMLHTTP and HTMLDocument:
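Private Sub Command1_Click()
    Dim XMLObject As Object, HTMLDoc As Object
    Dim SendStr As String, HTMLStr As String
    Dim DataInfo As String, S As Long, E As Long
    Dim Info(66) As String, TempArray() As String
    Dim X As Long, Y As Long, I As Long, TempStr As String
    Dim TitleMaxByte As Long, TitleByte As Long

    ' Initialize variables
    Y = 0
    I = 0
    TitleMaxByte = 0
    TempStr = ""

    ' Fetch the page content via XMLHTTP
    Set XMLObject = CreateObject("Microsoft.XMLHTTP")
    Set HTMLDoc = CreateObject("htmlfile")
    XMLObject.open "GET", "http://quotes.money.163.com/corp/1034/code=600221.html", False
    XMLObject.setRequestHeader "CONTENT-TYPE", "application/x-www-form-urlencoded"
    XMLObject.Send SendStr
    ' Convert the raw bytes from the system ANSI code page to a VB string
    HTMLStr = StrConv(XMLObject.ResponseBody, vbUnicode)

    ' Use the HTMLDocument object to extract the text contained in the page
    HTMLDoc.body.innerHTML = HTMLStr
    DataInfo = HTMLDoc.body.innerText ' all of the text on the page

    ' Locate the relevant data between the two marker strings
    ' ("报表日期" = report date, "主编信箱" = editor's mailbox)
    S = InStr(1, DataInfo, "报表日期")
    If S = 0 Then Exit Sub ' start marker not found; InStr(0, ...) would raise an error
    E = InStr(S, DataInfo, "主编信箱")
    If E = 0 Then Exit Sub ' end marker not found; Mid would get a negative length

    ' Extract the data text
    DataInfo = Mid(DataInfo, S, E - S - 4)

    ' Split the text into an array with one element per line
    TempArray = Split(DataInfo, vbCrLf)
    If UBound(TempArray) < 66 Then Exit Sub ' fewer than 67 field titles; layout not as expected

    ' To make the final output line up, find the largest byte length among
    ' the field titles and use it as the formatting width
    For X = 0 To 66
        Info(X) = RTrim(TempArray(X)) ' strip trailing spaces
        TitleByte = LenB(StrConv(Info(X), vbFromUnicode)) ' byte length of the field title
        If TitleByte > TitleMaxByte Then TitleMaxByte = TitleByte ' record the maximum
    Next X

    ' Pad every title with spaces up to the maximum byte length
    For X = 0 To 66
        ' Category headings (ending with a colon) are left untouched
        If Right(Info(X), 1) <> ":" Then
            TitleByte = LenB(StrConv(Info(X), vbFromUnicode)) ' byte length of the current title
            Info(X) = Info(X) & String(TitleMaxByte - TitleByte, " ") & vbTab ' pad with spaces
        End If
    Next X

    ' Append the data values to their field rows; the values repeat in
    ' groups of 67, one group per report column
    For X = 67 To UBound(TempArray)
        If Y >= 67 Then Y = 0: I = I + 1
        ' Category headings are left untouched
        If Right(Info(Y), 1) <> ":" Then
            If I = 0 Then
                Info(Y) = Info(Y) & TempArray(X)
            Else
                Info(Y) = Info(Y) & "," & TempArray(X)
            End If
        End If
        Y = Y + 1
    Next X

    ' Join the processed rows into a single string
    For X = 0 To UBound(Info)
        If Len(TempStr) = 0 Then
            TempStr = Info(X)
        Else
            TempStr = TempStr & vbCrLf & Info(X)
        End If
    Next X

    ' Output the text
    Text1.Text = TempStr
End Sub

In practice the efficiency is about the same; this method only saves the time spent downloading images and rendering the page. In my test the WebBrowser method took 7 seconds and this method took 5 seconds. In theory, though, this method should be somewhat faster.

This article was edited on 2014/3/25 0:19:13.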
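The listing pads each field title by byte length rather than character count, so that double-byte Chinese characters and single-byte ASCII characters line up in a fixed-width font. A minimal sketch of that step as a reusable function follows. The name PadToByteWidth and its parameters are hypothetical and do not appear in the original post, and the byte counts depend on the system ANSI code page.

```vb
' Hypothetical helper (not part of the original post): pads Text with spaces
' until it occupies TargetBytes bytes in the system ANSI code page.
Private Function PadToByteWidth(ByVal Text As String, ByVal TargetBytes As Long) As String
    Dim ByteCount As Long

    ' StrConv(..., vbFromUnicode) converts the string to the ANSI code page,
    ' so LenB returns the number of bytes it occupies there. On a system with
    ' a double-byte code page, a Chinese character typically counts as 2 bytes.
    ByteCount = LenB(StrConv(Text, vbFromUnicode))

    If ByteCount < TargetBytes Then
        PadToByteWidth = Text & Space$(TargetBytes - ByteCount)
    Else
        ' Already at or beyond the target width: return unchanged
        PadToByteWidth = Text
    End If
End Function
```

With such a helper, the padding line inside the second loop could be written as Info(X) = PadToByteWidth(Info(X), TitleMaxByte) & vbTab, which also avoids an error if a title is ever longer than the recorded maximum.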