VBA抓取带登录和cookie的网页

这里分享两个做爬虫的函数,一个用来拿到cookie,另一个用cookie拿到网页内容。

1.登录后需要记下网站的cookie,这是个获得cookie的小工具。使用它的前提是你已经完成登录过程,比如用WebBrowser控件完成了用户名、密码的登录(参考这里)。参数中url是可以登录的页面地址,errmsg是目标网站可能返回的出错信息,比如“错误”“未登录”之类。返回值是cookie字符串。

Private Declare Function InternetGetCookieEx Lib "wininet.dll" Alias "InternetGetCookieExA" (ByVal url As String, ByVal cookieName As String, ByVal cookieData As String, ByRef size As Long, ByVal flags As Long, ByVal pReserved As Long) As Boolean
Private Const INTERNET_COOKIE_HTTPONLY As Integer = 8192

Function getCookie(url As String, errmsg As String) As String
    Dim cookietmp As String * 8192
    Dim dummy As Variant
    Dim pcchCookiedata As Long
    pcchCookiedata = 8192
    Dim HTTP As Object
    
    On Error GoTo errhandler
    Set HTTP = CreateObject("msxml2.xmlhttp.6.0")
    HTTP.Open "get", url, False
    HTTP.send
    
    If InStr(1, HTTP.responsetext, errmsg) <> 0 Or HTTP.responsetext = "" Then
        getCookie = ""
    Else
        If Not InternetGetCookieEx(url, vbNullString, cookietmp, pcchCookiedata, INTERNET_COOKIE_HTTPONLY, 0) Then
            dummy = InternetGetCookieEx(url, vbNullString, vbNullString, pcchCookiedata, INTERNET_COOKIE_HTTPONLY, 0)
        End If
        getCookie = Mid(cookietmp, 1, pcchCookiedata - 1)
    End If
    Set HTTP = Nothing
    Exit Function
    
errhandler:
    getCookie = ""
    Exit Function
End Function

2.拿到cookie后,就可以用它访问需要登录的网页,用这个函数拿到网页的响应。其中errmsg是目标网页可能给出的出错信息,nRetry是出错后你能接受的最大重试次数。注意,xmlhttp是不支持cookie的,必须用serverxmlhttp。

Function httpRequest(url, cookie, errmsg, nRetry)
    Dim HTTP
    On Error Resume Next
    For j = 0 To nRetry - 1
        Set HTTP = Nothing
        Set HTTP = CreateObject("msxml2.serverxmlhttp.6.0")
        Err.Clear
        HTTP.setTimeouts 0, 5000, 5000, 15000
        HTTP.Open "get", url, True
        HTTP.setRequestHeader "Cookie", cookie
        HTTP.setRequestHeader "Connection", "keep-alive"
        HTTP.send
        If HTTP.waitForResponse(15) = False Then
            HTTP.abort
            temp = ""
        Else
            temp = HTTP.responsetext
        End If
        errFlag = Err.Description <> "" Or Len(temp) = 0 Or InStr(1, temp, errmsg) <> 0
        
        If errFlag = False Then
            Set HTTP = Nothing
            httpRequest = temp
            nRetry = j
            Exit Function
        End If
    Next 'j
    Set HTTP = Nothing
    httpRequest = temp
End Function

以上。

爬的老狼

《VBA抓取带登录和cookie的网页》上有6条评论

  1. 老狼您好,请教一下,第一个获取cookie的函数,我用Chrome登录某网站后,获取的cookie不是登录状态的。我猜想函数获取的只能是WebBrowser或者IE登录的cookie。但问题是,现在微软已不再对IE维护了,我需要登录的网站,无法用IE或者WebBrowser打开。请问还有别的函数吗?

    谢谢!

    1. 我不知道VBA怎么调用IE之外的浏览器。要么试试用chrome登录后,手动获取cookie,把它放在VBA里?

      1. 是的,目前我就是用手动获取。原来的宏能用是能用。但把宏给其他人使用时,我要解释如何手动获取,比较费事。

        另外我也用想用Selenium获取cookie,但目标网站能识别Selenium,然后就不返回数据。

发表回复

您的邮箱地址不会被公开。 必填项已用 * 标注