Wednesday, 11 September 2013

HttpWebRequest getting weird characters instead of HTML code

I am trying to crawl some sites. It works like a charm, but there is one
major problem: on some pages (not many) I get weird characters
instead of HTML code.
It looks like this:
;&#65533;<cS&#65533;&#65533;&#65533;u&#65533;/&#65533;qYa$&#65533;4l7&#65533;.&#65533;Q&#65533;7&&#65533;&#65533;O&#65533;&#65533;&#65533;&#65533;&#65533;
Z&#65533;D}z&#65533;&#65533;/&#65533;&#65533;&#65533;&#65533;
&#65533;&#65533;u&#65533;&#65533;&#65533;&#65533;V&#65533;&#65533;&#65533;lWY|&#65533;n5&#65533;1&#65533;We&#65533;&#65533;&#65533;&#65533;GB&#65533;U&#65533;&#65533;g{&#65533;&#65533;
&#65533;|&#1015;&#65533;&#65533;&#65533;&#65533;*&#65533;Q&#65533;&#65533;0&#65533;&#65533;&#65533;nb&#65533;o&#65533;&#2031;&#65533;&#65533;&#65533;&#65533;&#65533;[b&#65533;&#65533;/&#65533;&#65533;&#65533;&#65533;@C&#401;&#65533;&#65533;&#65533;&#65533;D{{/n&#65533;&#65533;X&#65533;!&#65533;
&#65533;Et&#65533;X"&#65533;&#65533;&#65533;&#65533;?&#65533;&#65533;&#745;&#65533;&#65533;&#65533;&#65533;8\y&#65533;&#65533;&
If I open the same page in my browser, there is no problem at all. I don't
understand why.
My HTTP Header says:
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip,deflate,sdch
Accept-Language: de-DE,de;q=0.8,en-US;q=0.6,en;q=0.4
Cache-Control: max-age=0
Connection: keep-alive
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.66 Safari/537.36
I think it has something to do with the Accept header, which I set like this:
request.Accept = "*/*"
This is my web request:
Imports System.IO
Imports System.Net

Public Class Http
    Dim cookieCon As New CookieContainer
    Dim request As HttpWebRequest
    Dim response As HttpWebResponse

    ' Params(0) is the URL; Params(1) is a cookie string, or "nocookie" for none.
    ' Returns a two-element array: the final request address and the response body.
    Public Function GetRequest(ByVal Params() As Object) As String()
        Dim url As String = CStr(Params(0))
        Dim mycookie As String = CStr(Params(1))
        'request.AllowAutoRedirect = True
        request = CType(WebRequest.Create(url), HttpWebRequest)
        request.CookieContainer = New CookieContainer()
        request.Method = "GET"
        request.Timeout = 20000
        request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.66 Safari/537.36"
        'request.ContentType = "application/x-www-form-urlencoded"
        request.Accept = "*/*"
        If Not mycookie Like "nocookie" Then
            request.Headers("Cookie") = mycookie
        End If
        response = CType(request.GetResponse(), HttpWebResponse)
        Dim html(1) As String
        html(0) = request.Address.ToString()
        ' The response stream is read as text exactly as the server sent it.
        html(1) = New StreamReader(response.GetResponseStream()).ReadToEnd()
        Return html
    End Function
End Class
Thanks.
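
One possible explanation, assuming the affected servers return a gzip- or deflate-compressed body: the compressed bytes are being read as text, which would produce output like the sample above, while a browser decompresses the body automatically. The sketch below is only an illustration under that assumption. `FetchHtml` and `DecompressionSketch` are hypothetical names that are not part of the original class; `AutomaticDecompression` and `DecompressionMethods` are the standard System.Net members.

```vbnet
Imports System.IO
Imports System.Net

Public Module DecompressionSketch
    ' Hypothetical helper for illustration; not part of the original Http class.
    Public Function FetchHtml(ByVal url As String) As String
        Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        request.Method = "GET"
        request.Timeout = 20000
        request.Accept = "*/*"
        ' Assumption: the garbage is a compressed body. This setting makes
        ' HttpWebRequest advertise gzip/deflate support and decompress the
        ' response stream transparently before it is read.
        request.AutomaticDecompression = DecompressionMethods.GZip Or DecompressionMethods.Deflate

        Using response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
            Using reader As New StreamReader(response.GetResponseStream())
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function
End Module
```

If the assumption holds, adding the same `AutomaticDecompression` line to `GetRequest` before the call to `GetResponse()` should be enough. Checking `response.ContentEncoding` on a failing page would confirm whether the body is actually compressed.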
