Here is some code to scrape web pages. Maybe somebody can help me figure out how to scrape the sites that this script doesn't want to scrape. I can scrape them with a web control without issue; however, I want this to run as a service, so that really isn't a good way to do it.

        Public Function ScrapeURL(ByVal MyURL As String) As String
            Dim ReturnScrape As String = ""
            Dim myUri As New Uri(MyURL)
            Dim MyRequest As HttpWebRequest = DirectCast(WebRequest.Create(myUri.AbsoluteUri), HttpWebRequest)
            MyRequest.AllowAutoRedirect = True
            MyRequest.MaximumAutomaticRedirections = 10
            MyRequest.UserAgent = "Googlebot/2.1 (+"
            MyRequest.KeepAlive = True
            MyRequest.Timeout = 30000

            Dim MyResponse As HttpWebResponse = Nothing
            Try
                MyResponse = DirectCast(MyRequest.GetResponse(), HttpWebResponse)
            Catch exception1 As WebException
                ' Request failed (timeout, DNS error, HTTP error status): return the empty string
                Return ReturnScrape
            End Try

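            Dim MyReader As StreamReader = Nothing
            Try
                Dim MyEncoding As New UTF8Encoding
                MyReader = New StreamReader(MyResponse.GetResponseStream(), MyEncoding)
                ReturnScrape = MyReader.ReadToEnd()
            Catch exception As Exception
                ' Read failed: fall through and return the empty string
            Finally
                ' Release the connection whether or not the read succeeded
                If MyReader IsNot Nothing Then MyReader.Close()
                MyResponse.Close()
            End Try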


            'Might want to strip out the HTML here
            Return ReturnScrape

        End Function
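
One possible reason some pages come back garbled is that the function above always decodes the body as UTF-8, whatever the server says. A minimal sketch of picking the decoder from the response instead. `PickEncoding` is a hypothetical helper, not part of the original code, and it only helps when the server actually reports a correct charset in its Content-Type header:

```vbnet
Imports System
Imports System.Net
Imports System.Text

Module EncodingSketch
    ' Hypothetical helper: choose the decoder from the response's declared
    ' charset, falling back to UTF-8 when it is missing or unrecognized.
    Public Function PickEncoding(ByVal response As HttpWebResponse) As Encoding
        Dim charsetName As String = response.CharacterSet
        If String.IsNullOrEmpty(charsetName) Then Return Encoding.UTF8
        Try
            ' Some servers wrap the charset name in quotes.
            Return Encoding.GetEncoding(charsetName.Trim(""""c, " "c))
        Catch ex As ArgumentException
            ' Unknown charset name: fall back to UTF-8.
            Return Encoding.UTF8
        End Try
    End Function
End Module
```

In ScrapeURL this would replace the fixed UTF8Encoding, for example `New StreamReader(MyResponse.GetResponseStream(), PickEncoding(MyResponse))`. Pages that declare their charset only in a meta tag would still need separate handling.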

Well, I hope someone can help me figure out how to scrape ALL web pages instead of just some of them.  I did find a web control, the Chilkat spider. However, I would have to do A LOT of processing of the returned text with that thing.

I'd prefer something that I can bring in (just like a web browser would) and then just READ the text off the resultant page.  Not sure how easy that would be.

Any help?


I ended up using the Chilkat Spider and then rewriting my StripHTML function to strip a bit differently.
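
The poster's StripHTML function is not shown in the thread. As a rough sketch of what such a function might look like, assuming .NET 4.0 or later for `WebUtility.HtmlDecode`: `StripHtmlSketch` is a hypothetical name, and regex-based stripping is only approximate on malformed markup.

```vbnet
Imports System.Net
Imports System.Text.RegularExpressions

Module StripSketch
    ' Hypothetical stand-in for the poster's StripHTML, which is not shown.
    Public Function StripHtmlSketch(ByVal html As String) As String
        ' Drop script and style blocks, including their contents.
        Dim text As String = Regex.Replace(html, "<(script|style)\b[^>]*>.*?</\1\s*>", " ",
                                           RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        ' Replace every remaining tag with a space.
        text = Regex.Replace(text, "<[^>]+>", " ")
        ' Turn entities such as &amp; and &#8220; back into characters.
        text = WebUtility.HtmlDecode(text)
        ' Collapse runs of whitespace into single spaces.
        Return Regex.Replace(text, "\s+", " ").Trim()
    End Function
End Module
```

Note that decoding entities can reintroduce non-ASCII characters (for example curly quotes written as `&#8220;`), so any character replacement probably belongs after this step.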

Now - Does anyone know how to keep from inserting question marks into the database when inserting text??

I have the “ ” characters in the original string, and even when I try to strip them out (using character codes 147 & 148), changing them to '' (two single quotes), the insert STILL changes them to ?

It's freakin' maddening.


Try making sure that those actually are those codes (147 & 148); they might not be, they might be some Unicode version.
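
One way to check is to dump the code points of whatever non-ASCII characters are actually in the string. In Windows-1252, bytes 147 and 148 are the curly double quotes, but inside a .NET String those same characters are U+201C and U+201D. A small sketch; the module and function names are made up for illustration:

```vbnet
Imports System
Imports System.Text

Module CodePointSketch
    ' List each non-ASCII character in s as a Unicode code point.
    Public Function DescribeNonAscii(ByVal s As String) As String
        Dim sb As New StringBuilder()
        For Each c As Char In s
            Dim code As Integer = AscW(c)
            If code > 127 Then sb.AppendFormat("U+{0:X4} ", code)
        Next
        Return sb.ToString().Trim()
    End Function

    ' Replace curly double quotes, addressed by their Unicode code points.
    Public Function ReplaceCurlyQuotes(ByVal s As String, ByVal replacement As String) As String
        Return s.Replace(ChrW(&H201C).ToString(), replacement).
                 Replace(ChrW(&H201D).ToString(), replacement)
    End Function
End Module
```

For example, `DescribeNonAscii(ChrW(&H201C) & "hi" & ChrW(&H201D))` returns `U+201C U+201D`. If the scraped text shows other values, the replacement is aimed at the wrong characters.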


For a testing phase, I'm bringing back the stripped HTML (including the replacement against chr(147)/chr(148)), and I can see that the characters change. So, I can only assume that I have replaced them correctly.  That's what is so maddening about it.


hmm. only thing i can think beyond that is a logic error. an order of execution problem. try breaking it down to the smallest chunk of code you can, eliminating any other things you are doing to the string.

Really there should be no reason for it not to work.

the only other suggestion is to use .replace (or whatever it is).
if that works in the same exact code where your current approach won't, then there must be something slightly wrong with that approach.




Yeah, I have tried all that.

It has literally come down to the insert statement.. When inserting it changes the characters to a ?.  Everything else works pretty smoothly because I have single-stepped through the program (written about 200 different ways now) to figure it out.

I even changed the code to use parameters and ExecuteNonQuery.  Not sure what else I can do, but I HAVE to get this to work!  Especially 'cause I can't - it makes me want to try even harder.



its on the insert. got it. Its a unicode issue then im pretty sure.
Is this to MsSQL? if so, change the column datatype to Nvarchar, or Nwhateveryouareusing. im guessing you have it set to a non-N type.
if MySQL, same problem i think, but its the 'collation type'. Though i am not actually sure what that should be set to.


MySQL (running on Windows). I set the column character set to utf8 and the column collation to utf8_general_ci.  The datatype is LONGTEXT.

Anything seem unusual there?


the mysql collations are like voodoo to me, so I really don't know specifically, but I am pretty sure that may be where the problem is. hopefully someone else knows.


it definitely seems like a utf thing.


when creating a connection you can specify the character set.
if your tables are set for utf then look at what options you have when making a connection.
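
As a sketch of that suggestion, assuming the MySQL Connector/NET provider (the `MySql.Data` assembly, which has to be referenced separately): the server, database, credentials, table and column names below are placeholders, not taken from the thread.

```vbnet
Imports MySql.Data.MySqlClient

Module ConnectionSketch
    ' Placeholder credentials and names. CharSet=utf8 asks Connector/NET
    ' to use UTF-8 for the connection, matching the utf8 column.
    Private Const ConnStr As String =
        "Server=localhost;Database=scrapes;Uid=user;Pwd=secret;CharSet=utf8;"

    Public Sub InsertPage(ByVal body As String)
        Using conn As New MySqlConnection(ConnStr)
            conn.Open()
            ' Parameterized insert: the text travels as a value, not as SQL.
            Using cmd As New MySqlCommand("INSERT INTO pages (body) VALUES (@body)", conn)
                cmd.Parameters.AddWithValue("@body", body)
                cmd.ExecuteNonQuery()
            End Using
        End Using
    End Sub
End Module
```

The idea is that if the column is utf8 but the connection uses a different character set, characters with no equivalent in the connection's set can end up stored as question marks.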
