tommytx
« on: May 10, 2010, 08:55:18 PM »
I am back! Since you guys did such a super job on my last question, I wanted to try again. So far this is, without a doubt, the smartest forum around town, and I hope this gets posted on the first page of Google tomorrow morning and brings 5,000 new signups... hee hee. Do you want me to ping this page? I doubt it's needed...
Anyhoo, here is my predicament. I didn't give up for six hours, and you guys/gals will solve it in two minutes and piss me off... but that is how it is when you know your shit.
' This is the result I want to get.
' Col 1 Col2 Col3   (basic old 3-column table in HTML)
' --------------------------------
'Apples|sweet fruit|yummy
'Bananas|yellow fruit|very good
'Chicken|Best Fried|excellent
'Barbeque|Love it|eat it all day
I have the above information in a basic three-column table. Using VBS, I want to extract the data and separate it with the | symbol for later processing. I can capture the entire document easily enough with "Set mybody=IE.document". The problem is parsing that document to get the data from each cell. Is there a way to parse the variable mybody, grab the data from each cell, and separate the cells with |, with a CR after each group of three cells (one row)?
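For comparison, here is a minimal sketch of the same extraction done outside the browser with Python's standard-library `html.parser`. It assumes the page's HTML is already available as a string. It collects the text of each `<td>`, joins the cells of every `<tr>` with `|`, and puts one row per line (using `\n` rather than VBScript's `vbCr`). The class and function names are hypothetical. `HTMLParser` passes tag names in lowercase, so `<TD>` and `<td>` are treated the same.

```python
from html.parser import HTMLParser


class TableToPipes(HTMLParser):
    """Collect <td> text, grouped by <tr>, from an HTML string."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the <tr> currently open, if any
        self._cell = None  # text pieces of the <td> currently open, if any

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase, so <TD> arrives as "td".
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows without <td> cells, e.g. <th> headers
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_pipes(html_text):
    """Return one 'cell|cell|cell' line per table row."""
    parser = TableToPipes()
    parser.feed(html_text)
    parser.close()
    return "\n".join("|".join(row) for row in parser.rows)
```

The text inside nested tags such as `<b>` is kept, because `handle_data` collects everything between `<td>` and `</td>`.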
Some snippets are shown below. If anyone wants to play with it, the simple HTML table (.htm) and the entire VBS code are on the web at hXXp://www.vahud.com/web_question/web_question.zip
If anyone wants more info or code, I can dump the entire code here, or you can get it from the website above.
Below is some of the stuff I have tried, to no avail.
Set mybody = IE.document
Dim a,Num,All,i,links, mytable1, mytable2
set mytable1 = IE.document.getElementsbyTagname("TABLE")
set mytable2 = mytable1.getElementsbyTagname("TR")
set links = mytable2.getElementsbyTagname("TD")
For Each i in links
    olinks = i.outerHtml
    ilinks = i.innerText
    list1 = list1 + i + vbcr
    list2 = list2 + olinks + vbcr
    list3 = list3 + "|" + ilinks + "|" + vbcr
Next
Oh, just in case anyone is skittish about downloading a zip from an unknown person on the web, I have also put everything in text files on the site. Just visit the address above without the file name and it will show you the directory. The .htm file and the .vbs file have .txt extensions, so they cannot jump on you if you are worried, but they are safe, you have my word. Besides, you have my website address, so you can kick my butt, and Perk may also give you my email, just to be sure you can reach me.
Thanks, Tom. PLS VBS... I can't use JavaScript...
Oh, I just saw the attach button, so use the attachment or get the files from the web, your choice.

tommytx
« Reply #1 on: May 10, 2010, 11:01:24 PM »
Some updated information:
bob = IE.document.getElementsbyTagname("tr")
dog = ""
for i = 0 to 2
    dog = dog + bob.document.getElementsByTagName("td").item(i).innerTEXT + "|"
next
msgbox dog
This gives "Apples|Sweet Fruit|Yummy", which is exactly what I am looking for. Now I need it to loop three times, because there are three rows. Right now it loops three times over the three columns, as it should, but it will not loop three times over the rows as needed.
However, when I try to complete it like this, nothing works; it says it can't find bob. I surrounded the code above with this outer loop for the rows, but no luck:
for z = 0 to 2
    bob = IE.document.getElementsbyTagname("tr").item(z)
    ' All code above is here.
next
This should give the inner three-TD loop TR rows 0, 1 and 2 to pull from. Even if I just add bob = IE.document.getElementsbyTagname("tr").item(0) without the for/next loop, it does not work; it says it's looking for bob. So using item(0), item(1) or item(2) flat out won't work. If it won't work with manual control, then of course the outer loop won't work either.
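Here is a likely culprit in the snippets above, offered as a guess rather than a tested diagnosis. First, VBScript needs `Set` when assigning an object (`Set bob = ...item(z)`). Second, `bob.document.getElementsByTagName("td")` appears to search the whole page rather than the current row, so items 0 to 2 would always be the first three cells on the page. The usual pattern is to ask each row for its own cells. The sketch below shows that per-row scoping in Python with `xml.etree.ElementTree`. It assumes well-formed, lowercase table markup, because ElementTree is strict and case-sensitive; real pages are often neither, and `html.parser` is more forgiving. The function name `rows_to_pipes` is hypothetical.

```python
import xml.etree.ElementTree as ET


def rows_to_pipes(table_markup):
    """Return 'cell|cell|cell' for each <tr>, reading cells from that row only."""
    table = ET.fromstring(table_markup)  # requires well-formed markup
    lines = []
    for tr in table.iter("tr"):          # every <tr>, including ones inside <tbody>
        # Ask *this row* for its own cells instead of searching the whole document.
        cells = ["".join(td.itertext()).strip() for td in tr.findall("td")]
        if cells:
            lines.append("|".join(cells))
    return lines
```

The outer loop walks rows and the inner lookup is scoped to the row, which mirrors the row-then-cell nesting the VBScript loop is after.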
I will keep working, but I hope one of you miracle magicians will wave your magic wand for me, as you have many times in the past. Sorry about the three duplicate attachments; I tried to remove two of them but could not. They are small, only 1 KB.
« Last Edit: May 10, 2010, 11:06:22 PM by tommytx »

Phaėton
« Reply #2 on: May 11, 2010, 12:18:18 AM »
Can I do it in PHP instead? You would curl my PHP script with the URL you want parsed, and I'd return a pipe-delimited, line-by-line file or something else that's easy for you to parse in VB. I could do that in a few minutes with regex. P.S. It's awful cold here in Virginia Beach tonight, isn't it?
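The regex route offered here might look like the following sketch. It is written with Python's `re` rather than PHP's `preg_*` functions, though the patterns carry over with minor changes. It assumes simple, unnested tables: it grabs each `<tr>...</tr>`, then each `<td>...</td>` inside it, strips leftover tags, and decodes entities such as `&amp;`. Nested tables or unclosed cells will break it, which is the usual trade-off against a real HTML parser. The function name is hypothetical.

```python
import re
from html import unescape

# Case-insensitive, and DOTALL so a row or cell may span several lines.
# \b stops "<tr" from also matching tags such as <track>.
ROW_RE = re.compile(r"<tr\b[^>]*>(.*?)</tr>", re.IGNORECASE | re.DOTALL)
CELL_RE = re.compile(r"<td\b[^>]*>(.*?)</td>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def regex_table_to_pipes(page):
    """Return one 'cell|cell|cell' line per table row (simple tables only)."""
    lines = []
    for row_html in ROW_RE.findall(page):
        cells = [unescape(TAG_RE.sub("", cell)).strip()
                 for cell in CELL_RE.findall(row_html)]
        if cells:  # skip header rows made only of <th> cells
            lines.append("|".join(cells))
    return "\n".join(lines)
```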
When I was your age we used to walk to the TV to change the channel.... _̴ı̴̴̡̡̡ ̡͌l̡̡̡ ̡͌l̡*̡̡ ̴̡ı̴̴̡ ̡̡͡|̲̲̲͡͡͡ ̲▫̲͡ ̲̲̲͡͡π̲̲͡͡ ̲̲͡▫̲̲͡͡ ̲|̡̡̡ ̡ ̴̡ı̴̡̡

perkiset
« Reply #3 on: May 11, 2010, 01:12:51 AM »
Sorry mate, definitely a job for regex. And I'm personally allergic to VBS... I had to get shots of adrenaline to bring me down the last time I touched it. Do you want me to move this thread to Windows or somewhere other than JavaScript? This really doesn't look like the right board for it...
It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.

tommytx
« Reply #4 on: May 11, 2010, 09:09:17 AM »
Perk, you da boss. Move it wherever you want.
Phae, thanks for the offer. If it's simple, I would appreciate that PHP, but unless it fools the site into thinking it is a browser, the site will not let me in. It hates dog=file("web address"), and every attempt with file_get_contents() has been routinely rejected; I tried all that. VBS, of course, drives a real browser completely. If you have some PHP secrets that will fool the site into thinking it's a live visitor, that would be worth its weight in gold (gold to me is a 25-dollar gift certificate), so anything you could do would be great. PHP would be much better, because I could hook it to cron and set and forget it: it must scrape every 4 hours and keep the results in a database.
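If the site only rejects requests because of the User-Agent they send, a request that carries a browser-style header might get further. The sketch below uses Python's `urllib.request.Request`. The User-Agent string and the URL in the example are placeholders; copy the exact header your own browser sends. On its own this will not help if the site also requires login cookies.

```python
from urllib.request import Request, urlopen

# Placeholder browser-style User-Agent (an assumption); substitute your browser's.
BROWSER_UA = "Mozilla/5.0 (Windows NT 6.1; rv:1.9.2) Gecko/20100401 Firefox/3.6"


def browser_like_request(url):
    """Build a GET request that identifies itself like a desktop browser."""
    return Request(url, headers={
        "User-Agent": BROWSER_UA,
        "Accept": "text/html,application/xhtml+xml",
    })


def fetch(url, timeout=30):
    """Download the page and decode it with the charset the server declares."""
    with urlopen(browser_like_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

To run it every 4 hours, a crontab line such as `0 */4 * * * /usr/bin/python3 /path/to/scrape.py` would do. The script path is hypothetical.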
I had thought of getting a Windows platform to see if I could run it from there, but I'm not sure how all that works. I do have lots of stuff that needs to simulate a browser, and I wondered whether VBS could run 24x7 on a Windows server without anyone knowing it was not a live browser. I'm just dumb about all that.

Phaėton
« Reply #5 on: May 12, 2010, 05:40:01 PM »
Dim mybody
mybody = "<table><tr><td>apples</td><td>sweet fruit</td><td>yummy</td></tr></table>"
MsgBox mybody
' Of course, mybody should really be filled from the IE document. If we
' can figure out how to get the IE.document HTML into a string, then the
' real Internet Explorer browser does the browsing and ships the HTML
' off to the PHP script.
Dim cURL, IE
' A VBScript Const must be a literal, so build the URL in a variable.
' (The HTML would also need URL-encoding before going into a query string.)
cURL = "http://www.tomstats.info/script.php?thebodydata=" & mybody
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
IE.Navigate cURL
' Wait for the page to load without spinning the CPU.
Do While IE.Busy
    WScript.Sleep 100
Loop
WScript.Sleep 1000
WScript.Quit 0
It could work like the above. I tried set mybody=IE.document, but it returned the string [object] instead of the HTML, as you can see here: http://www.youtube.com/watch?v=0g1FYC0kZMQ Is it IE.document.text maybe? If we could get the text out of it, you could post it to a PHP script (easy to write) and get back a pipe-delimited, line-by-line file of the table. I've got two Windows machines... Maybe get a virtual host to run a Windows XP VM online somewhere; then you could VNC in and use phpDesigner 2008 from your UI of choice.
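On the `[object]` question: in the MSHTML object model, the page markup as a string is usually read from `IE.document.documentElement.outerHTML`, or from `IE.document.body.innerHTML` for just the body, rather than from `IE.document` itself. Treat that as a pointer to verify, not a tested answer. Either way, raw HTML cannot go straight into a query string. Characters like `<`, `&` and spaces must be percent-encoded, and Internet Explorer caps URLs at roughly 2,000 characters, so a POST body is safer for whole pages. The Python sketch below shows both encodings. `script.php` and `thebodydata` come from the post; the helper names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_get_url(base_url, body_html):
    """Append the HTML as a percent-encoded 'thebodydata' query parameter."""
    return base_url + "?" + urlencode({"thebodydata": body_html})


def build_post_body(body_html):
    """Form-encode the same field as bytes, for an HTTP POST body instead."""
    return urlencode({"thebodydata": body_html}).encode("ascii")


def read_back(url):
    """Decode the parameter again, roughly what the receiving script would see."""
    return parse_qs(urlsplit(url).query)["thebodydata"][0]
```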
« Last Edit: May 12, 2010, 06:45:08 PM by Phaėton »

Phaėton
« Reply #6 on: May 12, 2010, 06:36:15 PM »
Is it some Flash supercookie type of stuff that you need the IE browser for? Couldn't you just do this in PHP?
<?php
$thehtml = file_get_contents('http://www.tomstats.info/_btlogin/login_tom.htm');
// and then a wicked preg_split sequence or whatever to draw the data out of
// the result...
?>
Or are there certain things, like Flash supercookies or some Java elements, that you just can't get with a simple HTTP GET scrape, and that's why you want to automate the real Internet Explorer browser?

tommytx
« Reply #7 on: May 12, 2010, 06:56:33 PM »
PHP would be great if I could teach it to accept cookies and then return them when asked. When I go in with VBS, of course, it drives IE, which does a nice job of handling cookies and sessions. I'm not sure whether I have to handle sessions, but the VBS/IE approach works perfectly. I would love to do it with PHP so my desktop would not have to be online 24x7.
The VB6 WebBrowser control does a great job too, but it keeps blowing the hell out of my ieframe.dll and forcing me to reinstall IE from scratch. I tried replacing the DLL, but that does not work. I also went into the registry and rewrote the file location C:\Windows\system32\ieframe.dll, but that made no freaking difference. The ieframe.dll seems to blow up about every 4 to 6 hours, while all the other IE stuff for VB6 keeps working fine. If you search Google for "ieframe.dll blows out my butt"... not really, just search for "ieframe.dll fails" etc., you'll find a ton of complaints. It's really a piece of sh?t, but I really like the WebBrowser control and wish I could find a fix for it; the web is filled with complaints.
So, bottom line: I know a lot of PHP but can't seem to get it to process cookies. It can send them OK, but it needs to read and write them on demand, because the site checks the cookie all the time. I just need to go in, fill in the user and password, and grab one page every 4 hours. VBS with IE does that well, and so does the VB6 WebBrowser, but I have not been able to convince PHP to do it. I tried Snoopy, cURL and a lot of other cookie handlers, but my level of knowledge has not solved it yet.
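The piece PHP was missing is a cookie jar. It stores whatever the site sets at login and sends it back on every later request; in PHP that is usually cURL's `CURLOPT_COOKIEJAR` and `CURLOPT_COOKIEFILE` options. As a sketch of the same pattern, Python's `http.cookiejar.CookieJar` plugged into `urllib.request` through `HTTPCookieProcessor` does this automatically. The login URL, the form field names and the helper names are hypothetical. A real login form may also carry hidden fields or rely on JavaScript, which this does not run.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session(user_agent):
    """Return an opener that keeps cookies between requests, plus its jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", user_agent)]  # replaces Python's default UA
    return opener, jar


def login_and_fetch(opener, login_url, form_fields, page_url, timeout=30):
    """POST the login form, then GET a page; cookies set at login are sent back."""
    data = urllib.parse.urlencode(form_fields).encode("ascii")
    with opener.open(login_url, data=data, timeout=timeout) as resp:
        resp.read()  # the response's Set-Cookie headers land in the jar
    with opener.open(page_url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Usage would be along the lines of `opener, jar = make_session(ua)` followed by `login_and_fetch(opener, login_url, {"user": ..., "pass": ...}, page_url)`, where the field names must match the site's actual form.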
I would love to give you the site and password, but it's a major MLS network, and they would pull my real estate license if someone were caught trying to get in. Actually, all I need is to find one of the many basic sites that use cookies, and even sessions, and use that site instead: one where the worst they would do, if they detected automation, is delete your password.
I am not trying to break into anything; I have a username and password. I just want to get the data every 4 hours without having to do it manually.
Thanks for all the help. I will look at what you have offered and see if I can do anything with it.
If anyone reading this whining knows how to keep ieframe.dll from failing after 4 to 6 hours, I would forever be in your debt. I will post this problem on some of the VB6 forums too, but you just never know who might help...

tommytx
« Reply #8 on: May 12, 2010, 07:01:39 PM »
Quote: $thehtml = file_get_contents('http
Oh! I just noticed this. Both file_get_contents() and file() are being blocked. I don't think it's the actual command; I think the requests are blocked because they come from PHP rather than from a browser that manages the cookie requests. That keeps folks from scraping the massive MLS data.

Phaėton
« Reply #9 on: May 12, 2010, 07:09:15 PM »
You could use Delphi or C++ to load the IE object and get the HTML from it. If you can get the IE.document.? property figured out and into the mybody variable, that would work. But apparently you'd still have to reset IE every 4 to 6 hours?

Phaėton
« Reply #10 on: May 12, 2010, 07:11:51 PM »
You could write some Delphi code with sendkey.pas to send keystrokes as macros to view the site: Ctrl+L for the location bar, then use sendkey to poke keys into the keypress buffer, like Ctrl+A to select all the text in the bar where Ctrl+L took you, then run a File > Save Page As macro, and have the PHP just poll the directory... hahaha

tommytx
« Reply #11 on: May 12, 2010, 07:14:52 PM »
Do we have any Windows web platform experts here? I thought not... I have been thinking about buying a Windows server instead of the VPS I use now. Would that mean I could run things like VBS, VB6 and VB.NET on the web just like I do on my desktop? I have a lot of things I run 24x7 on my desktop, and it would be great to have a desktop in the air, I mean up on the server. Is that how it works? Does anyone know? That would probably be another long learning curve...

Phaėton
« Reply #12 on: May 12, 2010, 07:28:39 PM »
Yeah, just set XP up on a virtual host like http://www.airvm.com/Hosted-Virtual-Servers?gclid=CJexrYeIzqECFch_5QodhxpRHw and then you can use VNC on an iPad, iPhone or laptop, or use it from wherever you are.

tommytx
« Reply #13 on: May 12, 2010, 08:08:43 PM »
Thanks, I will look into it.

Phaėton
« Reply #14 on: May 12, 2010, 08:37:37 PM »
You could also use PHP to open a TCP/IP socket and send the Firefox header. It's something like User-Agent: Mozilla/Firefox blah blah; you can find the exact value with something like http://www.ericgiguere.com/tools/http-header-viewer.html or by opening a server socket and making a request to it from your browser. Anyhow, get the right header. I think cookies are just text files shaped a certain way, and I think it's been done in PHP. The problem is that whenever I've built any sort of bot like that, I end up getting caught and they change their game to catch me, so you'll have to keep messing with it anyway. If you could get an answer on that IE.document property and how to get the HTML as a string in VB, you could post it as a URL to a PHP script and do what you need... but that wouldn't work because of your ieframe issue, right?
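To make "cookies are just text" concrete: the server sends `Set-Cookie` header lines, and a client echoes the name=value pairs back in a single `Cookie` header alongside its `User-Agent`. The Python sketch below builds and sends such a request by hand, using `http.cookies.SimpleCookie` and a raw socket. The host, path and cookie values in the example are made up, and the helper names are hypothetical. A site served over HTTPS would not work with this plain socket on port 80.

```python
import socket
from http.cookies import SimpleCookie


def cookie_header_from_set_cookie(set_cookie_values):
    """Turn raw Set-Cookie header values into one Cookie request header value."""
    jar = SimpleCookie()
    for value in set_cookie_values:
        jar.load(value)  # attributes such as Path are parsed but not echoed back
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


def raw_get_request(host, path, user_agent, cookie_header):
    """Assemble an HTTP/1.1 GET request as bytes, ending with a blank line."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        f"User-Agent: {user_agent}",
        f"Cookie: {cookie_header}",
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def send_raw(host, request_bytes, port=80, timeout=30):
    """Send the request over a plain TCP socket and return the raw response."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request_bytes)
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection (Connection: close)
                break
            chunks.append(chunk)
    return b"".join(chunks)
```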