The Cache: Technology Expert's Forum
 
Topic: How to parse a file with VBS and grab the data in each column. (Read 12356 times)
tommytx
Expert
Posts: 123
« on: May 10, 2010, 08:55:18 PM »

I am back! Since you guys did such a super job on my last question, I wanted to try again. So far, without a doubt, this is the smartest forum around town, and I hope this gets posted on the first page of Google tomorrow morning and brings 5,000 new signups... hee hee. Do you want me to ping this page? I doubt it's needed.

Anyhoo, here is my predicament. I did not give up for six hours, and you guys/gals will solve it in 2 minutes and piss me off... but that is how it is when you know your shit.


' This is the result I want to get.
' Col1 | Col2 | Col3   (a basic old 3-column table in HTML)
' --------------------------------
'Apples|sweet fruit|yummy
'Bananas|yellow fruit|very good
'Chicken|Best Fried|excellent
'Barbeque|Love it|eat it all day

I have the above information in a basic 3-column table. Using VBS, I want to extract the data and separate it with the | symbol for later processing. I can capture the entire document easily enough with "Set mybody=IE.document". The problem I am having is parsing the document to get the data from each cell. Is there a way to parse the variable mybody, grab the cell data, and separate the cells with |, with a CR after each group of 3 cells (one table row)?

Some snippets are shown below, but if anyone wants to play with it, you can find the simple HTML table (.htm) and the entire VBS code on the web at hXXp://www.vahud.com/web_question/web_question.zip

If anyone wants more info or code, I can dump the entire code here, or you can get it from the website above.

Below is a bunch of the stuff I have tried to no avail.

Set mybody = IE.document

Dim a,Num,All,i,links, mytable1, mytable2

set mytable1 = IE.document.getElementsbyTagname("TABLE")

set mytable2 = mytable1.getElementsbyTagname("TR")

set links = mytable2.getElementsbyTagname("TD")

For Each i in links
   olinks = i.outerHtml
   ilinks = i.innerText
   list1 = list1 + i + vbcr
   list2 = list2 + olinks + vbcr
   list3 = list3 + "|" + ilinks + "|" + vbcr
Next
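For comparison, a corrected sketch of the loop above. getElementsByTagName returns a collection, so one table has to be taken out of it (here with .item(0)) before its rows can be read; each row's cells then come from .cells. Strings are joined with &, VBScript's concatenation operator, because + only concatenates when both sides are strings. Assumptions: the data is in the first table on the page, and the page is a hypothetical local copy at C:\web_question\table.htm.

```vbscript
Option Explicit
Dim IE, tables, tbl, row, cell, rowText, first, result

' Hypothetical path to a local copy of the test page; change as needed.
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
IE.Navigate "file:///C:/web_question/table.htm"
Do While IE.Busy Or IE.ReadyState <> 4   ' 4 = READYSTATE_COMPLETE
    WScript.Sleep 100
Loop

' getElementsByTagName returns a collection of tables, not a single table.
Set tables = IE.document.getElementsByTagName("TABLE")
If tables.length = 0 Then
    MsgBox "No table found on the page."
    IE.Quit
    WScript.Quit 1
End If
Set tbl = tables.item(0)                 ' assume the first table holds the data

result = ""
For Each row In tbl.rows                 ' each <tr> of that table
    rowText = ""
    first = True
    For Each cell In row.cells           ' each <td>/<th> of this row
        If Not first Then rowText = rowText & "|"
        rowText = rowText & Trim(cell.innerText)
        first = False
    Next
    result = result & rowText & vbCrLf   ' one line per table row
Next

MsgBox result
IE.Quit
```

If the table has a header row, it will show up as the first line of output; skip it if you only want the data rows.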

Oh, just in case anyone is skittish about downloading zip files from an unknown person on the web, I have also put everything up as text files. Just visit the site listed above without a file name and it will show you the directory. The .htm file and the .vbs file have .txt extensions, so they cannot jump on you if you are worried... but they are safe, you have my word. Besides, you have my website address so you can kick my butt, and Perk may also give you my email, just to be sure you get me.

Thanks
Tom
PLS VBS... I can't use JavaScript.

Oh, I just saw the attach button, so use the attachment or get it from the web, your choice.


* web_question.zip (1.01 KB - downloaded 221 times.)
* web_question.zip (1.01 KB - downloaded 219 times.)
* web_question.zip (1.01 KB - downloaded 227 times.)

tommytx
« Reply #1 on: May 10, 2010, 11:01:24 PM »

Some updated information:

bob = IE.document.getElementsbyTagname("tr")
dog = ""
for i = 0 to 2
dog = dog + bob.document.getElementsByTagName("td").item(i).innerTEXT + "|"
next
msgbox dog

This gives "Apples|Sweet Fruit|Yummy", which is exactly what I am looking for, but now I
need to make it loop three times since there are 3 rows. It is now looping 3 times for
the 3 columns, as it should, but it will not loop three times for the rows as needed.

However, when I try to complete it like this, nothing works; it says it can't find bob.
I surrounded the code above with this outer loop for the rows, but no luck.

for z = 0 to 2
bob = IE.document.getElementsbyTagname("tr").item(z)
' (all the code above goes here)
next

This should present TR rows 0, 1 and 2 for the triple TD loop to pull from.
Even if I add just the following, without the outer loop:
 bob = IE.document.getElementsbyTagname("tr").item(0)
it will not work; it says it's looking for bob.
So using item(0), item(1) or item(2) directly won't work, and if it won't work with manual control, then of course the outer loop won't work either.

I will keep working, but I hope one of you miracle magicians will wave your magic wand for me as you have many times in the past. Sorry about the 3 duplicate attachments; I tried to remove 2 of them but could not.
They are small, though, only 1 KB each.
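Two things in the snippets above probably explain the errors. VBScript needs Set to store an object in a variable, so a plain bob = ... assignment does not leave the row element in bob. And bob.document.getElementsByTagName("td") goes back up to the whole page, so item(0) to item(2) always return the first row's cells. A sketch of the nested loop with both changes, searching for td elements inside each row and again assuming a hypothetical local copy of the page:

```vbscript
Option Explicit
Dim IE, trs, tds, z, i, dog, allRows

' Hypothetical path to a local copy of the test page; change as needed.
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
IE.Navigate "file:///C:/web_question/table.htm"
Do While IE.Busy Or IE.ReadyState <> 4   ' 4 = READYSTATE_COMPLETE
    WScript.Sleep 100
Loop

' Objects must be assigned with Set; "bob = ..." would not store the element.
Set trs = IE.document.getElementsByTagName("tr")

allRows = ""
For z = 0 To trs.length - 1
    ' Search for <td> inside this row only, not in the whole document.
    Set tds = trs.item(z).getElementsByTagName("td")
    dog = ""
    For i = 0 To tds.length - 1
        If i > 0 Then dog = dog & "|"
        dog = dog & tds.item(i).innerText
    Next
    allRows = allRows & dog & vbCrLf     ' one line per <tr>
Next

MsgBox allRows
IE.Quit
```

Looping to .length - 1 instead of a fixed 2 means the same code keeps working if rows or columns are added to the table.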



« Last Edit: May 10, 2010, 11:06:22 PM by tommytx »

Phaėton
Lifer
Posts: 555
⎝⏠⏝⏠⎠
« Reply #2 on: May 11, 2010, 12:18:18 AM »

Can I do it in PHP? You curl my PHP script with the URL you want parsed,
and I return a pipe-delimited, line-by-line file or something else easy for you to parse in VB?

I could do that in a few minutes with regex.

P.S. It's awful cold here in Virginia Beach tonight, isn't it?


When I was your age we used to walk to the TV to change the channel....  _̴ı̴̴̡̡̡ ̡͌l̡̡̡ ̡͌l̡*̡̡ ̴̡ı̴̴̡ ̡̡͡|̲̲̲͡͡͡ ̲▫̲͡ ̲̲̲͡͡π̲̲͡͡ ̲̲͡▫̲̲͡͡ ̲|̡̡̡ ̡ ̴̡ı̴̡̡
perkiset
Olde World Hacker
Administrator
Lifer
Posts: 10096
« Reply #3 on: May 11, 2010, 01:12:51 AM »

Sorry mate, definitely a job for regex. And I'm personally allergic to VBS... had to get shots of adrenaline to bring me down the last time I touched it. ;)

Do you want to move this thread to Windows or somewhere other than JavaScript? It really doesn't look like the right board for this...
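Since regex came up: VBScript has its own RegExp object (the lazy *? quantifier used below needs VBScript 5.5 or later), so the table could also be parsed from an HTML string without PHP. A rough sketch; the hard-coded string is a stand-in for the real page HTML. Regex parsing of HTML is fragile: it will not cope with nested tables, and tags or entities such as &amp; inside a cell are left as-is.

```vbscript
Option Explicit
Dim html, rowRe, cellRe, rowMatch, cellMatch, rowText, first, result

' Stand-in for the page HTML (hypothetical sample data).
html = "<table><tr><td>Apples</td><td>sweet fruit</td><td>yummy</td></tr>" & _
       "<tr><td>Bananas</td><td>yellow fruit</td><td>very good</td></tr></table>"

' Matches one <tr>...</tr> and captures what is inside it.
Set rowRe = New RegExp
rowRe.Pattern = "<tr[^>]*>([\s\S]*?)</tr>"
rowRe.IgnoreCase = True
rowRe.Global = True

' Matches one <td> or <th> cell and captures its contents.
Set cellRe = New RegExp
cellRe.Pattern = "<t[dh][^>]*>([\s\S]*?)</t[dh]>"
cellRe.IgnoreCase = True
cellRe.Global = True

result = ""
For Each rowMatch In rowRe.Execute(html)
    rowText = ""
    first = True
    For Each cellMatch In cellRe.Execute(rowMatch.SubMatches(0))
        If Not first Then rowText = rowText & "|"
        rowText = rowText & Trim(cellMatch.SubMatches(0))
        first = False
    Next
    result = result & rowText & vbCrLf   ' one pipe-delimited line per row
Next

MsgBox result
```

IgnoreCase is set because markup produced by Internet Explorer may use upper-case tag names.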

It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
tommytx
« Reply #4 on: May 11, 2010, 09:09:17 AM »

Perk, you da boss. Move it wherever you want it.

Phae, thanks for the offer; if it's simple, I would appreciate that PHP. But unless it fools the site into thinking it is a browser, it will not let me in. It hates dog=file("web address"), and every attempt at file_get_contents() has been routinely rejected; I tried all that. Of course, VBS drives a real browser completely. If you have some secrets via PHP that will fool it into thinking it's a live visitor, that would be worth its weight in gold... and gold to me is a 25 dollar gift certificate, so anything you could do would be great. PHP would be much better, as I could hook it to cron and set and forget it, since it must scrape every 4 hours and keep the results in a database.

I had thought of buying a Windows platform to see if I could run it from there, but I'm not sure how all that works. I do have lots of stuff that needs to simulate a browser, and I wondered if VBS could run 24x7 on a Windows server without anyone knowing it was not a live browser... I'm just dumb about all that.



Phaėton
« Reply #5 on: May 12, 2010, 05:40:01 PM »

Code:
Option Explicit
Dim mybody, cURL, IE

' Sample table row. Ideally the HTML of IE.document would set mybody, if we
' can figure out how to get it into a string; then the real Internet
' Explorer browser could do the browsing and ship the HTML off to the PHP.
mybody = "<table><tr><td>apples</td><td>sweet fruit</td><td>yummy</td></tr></table>"

MsgBox mybody

' VBScript's Const only accepts a literal, so build the URL in a variable.
' Escape() URL-encodes the HTML so it survives as a query-string value.
cURL = "http://www.tomstats.info/script.php?thebodydata=" & Escape(mybody)

Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
IE.Navigate cURL

' Wait for the page to finish loading (4 = READYSTATE_COMPLETE).
Do While IE.Busy Or IE.ReadyState <> 4
    WScript.Sleep 100
Loop
WScript.Sleep 1000

WScript.Quit 0


It could work like the above.


I tried to set mybody=IE.document, but it returned the string [object] instead of HTML.

You can see it here: http://www.youtube.com/watch?v=0g1FYC0kZMQ

Is it IE.document.text maybe? If we could get the text from it, you could post it to
an easy-to-write PHP script and get back a pipe-delimited, line-by-line file of the
table.
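On the [object] question: IE.document is the DOM document object, not a string, so turning it into text yields only a placeholder. The IE DOM exposes the markup as strings through properties such as document.documentElement.outerHTML (the whole page) and document.body.innerHTML (just the body). A minimal sketch; the URL is a placeholder:

```vbscript
Option Explicit
Dim IE, html

Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
IE.Navigate "http://www.example.com/"    ' placeholder URL
Do While IE.Busy Or IE.ReadyState <> 4   ' 4 = READYSTATE_COMPLETE
    WScript.Sleep 100
Loop

' IE.document itself is an object; these properties return its markup as text.
html = IE.document.documentElement.outerHTML   ' the whole page
' html = IE.document.body.innerHTML            ' only what is inside <body>

MsgBox Left(html, 500)                          ' show the first 500 characters
IE.Quit
```

IE builds these strings from the DOM it parsed rather than returning the original source, so tag case and attribute quoting may differ from what the server sent (older IE versions upper-case tag names), which matters if a regex will parse the result.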

I've got two Windows machines... maybe get a virtual host to run a Windows XP VM
online somewhere. Then you could VNC in and use phpDesigner 2008 from your UI of choice.


« Last Edit: May 12, 2010, 06:45:08 PM by Phaėton »

Phaėton
« Reply #6 on: May 12, 2010, 06:36:15 PM »

Is it some Flash supercookie type of stuff that you need the IE browser for?

Couldn't you just do this in PHP?

Code:
<?php

$thehtml = file_get_contents('http://www.tomstats.info/_btlogin/login_tom.htm');
// and then a wicked preg_split sequence or whatever to draw the data out of
// the result...
?>


Or are there certain things, like Flash supercookies or some Java elements,
that you just can't get with a simple HTTP GET scrape, and that's why you
want to use the automation of the real Internet Explorer browser?



tommytx
« Reply #7 on: May 12, 2010, 06:56:33 PM »

PHP would be great if I could teach it to accept cookies and then return them when asked for.
When I go in with VBS, of course, it drives IE, which does a nice job of handling cookies and sessions.
I'm not sure if I have to handle sessions, but VBS with IE works perfectly. I would love to do it with PHP so my desktop would not have to be online 24x7.

The VB6 WebBrowser control also does a great job, but it keeps blowing the hell out of my ieframe.dll and makes me reinstall IE from scratch; I tried replacing the DLL, but that does not work. I also went into the registry and rewrote the file location C:\Windows\system32\ieframe.dll, but that made no freaking difference. It seems ieframe.dll blows up about every 4 to 6 hours, while all the other IE stuff for VB6 continues to work fine.
If you search Google for "ieframe.dll blows out my butt"... not really, just search for "ieframe.dll fails" and so on, you'll find a ton of complaints. It's really a piece of sh?t. But I really like the WebBrowser control and wish I could find a fix for it; the web is filled with complaints.

So, bottom line: I know a lot of PHP but can't seem to get it to process cookies. I can send them OK, but it needs to read and write them on demand, because the site checks the cookie all the time.
I just need to go in, fill in the user and pass, and grab one page every 4 hours. VBS with IE does it well, and so does the VB6 WebBrowser, but I have not been able to convince PHP to do it.
I tried Snoopy, cURL and a lot of other cookie handlers, but my level of knowledge has not solved it yet.

I would love to give you the site and password, but it's a major MLS network, and they would pull my real estate license if someone were caught trying to get in. Actually, all I need to do is find one of the many basic sites that use cookies and even sessions and use that site instead, one that would just delete your password if they detected automation.

I am not trying to break into anything. I have a user and password and just want to get the data every 4 hours without having to do it manually.

Thanks for all the help. I will look at what you have offered and see if I can do anything with it.

If anyone reading this whining knows how to keep ieframe.dll from failing after 4 to 6 hours, I would forever be in your debt. I will post this problem on some of the VB6 forums, but you just never know who might help.


tommytx
« Reply #8 on: May 12, 2010, 07:01:39 PM »

<quote>
$thehtml = file_get_contents('http
</quote>

Oh! I just noticed this. Both file_get_contents() and file() are being blocked, but I don't think it's the actual command; it's being blocked because you are coming in from PHP without a browser to manage the cookie requests. I think that is what is causing the block; it keeps folks from scraping the massive MLS stuff.


Phaėton
« Reply #9 on: May 12, 2010, 07:09:15 PM »

You could use Delphi or C++ to load the IE object
and get the HTML from it.

If you can get the ie.document.??? part figured out and into
the mybody variable, that would work.

But apparently you'd still have to reset IE every 4-6 hours?

Phaėton
« Reply #10 on: May 12, 2010, 07:11:51 PM »

You could do some Delphi code with sendkey.pas to execute the keystrokes
as macros to view the site: Ctrl+L for the location bar, then use sendkey to poke
keys into the keypress buffer, like Ctrl+A to select all the text in the bar
where Ctrl+L took you, then run a File > Save Page As auto-macro,
and then have the PHP just poll the directory... hahahah


tommytx
« Reply #11 on: May 12, 2010, 07:14:52 PM »

Do we have any Windows web platform experts here? I thought not...
I have been thinking about buying a Windows server instead of the VPS I use now.
Does that mean I could run things like VBS, VB6 and VB.NET on the server just like I do on my desktop?
I have a lot of things I run 24x7 on my desktop, and it would be great to have a desktop in the air... I mean up on the server. Is that how it works? Does anyone know? Hell, that would probably be another long learning page...

Phaėton
« Reply #12 on: May 12, 2010, 07:28:39 PM »

Yeah, just set up XP on a virtual host like:

http://www.airvm.com/Hosted-Virtual-Servers?gclid=CJexrYeIzqECFch_5QodhxpRHw

Then you can use VNC from an iPad, iPhone or laptop, wherever you are.


tommytx
« Reply #13 on: May 12, 2010, 08:08:43 PM »

Thanks, I will look into it.

Phaėton
« Reply #14 on: May 12, 2010, 08:37:37 PM »

You could also use PHP to open a TCP/IP socket
and send the Firefox headers. It's something
like:

HTTP
User-Agent: Mozilla/Firefox blah blah

You can find out what your browser sends with something like this:

http://www.ericgiguere.com/tools/http-header-viewer.html

or open up a server socket and make a request to it from your browser.

Anyhow, get the right header. I think cookies are just text files
shaped a certain way, and I think it's been done in PHP. The problem is
that whenever I've done any sort of bot like that, I end up getting caught and
they change their game up to catch me, so you'll have to constantly mess with it
anyway.

If you could get an answer on that IE.document property and how to get the HTML
as a string in VB, then you could post it as a URL to a PHP script and do what you
need... but that wouldn't work because of your ieframe issue, right?
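If the HTML does get into a string, sending it to a PHP script as a POST body rather than in the URL avoids the URL length limit (Internet Explorer caps URLs at 2,083 characters, and a whole page easily exceeds that). A sketch using MSXML2.ServerXMLHTTP from VBScript; the page URL is a placeholder, script.php and thebodydata are the hypothetical names from Reply #5, and the text is assumed to be plain ASCII:

```vbscript
Option Explicit
Dim IE, html, body, http

Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
IE.Navigate "http://www.example.com/"        ' placeholder page to scrape
Do While IE.Busy Or IE.ReadyState <> 4       ' 4 = READYSTATE_COMPLETE
    WScript.Sleep 100
Loop
html = IE.document.documentElement.outerHTML
IE.Quit

' Form-encode the HTML. Escape() leaves "+" as-is, but in a form body "+"
' decodes as a space, so encode it by hand. Escape() turns non-ASCII
' characters into %uXXXX, which PHP does not decode (assumes ASCII text).
body = "thebodydata=" & Replace(Escape(html), "+", "%2B")

' Send as a POST body; script.php is the hypothetical receiver from Reply #5.
Set http = CreateObject("MSXML2.ServerXMLHTTP")
http.Open "POST", "http://www.tomstats.info/script.php", False
http.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
http.Send body

MsgBox http.Status & vbCrLf & http.responseText
```

On the PHP side, the value should arrive already decoded in $_POST['thebodydata'].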

