The Cache: Technology Expert's Forum
 
Author Topic: Javascript Parsing and Interpretation  (Read 4550 times)
vsloathe
« on: July 14, 2008, 01:06:34 PM »

Well, I think I'm going to bite off more than I can chew with this one, but I'm going to give it the old college try anyhow.

What I want to do is write a javascript interpreter for PHP. The difference between what I want to do and what several others have already done (believe me, I've looked) is that I don't really care about interpretation of worthless things like arithmetic. I'm looking to interpret the things a surfer's browser would in a format that a "virtual browser" can handle.

E.g. I am writing a class that is indistinguishable in look and behavior from a regular browser, with Perk's webRequest class at its core, except that it's handled entirely via its object interface.

So for example, it loads a page, uses PHP DOM to traverse its content, checking first for <script> tags and then for linked javascript. It will parse out directives like window.location and/or document.write, and act accordingly, adding to the stored DOM of the page or redirecting to a different page, before returning. Then, let's say we want to make a POST request with the source of that page. We'll use the DOM object stored in our browser object like $browser->DOMTree->Input1->value = 'foo'; We'll set everything to what we need it to be, and away we will go.
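A rough sketch of that first pass, written in JavaScript for brevity rather than PHP. The helper names `extractScripts` and `findRedirect` are made up for illustration, and the regexes only cover the simplest cases (a literal string assigned to the location); anything cleverer needs a real interpreter, as the replies below argue.

```javascript
// Hypothetical helper: pull inline script bodies and linked script URLs
// out of raw HTML. A regex is only good enough for a sketch; real pages
// need a forgiving HTML parser.
function extractScripts(html) {
  const inline = [];
  const linked = [];
  const scriptTag = /<script\b([^>]*)>([\s\S]*?)<\/script\s*>/gi;
  for (const match of html.matchAll(scriptTag)) {
    const attrs = match[1];
    const body = match[2];
    const src = /\bsrc\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/i.exec(attrs);
    if (src) {
      linked.push(src[1] ?? src[2] ?? src[3]);
    } else if (body.trim() !== '') {
      inline.push(body);
    }
  }
  return { inline, linked };
}

// Hypothetical helper: spot the simplest kind of redirect, a quoted
// string assigned to window.location, location.href and similar.
function findRedirect(scriptSource) {
  const assign = /\b(?:window\.|document\.)?location(?:\.href)?\s*=\s*(["'])(.*?)\1/;
  const match = assign.exec(scriptSource);
  return match ? match[2] : null;
}

const page = `
<html><head>
<script src="/js/lib.js"></script>
<script>window.location = "http://example.com/next";</script>
</head><body></body></html>`;

const scripts = extractScripts(page);
const redirect =
  scripts.inline.map(findRedirect).find((url) => url !== null) ?? null;
```

Here `scripts.linked` holds the URLs that still have to be fetched, and `redirect` is the target the virtual browser would follow before returning.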

Most of this, I'm perfectly clear on how I'm going to do - but I'll be the first to admit it - I'm a javascript n00b. I just don't even know where to begin interpreting javascript.

Ideas?

hai
nop_90
« Reply #1 on: July 14, 2008, 01:38:04 PM »

Writing a JS interpreter in PHP is a waste of time; just get an interpreter like SpiderMonkey and embed it into PHP.
That way you know you have an interpreter that will parse JS in real time.

The biggest problem you will face is parsing the HTML.
All of the HTML parsers out there barf on bad HTML.
Thing is, a web browser's parser will still render HTML even if it is bad.

Another problem is that even if you embed SpiderMonkey, for example, it only contains the interpreter as per the JS definition.
All functions like walking the HTML DOM tree etc. will have to be handled by hand.
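To make the "handled by hand" part concrete, here is a minimal sketch that uses Node's built-in `vm` module as a stand-in for an embedded engine (it is not SpiderMonkey, and not PHP). The function `runWithFakeBrowser` and the shape of the `effects` object are assumptions for illustration. The engine only runs the language; `window` and `document` exist only because the host supplies them.

```javascript
const vm = require('node:vm');

// Hypothetical host environment: just enough of `window` and `document`
// for a script to call document.write() or assign window.location.
function runWithFakeBrowser(scriptSource, currentUrl) {
  const effects = { written: [], redirect: null };

  const document = {
    write(...chunks) {
      effects.written.push(chunks.join(''));
    },
  };

  const window = { document };
  // An accessor records the assignment instead of navigating anywhere.
  Object.defineProperty(window, 'location', {
    get() { return currentUrl; },
    set(url) { effects.redirect = String(url); },
  });

  // The script sees `window` and `document` as globals and nothing else.
  vm.runInNewContext(scriptSource, { window, document });
  return effects;
}

const effects = runWithFakeBrowser(
  'document.write("<p>hi</p>"); window["location"] = "/next.html";',
  'http://example.com/'
);
```

After the run, `effects.written` holds the markup to splice into the stored DOM and `effects.redirect` holds the new target. A bare `location = ...` or `location.href = ...` would need more stubs, and `vm` is not a security boundary, so this is no way to run hostile scripts.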
perkiset
« Reply #2 on: July 14, 2008, 01:41:35 PM »

The first question is if you've ever written an interpreter. I've written many, and they're not trivial, but if you do it in an OO way you can make it pretty clean and pretty damn fast. I can help you along here in a big way if you'd like it. My particular speciality is reentrantly handling complex right-side evaluations.

The next problem will be things like timeouts, intervals - these will have to be ignored for the present because they offer another level of complexity that I don't think you want to tackle just yet. That being said, it can be done.
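One way to "ignore" timers without breaking scripts that call them is to hand the script stubs that only record what was requested. This is a sketch under that assumption; `makeTimerStubs` is a made-up name, and nothing here ever fires a callback.

```javascript
// Hypothetical timer stubs: queue the callbacks rather than scheduling
// them, so the caller decides later whether to run or drop them.
function makeTimerStubs() {
  const pending = [];
  let nextId = 1;

  function record(kind) {
    return (callback, delay = 0) => {
      const id = nextId++;
      pending.push({ id, kind, callback, delay });
      return id;
    };
  }

  function clear(id) {
    const index = pending.findIndex((timer) => timer.id === id);
    if (index !== -1) pending.splice(index, 1);
  }

  return {
    pending,
    setTimeout: record('timeout'),
    setInterval: record('interval'),
    clearTimeout: clear,
    clearInterval: clear,
  };
}

const timers = makeTimerStubs();
const first = timers.setTimeout(() => 'redirect later', 5000);
timers.setInterval(() => 'poll', 1000);
timers.clearTimeout(first);
```

What is left in `timers.pending` is the list of deferred work the page asked for, which can be inspected or replayed once the rest of the browser emulation is in place.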

The next problem will be object orientation. Interpreting a straight-ahead scripting language is fun enough... but with class definitions in any number of formats and the usage of said classes even more funky, this is where I see the real interpretation problems. Javascript OO is not OO at all, it's more, like, FU. I despise it, and if you try to interpret it you will as well.

What is your thinking on the Rhino project?

It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
perkiset
« Reply #3 on: July 14, 2008, 01:42:29 PM »

Crap NOP jumped in just as I was posting.

He and I are on the same page here... if someone else can do the interpretation and you can just branch on event pops you'll be WAY ahead.
« Last Edit: July 14, 2008, 02:51:56 PM by perkiset »
nop_90
« Reply #4 on: July 14, 2008, 02:45:05 PM »

@Perks my thoughts exactly.
It all depends on what you are using it for, what your needs are, etc.

The OO problem is a biggie; to a person with a background in Delphi/C++, OO in JS appears not to be OO.
But JS OO is OO :) But yeah, a very valid point.
For example, window.location is the same as window["location"]
(I am no JS object expert, but there are like a ton of ways to access object properties)
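A few of those ways, shown on a stand-in object (`fakeWindow` is made up for the example; in a browser it would be the real `window`). This is also why scanning script text for the string "window.location" is fragile: only the first form below contains it.

```javascript
// A stand-in object; in a browser this would be the real `window`.
const fakeWindow = { location: 'http://example.com/' };

// Three reads of the same property.
const key = 'loc' + 'ation';
const viaDot = fakeWindow.location;
const viaBracket = fakeWindow['location'];
const viaComputed = fakeWindow[key];

// A write through an alias still changes the same object.
const alias = fakeWindow;
alias.location = 'http://example.com/moved';
```

All three reads return the same value, and after the last line `fakeWindow.location` is the moved URL even though the name `fakeWindow` never appears in the assignment.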

I used a variant of the Rhino parser when I scanned for all the XSS, which I posted about elsewhere.
Problem with Rhino is that it is slow, and also it is not thread safe (in a really nasty way) :)
So you cannot get the speed; if you just need to use it for a few requests then it works fine.

That is the reason I would go with the SpiderMonkey interpreter and make a module for PHP.

Speed is one of the reasons I dumped Perl as my system and switched to Lisp.
In Lisp you have http://opensource.franz.com/xmlutils/xmlutils-dist/phtml.htm
which basically takes HTML like this

Code:
"<HTML>
  <HEAD>
  <TITLE>Example HTML input</TITLE>
  <BODY>
  <P>Here is some text with a <B>bold</B> word<br>and a <A HREF=\"help.html\">link</P>
  </HTML>"

and it turns it into this

Code:
((:html (:head (:title "Example HTML input"))
  (:body (:p "Here is some text with a " (:b "bold") " word" :br "and a "
             ((:a :href "help.html") "link")))))

which is a Lisp list (of sorts :))

Since Lisp compiles directly to machine code it is blindingly fast (usually within about 2x of C's run time, compared to 10x or more for a scripting language).
Also, since the list is the basic structure of Lisp, I can now walk the HTML as an ordinary Lisp list.
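The same walk can be sketched outside Lisp by writing the LHTML above as nested JavaScript arrays. The encoding is an assumption made for this example: tag and attribute keywords become strings starting with ":", and an element with attributes has an array as its first item, mirroring `((:a :href "help.html") "link")`. The function name `collectLinks` is made up.

```javascript
// The LHTML from the example above, as nested arrays.
const lhtml = [
  [':html',
    [':head', [':title', 'Example HTML input']],
    [':body',
      [':p', 'Here is some text with a ', [':b', 'bold'], ' word', ':br',
        'and a ', [[':a', ':href', 'help.html'], 'link']]]],
];

// Recursively collect every href found on an :a element.
function collectLinks(node, found = []) {
  if (!Array.isArray(node)) return found; // text or a bare tag like :br
  const [head, ...children] = node;
  if (Array.isArray(head) && head[0] === ':a') {
    // After the tag name, attributes alternate name, value.
    for (let i = 1; i + 1 < head.length; i += 2) {
      if (head[i] === ':href') found.push(head[i + 1]);
    }
  }
  for (const child of children) collectLinks(child, found);
  return found;
}

// The document is a list of root nodes, so walk each one.
const links = lhtml.flatMap((root) => collectLinks(root));
```

Because the tree is plain nested lists, the walker is a dozen lines with no DOM API involved. Text that happens to start with ":" would be ambiguous in this encoding, which Lisp avoids by having real keyword symbols.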

I am using SBCL (Lisp has many different implementations), which allows direct memory access.
As a result all of my HTTP functions are built on top of libcurl, which is written in C, but since I can access memory directly I do not suffer the speed loss that a scripting language would.

Also, since the HTML is now a list (what is called LHTML; the idea of HTML was stolen from Lisp, except they replaced ( with < and kinda fuked it up), if I decided to embed a JS interpreter, then when it needs to modify the DOM it would just modify the LHTML instead.
(Before I started working on the system, I planned that in future I might want to do this.)
(Basically, right now I have a Lisp clone of Perl's WWW::Mechanize, except built with Lisp on top of libcurl.)

Lisp is kinda like MATLAB, as in you have a console and you compile shit.
Even cooler, you can deploy Lisp onto a VPS/server.
Then you can connect to it remotely over a secure socket, with Emacs.
If Lisp encounters an error it will catch it.
Without restarting the Lisp, I can locate the cause of the error, then recompile that function. And away I go again.
That is why extreme programming in other languages is nothing but hype/jargon, while in Lisp it is the real thing :)

Anyway, I have digressed.
If I get serious about proper HTML parsing, I will probably rip the parser out of Mozilla, except instead of having it generate a DOM tree I would have it generate LHTML.
Then at the same time do the same thing with the SpiderMonkey JS engine.
That way I will be able to emulate a browser 100%.
perkiset
« Reply #5 on: July 14, 2008, 02:56:14 PM »

...
That way i will be able to emulate browser 100%.

and that will be one nifty and high power trick, my friend.
vsloathe
« Reply #6 on: July 14, 2008, 03:02:05 PM »

Got any advice for embedding spidermonkey into PHP? I haven't the foggiest of where to start.
nop_90
« Reply #7 on: July 14, 2008, 03:28:26 PM »

I have never done it for PHP. Mainly for Python and Lua.
But the theory is the same :)
Also it is a good way to learn the internal workings of how a VM works etc.

Perk made a simple C module for PHP some time ago.
That is a very good place to start.
Also the fuking docs for this shit suck; I learned by finding other people's extensions, pulling them apart and seeing what they do.

You can attempt to use
http://www.swig.org/Doc1.3/Php.html

Calling C/C++ code from a scripting language is pretty simple.
SWIG, for Python anyway, does a pretty good job generating code.

Where the problem comes is when you have to do callbacks, as in
C/C++ -> scripting language -> C/C++
When that happens you have to convert from C/C++ datatypes -> scripting language datatypes, do the work, and convert back again.
All sorts of evil can happen there.

Callbacks can take 2 forms: an actual callback, where a pointer to a function is passed to a C function,
or inherited virtual functions (or is it methods?) from a C++ class.
The Python solution is to create a proxy class, with stubs for all the virtual functions.
Also Python classes are much more advanced than PHP's.
It is one reason why Python has so many interfaces to GUI toolkits, C++ code etc.,
because it is "relatively" (ROFLMAO) easy to create modules.
I have not really worked with Lua too much, but same story.

Most of the code is "boilerplate".
Sorry to bring up Lisp/Scheme again, but this is one of the reasons I switched over.
Because of the advanced macros in Lisp/Scheme, I created an entire libcurl binding in like 400 lines of code.

Again, this is not to say PHP sucks, just right tool for the right job.
dimitry12
« Reply #8 on: July 19, 2008, 12:44:56 PM »

well, interesting discussion here

when thinking about 100% emulating a real browser, I thought about using a real browser:

http://www.mozilla.org/projects/embedding/PublicAPIs.html

look at that too: http://simile.mit.edu/wiki/Crowbar

Though I didn't manage to implement it

What are your thoughts about that?
nop_90
« Reply #9 on: July 19, 2008, 02:32:56 PM »

That looks very interesting dimitry :)
I have to study it
Thanx
dimitry12
« Reply #10 on: July 19, 2008, 02:37:20 PM »

simile is very promising, though it still needs a GUI to run XUL

direct embedding looks like a nice server-side solution but bindings are not easy :)
nop_90
« Reply #11 on: July 19, 2008, 02:54:32 PM »

From that article I found this
http://dadicy.wordpress.com/2007/10/09/what-will-you-need-to-run-a-headless-application-in-linux/
I did not know such a thing as a headless application existed.

C bindings are not too bad. But C++ bindings are like fuking hell, so anything to avoid writing bindings.
dimitry12
« Reply #12 on: July 19, 2008, 03:01:15 PM »

very nice, did you try it?
nop_90
« Reply #13 on: July 19, 2008, 03:41:23 PM »

I am attempting to fuking figure out how to make yum go (ROFLMAO)
I need to install Xvfb
I use Ubuntu on the desktop and CentOS on the server, so I am not familiar with yum shit.
Hence why I usually just install from source, but compiling Xvfb is too hard.