The Cache: Technology Expert's Forum

Topic: DOM and Xpath.

« on: January 09, 2008, 11:52:47 AM »

 I suck at anything OO. I just never really caught on to it, or saw enough advantage in it to bother. I'm trying to play 'catch up' though and learn something new.

 So for my 'learning experience' I thought I'd play around a bit with the PHP DOM stuff, and do some neat stuff with XPath. I can't figure it out though. If I'd just stuck with functional code and regexes I'd be done by now, but I wanna learn something today. :)

 Here is my code to find the action of the forms on a page:

// Now find all the forms.
// Parse the HTML into a DOMDocument.
//ob_start(); // loading generates a lot of errors that dump into the output file
$dom = new DOMDocument();
@$dom->loadHTML($html); // load call is missing from the post as captured; $html (the page source) is assumed

$xpath = new DOMXPath($dom);
$forms = $xpath->evaluate("/html/body//form");

for ($i = 0; $i < $forms->length; $i++) {
    print "Finding form action.\n";
    $form = $forms->item($i);
    $action = $form->getAttribute('action');
    // Crude relative-to-absolute URL conversion; absolute URLs pass through unchanged.
    if (substr($action, 0, 8) != "https://" && substr($action, 0, 7) != "http://") {
        if (substr($action, 0, 1) == "/") $action = "http://$domain" . $action;
        else $action = "http://$domain/" . $action;
    }
}

 This code actually works, though it seems really ugly. Most of that ugly is my crappy 'relative to absolute URL' kludge. The problem is I can't figure out how to get more data about my form out. Like when I do '$action = $form->getAttribute('action');', isn't there some way I could do something like '$formstuff[] = $dom->getAllTheFuckinElementsAndAttributesAndPutThemInANormalArrayYouBastard('*');' instead?

 It keeps returning DOMNodeDingleberries instead of anything useful that I can just iterate through, and then for each DOMDingleberrie the only way I can see to get anything useful out of it is to already know the tag name or ID which I don't have.

 I mean, all I really want is to pull out all the forms, then return an array with all the various elements and their values and whatnot, but the way I see it I'm going to have to write something like the above hunk O' crap for every possible tag (input, option, yada yada), and then parse each form.

 What obvious crap am I missing? There has to be something, or PHP's DOM functionality is pretty damned pointless.

 Also: XML makes the baby Jesus cry.

« Reply #1 on: January 10, 2008, 04:53:39 AM »

lol I don't quite agree about XML, I find it far easier than parsing using regular expressions.

Having recently gone through the same process of learning the PHP DOM functions and XPath, I can really see where you're coming from with this. The DOM structure is powerful, but it's really frustrating at the same time. It's almost like getting your head around the idea that the data is there, but you just can't 'see it' using traditional PHP methods like print_r. I found this function somewhere that might help with that:

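function domNodeList_to_string($DomNodeList) {
    $output = '';
    $doc = new DOMDocument;
    $i = 0;
    while ( $node = $DomNodeList->item($i) ) {
        // import node (deep copy) into the scratch document
        $domNode = $doc->importNode($node, true);
        // append node
        $doc->appendChild($domNode);
        $i++;
    }
    // serialize the collected nodes and escape the markup for display in a browser
    $output = $doc->saveXML();
    $output = print_r($output, 1);
    $output = htmlspecialchars($output);
    return $output;
}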

My stab at a solution to your problem would be to recursively iterate through the nodes (and their children) returned from your XPath query, and maybe build an associative array of the results if you'd like a more traditional way to access the data, similar to the above process.
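
Here's a rough sketch of what I mean by walking the nodes recursively, not tested against your pages. The helper names node_to_array() and forms_to_array() are ones I made up for this example (they are not part of PHP's DOM extension), and the sample HTML is invented. The useful bit is that $el->attributes and $el->childNodes can both be walked with foreach, so you don't need to know any tag names or attribute names ahead of time:

```php
<?php
// Sketch only: node_to_array() and forms_to_array() are hypothetical helpers,
// not functions provided by PHP's DOM extension.

// Recursively turn an element into a plain nested array.
function node_to_array(DOMElement $el) {
    $out = array('tag' => $el->tagName, 'attributes' => array(), 'children' => array());

    // $el->attributes is a DOMNamedNodeMap; foreach walks it without
    // needing to know any attribute names up front.
    foreach ($el->attributes as $attr) {
        $out['attributes'][$attr->name] = $attr->value;
    }

    // Recurse into child elements only, skipping text and comment nodes.
    foreach ($el->childNodes as $child) {
        if ($child instanceof DOMElement) {
            $out['children'][] = node_to_array($child);
        }
    }
    return $out;
}

// Run the same XPath expression as the first post and convert every form found.
function forms_to_array($html) {
    $dom = new DOMDocument();
    @$dom->loadHTML($html); // @ hides the warnings that sloppy HTML tends to trigger
    $xpath = new DOMXPath($dom);

    $forms = array();
    foreach ($xpath->query('/html/body//form') as $form) {
        $forms[] = node_to_array($form);
    }
    return $forms;
}

// Invented sample page, just to have something to parse.
$html = '<html><body><form action="/login" method="post">'
      . '<input type="text" name="user" value="bob">'
      . '<select name="lang"><option value="en">English</option></select>'
      . '</form></body></html>';

$forms = forms_to_array($html);
print_r($forms);
```

Each form comes back as a normal array with 'tag', 'attributes' and 'children' keys, so you can loop over it with foreach like anything else. Note that this version throws away text content (the "English" label inside the option, for instance); if you need that you could add something like a 'text' key filled from $el->textContent.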
