Thread: DOM and Xpath.
emonk

I suck at anything OO. I just never really caught on to it, or saw enough advantage in it to bother. I'm trying to play 'catch up' though and learn something new.

So for my 'learning experience' I thought I'd play around a bit with the PHP DOM stuff, and do some neat stuff with xpath. I can't figure it out though. If I'd have just stuck with functional code and regexes I'd be done by now, but I wanna learn something today.

Here is my code to find the action of the forms on a page:

// Now find all the forms.
// Parse the HTML into a DOMDocument.
//ob_start(); //this generates a lot of errors, that dump the output file
$dom = new DOMDocument();
@$dom->loadHTML($page); // @ hides the warnings loadHTML() throws on sloppy markup
//ob_end_clean();

$xpath = new DOMXPath($dom);
$forms = $xpath->evaluate("/html/body//form");

for ($i = 0; $i < $forms->length; $i++) {
  print "Finding form action. ";
  $form = $forms->item($i);
  $action = $form->getAttribute('action');
  // Absolute http:// and https:// URLs stay as they are; everything else gets the domain prepended.
  if (substr($action,0,8) != "https://" && substr($action,0,7) != "http://") {
    if ( substr($action,0,1) == "/" ) $action = "http://$domain" . $action;
    else $action = "http://$domain/" . $action;
  }
}


This code actually works, though it seems really ugly; most of that ugly is my crappy 'relative to absolute URL' kludge. The problem is I can't figure out how to get more data about my form out. Like when I do '$action = $form->getAttribute('action');', isn't there some way I could do something like '$formstuff[] = $dom->getAllTheFishinElementsAndAttributesAndPutThemInANormalArrayYouBastard('*');' instead?

It keeps returning DOMNodeDingleberries instead of anything useful that I can just iterate through, and then for each DOMDingleberrie the only way I can see to get anything useful out of it is to already know the tag name or ID which I don't have.

I mean, all I really want is to pull out all the forms, then return an array with all the various elements and their values and whatnot, but the way I'm seeing this is that I'm going to have to write something like the above hunk O' crap for every possible tag (input, option, yadadadad), and then parse each form.

What obvious crap am I missing? There has to be something, or PHP's DOM functionality is pretty damned pointless.

Also: XML makes the baby Jesus cry.

DangerMouse

lol I don't quite agree about XML, I find it far easier than parsing using regular expressions.

Having recently gone through the same process, learning to use PHP DOM functions and xPath, I can really see where you're coming from with this. Although the DOM structure is powerful, it's really frustrating at the same time. It's almost like getting your head around the idea that the data is there, but you just can't 'see it' using traditional PHP methods. I found this function somewhere that might help with that:

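// Dump a DOMNodeList as an HTML-escaped XML string, so you can see what a query returned.
function domNodeList_to_string($DomNodeList) {
    $doc = new DOMDocument;
    $i = 0; // start at the first node in the list
    while ( $node = $DomNodeList->item($i) ) {
        // import node (deep copy) into the scratch document
        $domNode = $doc->importNode($node, true);
        // append node
        $doc->appendChild($domNode);
        $i++;
    }
    $output = $doc->saveXML();
    // escape the markup so it shows up as text in a browser
    $output = htmlspecialchars($output);
    return $output;
}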


I'd have a stab at the solution to your problem being to recursively iterate through the nodes and their children returned from your xPath query, and then maybe build an associative array of the results if you'd like to use more traditional ways to access the data, similar to the above process.
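
Something along these lines might do it. This is only a rough sketch of the recursive idea, not tested against real-world pages: the function name dom_element_to_array and the sample $page are made up for illustration, and the array layout (tag / attributes / text / children) is just one assumption about what a 'normal array' should look like. It walks every attribute and child without needing to know tag names or IDs up front.

```php
<?php
// Hypothetical helper (not from any library): turn a DOMElement and
// everything under it into a plain nested PHP array.
function dom_element_to_array(DOMElement $el): array {
    $out = array(
        'tag'        => $el->tagName,
        'attributes' => array(),
        'text'       => '',
        'children'   => array(),
    );
    // Grab every attribute, whatever it happens to be called.
    foreach ($el->attributes as $attr) {
        $out['attributes'][$attr->nodeName] = $attr->nodeValue;
    }
    // Recurse into child elements; collect text that sits directly in this element.
    foreach ($el->childNodes as $child) {
        if ($child instanceof DOMElement) {
            $out['children'][] = dom_element_to_array($child);
        } elseif ($child instanceof DOMText) {
            $out['text'] .= $child->nodeValue;
        }
    }
    $out['text'] = trim($out['text']);
    return $out;
}

// Made-up sample page, just for illustration.
$page = '<html><body><form action="/login" method="post">'
      . '<input type="text" name="user" value="bob">'
      . '<select name="lang"><option value="en">English</option></select>'
      . '</form></body></html>';

$dom = new DOMDocument();
@$dom->loadHTML($page);
$xpath = new DOMXPath($dom);

// One array entry per form on the page.
$forms = array();
foreach ($xpath->query('//form') as $form) {
    $forms[] = dom_element_to_array($form);
}
print_r($forms);
```

After that you can poke at it the traditional way, e.g. $forms[0]['attributes']['action'], or loop over $forms[0]['children'] to see each input and its attributes.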

DM

