
DangerMouse
Hi all,
Just a quick question, out of curiosity rather than based on a specific example at the moment: what method do you guys tend to use for HTML parsing? I'm aware I can attempt to learn preg_match and regular expressions, but these seem to suffer from complexity and potential unreliability (depending on the content of the page and how often it changes).

I've spotted a few classes, although none of them appear to have the full functionality I expected. Although HTML isn't always strict XML, I was expecting something that would allow me to 'walk' it in the same way, accessing elements, attributes and contents on both a name and parent-child basis.

Just thought I might have missed something obvious, or maybe I'm expecting it to be too 'easy'? (Although little is easy for a noob PHP coder like me!)

Cheers,
Steve

perkiset
It completely depends on what you are trying to do.
If you are simply trying to extract a couple of things from a page, and you know what they look like, the most effective way to do it is to learn to use regexes, and then the preg_match and preg_match_all functions are REALLY handy. If it's something really small, then the substr and strpos functions can be employed, but that approach gets out of hand quickly.
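A minimal sketch of the three approaches mentioned above (preg_match for one known item, preg_match_all for every occurrence, strpos/substr for a tiny one-off), assuming PHP 7.1 or later for the typed, nullable return. The sample markup and the `between()` helper are hypothetical, made up for illustration.

```php
<?php
// Hypothetical sample markup standing in for a fetched page.
$html = '<html><head><title>Example Page</title></head><body>'
      . '<p>See <a href="/docs">the docs</a> and <a href="/faq">the FAQ</a>.</p>'
      . '</body></html>';

// preg_match: grab a single known item (the first <title>).
// The "s" modifier lets "." cross newlines; "i" ignores tag case.
$title = null;
if (preg_match('/<title>(.*?)<\/title>/is', $html, $m)) {
    $title = trim($m[1]);
}

// preg_match_all: grab every occurrence of a pattern.
// Assumes href values are double-quoted; single-quoted or unquoted
// attributes would need a broader pattern.
preg_match_all('/<a\s[^>]*href="([^"]*)"/i', $html, $matches);
$links = $matches[1]; // capture group 1 of every match: ['/docs', '/faq']

// strpos/substr: fine for one tiny extraction, but each new case
// means another hand-written helper like this (hypothetical) one.
function between(string $haystack, string $start, string $end): ?string
{
    $from = strpos($haystack, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);
    $to = strpos($haystack, $end, $from);
    if ($to === false) {
        return null;
    }
    return substr($haystack, $from, $to - $from);
}

echo $title, "\n";                                 // Example Page
echo implode(', ', $links), "\n";                  // /docs, /faq
echo between($html, '<title>', '</title>'), "\n";  // Example Page
```

The href pattern shows the reliability worry from the question in miniature: it works because the sample page uses double quotes, and it would silently miss links if the page's markup changed style.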
