The Cache: Technology Expert's Forum
 
Author Topic: Prepping text for Html and Xml with php  (Read 1781 times)
jammaster82
« on: April 04, 2010, 09:16:13 PM »

What things do I need to do to HTML to store it in XML?


The watched pot, never boils... But if you walk away from it , the soup burns.  What gives?
nop_90
« Reply #1 on: April 05, 2010, 01:30:33 AM »

You have to make sure you follow the XHTML standard.
Then it is XML. :)
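
A quick way to see whether a fragment already meets that bar is to feed it to an XML parser. This is only a sketch: the helper name `is_well_formed_xml` is made up for illustration, and it checks well-formedness only, not validity against the XHTML DTD.

```php
<?php
// Hypothetical helper: true when $markup parses as well-formed XML.
function is_well_formed_xml(string $markup): bool
{
    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($markup);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

var_dump(is_well_formed_xml('<p>Line one<br />Line two</p>')); // XHTML style, closed <br />
var_dump(is_well_formed_xml('<p>Line one<br>Line two</p>'));   // HTML style, unclosed <br>
```

The first call should pass and the second should fail, because the unclosed `<br>` is fine in HTML but not in XML.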
perkiset
« Reply #2 on: April 05, 2010, 07:24:12 AM »

Str_replace all <> with () or something so that it doesn't confuse the XML. I used to use [$fegt] and [$felt] (forced encode greater|less than) because it was just fantastically obvious what was supposed to be a < . You're technically not allowed to store special chars either, so you'll still need to convert others as well. Given the ease of it, you might just consider converting it all to hex and calling it a day, except that you won't be able to see/edit it that easily.
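
For illustration, here are two reversible encodings along those lines. Note the swap: instead of the custom [$fegt]/[$felt] tokens, the first option uses PHP's built-in `htmlspecialchars()`, which covers the other special characters too. The sample string and variable names are assumptions.

```php
<?php
$html = '<p class="note">Fish & chips</p>';

// Option 1: entity-encode the XML-special characters (stays readable).
$encoded = htmlspecialchars($html, ENT_QUOTES | ENT_XML1, 'UTF-8');
$decoded = htmlspecialchars_decode($encoded, ENT_QUOTES | ENT_XML1);

// Option 2: hex-encode everything (always safe, but not human-editable).
$hex = bin2hex($html);
$fromHex = hex2bin($hex);

echo $encoded, "\n";
echo $hex, "\n";
```

Both round-trip back to the original string; the hex form just can't be eyeballed or hand-edited.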

It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
Bompa
« Reply #3 on: April 05, 2010, 03:24:45 PM »

What things do I need to do to HTML to store it in XML?

What nopster said.

However, there was a time when I needed to send HTML while posting to a WP blog. It's not
feasible to clean out the HTML, cuz how could I put it back in on the other end?

Here's an example; the only things missing are $user and $pass.

Code:
<param><value><struct>
<member><name>title</name><value><string>$title</string></value></member>
<member><name>description</name><value><string><![CDATA[$description]]></string></value></member>
<member><name>mt_allow_comments</name><value><string>0</string></value></member>
<member><name>mt_allow_pings</name><value><string>0</string></value></member>
<member><name>mt_keywords</name><value><array><data>$tag_data</data></array></value></member>
</struct></value></param>

The $description that you see in the CDATA contains HTML and all sorts of otherwise
XML-illegal characters.  CDATA is the miracle pill in this case.
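
One catch with CDATA: the sequence `]]>` inside the content ends the section early. A minimal sketch of a wrapper that splits such sequences across two CDATA sections follows; the helper name `cdata_wrap` and the sample text are made up for illustration.

```php
<?php
// Hypothetical helper: wrap text in CDATA, splitting any "]]>" so that
// it cannot terminate the section before the real end.
function cdata_wrap(string $text): string
{
    return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', $text) . ']]>';
}

$description = '<p>Tom & Jerry</p> and a tricky ]]> sequence';
$xml = '<string>' . cdata_wrap($description) . '</string>';

// Parse it back to confirm the content survives unchanged.
$doc = new DOMDocument();
$doc->loadXML($xml);
$roundTripped = $doc->documentElement->textContent;
echo $roundTripped, "\n";
```

The parser joins the adjacent CDATA sections back together, so the receiving end sees the original text.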

Bompa

"The most beautiful and profound emotion we can experience is the sensation of the mystical..." - Albert Einstein
nop_90
« Reply #4 on: April 06, 2010, 02:41:24 PM »

The purpose of XML is to dump data in a text format so it can be used in a language-agnostic way. You can also then edit the XML with a simple text editor, etc.

The reason you want to use XHTML instead of HTML is not to shove it into a DB, but so you can load it into a DOM tree.
Once it is in a DOM tree, you can manipulate the tree, then convert it back to XHTML to display to the user.
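
Roughly, that load / manipulate / dump cycle could look like this with PHP's DOM extension. It is a sketch only: the sample markup and the class name `body` are assumptions.

```php
<?php
$xhtml = '<div><p class="intro">Hello</p><p>World</p></div>';

// Load the XHTML fragment into a DOM tree.
$doc = new DOMDocument();
$doc->loadXML($xhtml);

// Manipulate the tree: give every paragraph without a class a default one.
foreach ($doc->getElementsByTagName('p') as $p) {
    if (!$p->hasAttribute('class')) {
        $p->setAttribute('class', 'body');
    }
}

// Convert back to markup; passing the root node skips the XML declaration.
$output = $doc->saveXML($doc->documentElement);
echo $output, "\n";
```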

If you have shitty HTML, you can try running it through http://tidy.sourceforge.net. I am guessing it probably has a PHP binding.
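
PHP does ship an optional tidy extension. The sketch below uses it when it is available and otherwise falls back to the forgiving HTML parser in the DOM extension. The helper name `repair_to_xhtml` is hypothetical, and the exact output differs between the two paths.

```php
<?php
// Hypothetical helper: turn sloppy HTML into a well-formed XHTML fragment.
function repair_to_xhtml(string $html): string
{
    if (function_exists('tidy_repair_string')) {
        // Only runs when the optional tidy extension is installed.
        $fixed = tidy_repair_string($html, [
            'output-xhtml'   => true,
            'show-body-only' => true,
        ], 'utf8');
        if ($fixed !== false) {
            return trim($fixed);
        }
    }

    // Fallback: parse with libxml's lenient HTML parser, serialize as XML.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML('<html><body>' . $html . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $out = '';
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $doc->saveXML($child);
        }
    }
    return $out;
}

$xhtml = repair_to_xhtml('<p>unclosed <b>bold<br>text');
echo $xhtml, "\n";
```

Either way, the result should be something an XML parser accepts once it is wrapped in a single root element.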

If you want to store actual XML inside the DB, you should be using a blob. Sorry perks, but replacing the <> with () with a regexp is downright dangerous, and wastes computer cycles.

I think perks mentioned that PHP has pickle. I am presuming you want to store XHTML in the DB so you can later present it to the user.
Pickle is way quicker (I am not sure how the internals of PHP work), since it is the actual bytes of VM memory dumped.
http://php.net/manual/en/book.domxml.php seems like your standard DOM. :)
So you could load the XHTML into a DOM tree using the PHP parser, then pickle the entire DOM tree :) and save the pickle into the DB.
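
A caveat and a sketch: PHP's closest counterpart to pickle is `serialize()`, and as far as I know DOM objects are not meant to go through it. So this example swaps in a different technique: it stores the markup string from `saveXML()` and rebuilds the tree on load. The sample markup and the `id` value are assumptions, and the DB write itself is left out.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<div><p>Stored page</p></div>');

// What would go into the DB column: the serialized markup, not the object.
$stored = $doc->saveXML($doc->documentElement);

// Later: rebuild the tree from the stored string and keep working with it.
$restored = new DOMDocument();
$restored->loadXML($stored);
$restored->documentElement->setAttribute('id', 'page');

$html = $restored->saveXML($restored->documentElement);
echo $html, "\n";
```

Reparsing costs a little on each load, but the stored value stays plain text that can be read and edited.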

Then when you want to present it to the user, use the various DOM functions to manipulate it. Once done, dump the file to display to the user.
Most DOM tree implementations are written in C/C++, so it will be a lot faster and safer than using regexps to play with attributes yourself. :)

With that cache thingie that you can put on Apache that perks talks about, and a way to check whether pages are "dirty", you could get some amazing speed, methinks.