The purpose of XML is to dump data in a text format so it can be used in a language-agnostic way. You can also edit the XML with a simple text editor.
The reason you want to use XHTML instead of HTML is not to shove it into a DB, but so you can load it into a DOM tree.
Once it is in a DOM tree, you can manipulate the tree, then convert it back to XHTML to display to the user.
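To make the load / manipulate / convert-back cycle concrete, here is a minimal sketch. It is in Python rather than PHP, using the standard-library xml.dom.minidom; the sample markup, the "greeting" id and the restyle() helper are all made up for illustration.

```python
from xml.dom import minidom

# Hypothetical, well-formed XHTML fragment used only for this example.
XHTML = (
    '<html><body>'
    '<p id="greeting" class="old">Hello</p>'
    '</body></html>'
)


def restyle(xhtml: str) -> str:
    """Parse XHTML into a DOM tree, change one attribute, serialize it back."""
    doc = minidom.parseString(xhtml)
    for p in doc.getElementsByTagName("p"):
        if p.getAttribute("id") == "greeting":
            p.setAttribute("class", "new")
    # Serializing the root element (not the document) omits the XML declaration.
    return doc.documentElement.toxml()


result = restyle(XHTML)
print(result)
```

This only works because the input is well-formed; the parser raises an error on sloppy HTML, which is the whole argument for XHTML here.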
If you have badly formed HTML, you can attempt to clean it up with HTML Tidy:
http://tidy.sourceforge.net (I am guessing it probably has a PHP binding).
If you want to store actual XML inside the DB, you should be using a blob. Sorry perks, but replacing the <> with () with a regexp is downright dangerous, and it wastes computer cycles.
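A small sketch of both points, assuming Python with the standard-library sqlite3 and re modules. The table name, column names and sample strings are invented for the example. The first half shows why the bracket swap cannot be undone safely once the text contains real parentheses; the second half stores the XML untouched in a BLOB column using a bound parameter.

```python
import re
import sqlite3


def swap_brackets(text: str) -> str:
    """The regexp trick: turn < and > into ( and )."""
    return re.sub(r"[<>]", lambda m: {"<": "(", ">": ")"}[m.group()], text)


def unswap_brackets(text: str) -> str:
    """The reverse step: turn ( and ) back into < and >."""
    return re.sub(r"[()]", lambda m: {"(": "<", ")": ">"}[m.group()], text)


original = "<p>call f(x)</p>"
restored = unswap_brackets(swap_brackets(original))
# The parentheses that were already in the text come back as angle brackets.
print(restored)

# Storing the markup as-is in a BLOB column needs no rewriting at all.
xml_bytes = original.encode("utf-8")
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body BLOB)")
conn.execute("INSERT INTO docs (id, body) VALUES (?, ?)", (1, xml_bytes))
stored = conn.execute("SELECT body FROM docs WHERE id = ?", (1,)).fetchone()[0]
```

The bound parameter (the ? placeholder) hands the bytes to the database without any escaping or substitution on your side, so what comes out is byte-for-byte what went in.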
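I think perks mentioned that PHP has pickle (the PHP built-in for this is serialize()/unserialize()). I am presuming you want to store XHTML in the DB so you can later present it to the user.
Pickling should be quicker than re-parsing (I am not sure how the internals of PHP work), since what gets stored is a serialized copy of the already-built objects rather than markup that has to be parsed again.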
http://php.net/manual/en/book.domxml.php looks like your standard DOM API.

So you could load the XHTML into a DOM tree using the PHP parser, then pickle the entire DOM tree and save the pickle into the DB.
Then when you want to present it to the user, load the tree back and use the various DOM functions to manipulate it. Once done, dump it back out as XHTML to display to the user.
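Here is roughly what that pipeline looks like, sketched in Python (where pickle comes from) with xml.dom.minidom and an in-memory sqlite3 database. The page content, table layout and the "home" key are assumptions made for the example, and whether PHP's own serializer copes with its DOM objects is something you would have to test.

```python
import pickle
import sqlite3
from xml.dom import minidom

# Hypothetical page used only for this example.
page = '<html><body><h1 id="title">Draft</h1></body></html>'

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (name TEXT PRIMARY KEY, tree BLOB)")

# Parse once, pickle the whole DOM tree, and store the bytes as a blob.
doc = minidom.parseString(page)
conn.execute("INSERT INTO pages (name, tree) VALUES (?, ?)",
             ("home", pickle.dumps(doc)))

# Later: load the pickled tree back without re-parsing any markup.
# Only unpickle data you wrote yourself; pickle can run code on load.
row = conn.execute("SELECT tree FROM pages WHERE name = ?", ("home",)).fetchone()
tree = pickle.loads(row[0])

# Manipulate through DOM calls, then dump it out for the user.
h1 = tree.getElementsByTagName("h1")[0]
h1.firstChild.data = "Welcome"
h1.setAttribute("class", "headline")
html_out = tree.documentElement.toxml()
print(html_out)
```

It would be worth timing this against simply storing the XHTML text and parsing it on each request before committing to it, since the speed difference depends on the parser and the size of the pages.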
Most DOM tree implementations are written in C/C++, so it will be a lot faster and safer than using a regexp to play with attributes yourself.
With that cache thing you can put on Apache that perks talks about, plus a way to check whether pages are "dirty", you could get some amazing speed, methinks.
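One possible way to do the "dirty" check, sketched in Python: keep a hash of the source next to the rendered output and only re-render when the hash changes. The PageCache class and its render callback are hypothetical names for this example, not part of Apache or of any cache module.

```python
import hashlib
from typing import Callable, Dict, Tuple


class PageCache:
    """Re-render a page only when its source has changed (is "dirty")."""

    def __init__(self, render: Callable[[str], str]) -> None:
        self._render = render
        self._entries: Dict[str, Tuple[str, str]] = {}  # name -> (digest, html)
        self.render_count = 0

    def get(self, name: str, source: str) -> str:
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        cached = self._entries.get(name)
        if cached is not None and cached[0] == digest:
            return cached[1]  # clean: reuse the rendered page
        html = self._render(source)  # dirty or never seen: render again
        self.render_count += 1
        self._entries[name] = (digest, html)
        return html


# Stand-in renderer; a real one would do the DOM work.
cache = PageCache(render=lambda src: "<html><body>" + src + "</body></html>")
first = cache.get("home", "hello")
second = cache.get("home", "hello")        # unchanged, served from cache
third = cache.get("home", "hello again")   # changed, rendered again
```

Hashing the source is just one choice; a modification timestamp or an explicit flag set on save would do the same job with less work per request.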