Instead of eliminating the unwanted junk, I only allow alphanumerics plus a few others, like
the underscore (or whatever).
I also do something along the lines of what bomps does.
This is an ASP function that does exactly that without using regex. I actually found it to be faster than regex for really long text. I know this is the Perl board, but the concept is the same, and it doesn't use any functions that wouldn't be available in any language.
1. Have a string of valid characters.
2. Check each letter in the dirty string against the valid string.
3. Replace the character if it's bad.
So for URLs I run it as stripnonalphanumerics(someURL,"-")
and for content I run it as stripnonalphanumerics(someContent," ")
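for i = 1 to len(dirtystring)
  letter = mid(dirtystring, i, 1)
  if instr(validstring, letter) then cleanstring = cleanstring & letter else cleanstring = cleanstring & replacechar
next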
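Since the idea isn't tied to ASP, here's a rough Python sketch of the same whitelist loop. It's a translation of the concept, not the original ASP function: the name strip_non_alphanumerics, the default VALID_CHARS (letters, digits, underscore), and the sample inputs are all assumptions for illustration. One small deviation is that the valid characters are put in a set so each membership check doesn't rescan the whole string.

```python
import string

# Assumed default whitelist: ASCII letters, digits, and the underscore.
# Add whatever "few others" you want to allow.
VALID_CHARS = string.ascii_letters + string.digits + "_"


def strip_non_alphanumerics(dirty, replacement, valid_chars=VALID_CHARS):
    """Return a copy of `dirty` where every character not in
    `valid_chars` is swapped for `replacement`."""
    allowed = set(valid_chars)  # set lookup instead of scanning a string each time
    cleaned = []
    for letter in dirty:
        if letter in allowed:
            cleaned.append(letter)       # keep whitelisted characters as-is
        else:
            cleaned.append(replacement)  # replace anything else with the filler
    return "".join(cleaned)              # build the result string once at the end


if __name__ == "__main__":
    # Hypothetical sample inputs mirroring the URL and content cases above
    some_url = "my page/about us?.html"
    some_content = "Hello, world! <b>bold</b>"
    print(strip_non_alphanumerics(some_url, "-"))      # my-page-about-us--html
    print(strip_non_alphanumerics(some_content, " "))
```

Note that, like the loop above, every bad character gets its own replacement, so runs of bad characters become runs of dashes (the "--" in the URL example). Collapsing those would be a separate step.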