OK, I made it part of the way. I created a converter function that replaces the multibyte characters with their plain ASCII equivalents.
The same byte-matching approach also works for testing for the copyright symbol and the registered mark.
I'm still stuck, though. This code page / charset stupidity of the internets is driving me bonkers. Is there a way to convert a string from whatever charset it is in into UTF-8?
These are all coming from live scrapes, by the way, so as for the question about whether it comes from a file: nope, it comes from the tubes and is used for plugging up the tubes with my turds.
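// Replaces common UTF-8 punctuation byte sequences with ASCII equivalents.
// Only matches when $string is already UTF-8 encoded.
function convertchars($string)
{
    $search = array(
        chr(0xe2) . chr(0x80) . chr(0x98), // U+2018 left single quote
        chr(0xe2) . chr(0x80) . chr(0x99), // U+2019 right single quote
        chr(0xe2) . chr(0x80) . chr(0x9c), // U+201C left double quote
        chr(0xe2) . chr(0x80) . chr(0x9d), // U+201D right double quote
        chr(0xe2) . chr(0x80) . chr(0x93), // U+2013 en dash
        chr(0xe2) . chr(0x80) . chr(0x94), // U+2014 em dash
        chr(0xe2) . chr(0x80) . chr(0xa6), // U+2026 ellipsis
        chr(0xc2) . chr(0xab),             // U+00AB left guillemet
        chr(0xc2) . chr(0xbb),             // U+00BB right guillemet
        chr(0xc2) . chr(0xb4),             // U+00B4 acute accent
    );
    $replace = array('\'', '\'', '"', '"', '-', '-', '...', '<<', '>>', '\'');
    return str_replace($search, $replace, $string);
}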
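// strpos() returns 0 for a match at the very start of the string,
// so compare against false rather than using "> 0".
if (strpos($s, chr(0xc2) . chr(0xa9)) !== false)
{
    $matched = true; // copyright symbol (U+00A9)
    $err .= 'CopyrightSymbol:';
}
if (strpos($s, chr(0xc2) . chr(0xae)) !== false)
{
    $matched = true; // registered mark (U+00AE)
    $err .= 'RegisterMark:';
}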
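
On the "convert whatever charset into UTF-8" question, here is a minimal sketch of one possible approach, not a definitive answer. It assumes the mbstring extension is loaded, and it assumes that anything which is not already valid UTF-8 is Windows-1252, which is only a guess that happens to suit a lot of Western pages. The function name to_utf8 and its $declared parameter are made up for illustration; $declared is meant for a charset name taken from the response's Content-Type header or meta tag, when there is one.

```php
<?php
// Hypothetical helper (not part of the code above): returns $raw as UTF-8.
// Assumption: input that is not valid UTF-8 is Windows-1252 unless the
// caller passes a charset name that the page itself declared.
function to_utf8($raw, $declared = null)
{
    // Already valid UTF-8 (plain ASCII included): leave it untouched.
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }
    // Prefer the declared charset; fall back to the Windows-1252 guess.
    $from = ($declared !== null) ? $declared : 'Windows-1252';
    return mb_convert_encoding($raw, 'UTF-8', $from);
}

// 0x93 and 0x94 are the curly double quotes in Windows-1252.
$s = to_utf8("\x93hi\x94");
echo bin2hex($s), "\n"; // prints e2809c6869e2809d
```

The result could then be passed through convertchars(), since that function only matches UTF-8 byte sequences. Reliable detection of an unknown charset is not really possible from the bytes alone, so a charset declared by the server is worth trusting over the fallback guess.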