Not sure what the correct term is.
For the browser, it is set by this tag:
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
You can change the charset to a different encoding.
On this board it is currently set to UTF-8,
but if you scrape the page and decode it with a different encoding, the text will come out garbled.
Perl has different codecs you use to translate between encodings: http://perldoc.perl.org/utf8.html
So does Python: http://evanjones.ca/python-utf8.html
Where the problem happens is when a character is not valid for the encoding you have picked; then the codec raises an error.
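Here is a rough Python 3 sketch of that kind of codec error. The sample strings and the helper names `try_decode` and `try_encode` are made up for illustration; only `bytes.decode`, `str.encode`, and the two exception types come from Python itself.

```python
# Sample text with characters outside plain ASCII (illustrative data only).
accented = "caf\u00e9"                              # "cafe" with an accented e
russian = "\u041f\u0440\u0438\u0432\u0435\u0442"    # a Russian word in Cyrillic

utf8_bytes = accented.encode("utf-8")


def try_decode(data, encoding):
    """Decode bytes, returning a short message instead of raising on failure."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as err:
        return "decode error: " + err.reason


def try_encode(text, encoding):
    """Encode text, returning None if a character does not exist in the encoding."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


print(try_decode(utf8_bytes, "utf-8"))   # right codec: text comes back intact
print(try_decode(utf8_bytes, "ascii"))   # wrong codec: the codec reports an error
print(try_encode(russian, "latin-1"))    # Cyrillic has no place in Latin-1
```

The point is just that the bytes themselves are fine; the error only shows up when the codec you picked has no mapping for them.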
This happens when you do something like scrape a Russian site (the charset will be set to whatever Russian sites use), while the text of the site is actually in English.
So now you have English text with a Russian charset.
Huge pain in the ass.
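One possible way to cope when scraping, sketched in Python 3 and not tested against any real site: try strict UTF-8 first, then the charset the page declares, and only then fall back to replacing the bad bytes. The function name `decode_scraped` and the choice of `cp1251` as the "Russian" charset are assumptions for the example.

```python
def decode_scraped(raw, declared):
    """Decode scraped bytes, trying strict UTF-8 before the declared charset.

    `declared` is whatever the page's meta tag or HTTP header claims.
    """
    for encoding in ("utf-8", declared):
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong codec for these bytes, or a charset name Python doesn't know.
            continue
    # Last resort: keep what can be read and mark the rest as U+FFFD.
    return raw.decode("utf-8", errors="replace")


# English text on a page that declares a Russian charset (assumed cp1251).
english_page = b"Hello, world"
# Actual Cyrillic text encoded in that same charset.
russian_page = "\u041f\u0440\u0438\u0432\u0435\u0442".encode("cp1251")

print(decode_scraped(english_page, "cp1251"))
print(decode_scraped(russian_page, "cp1251"))
```

Trying UTF-8 first is a judgment call: it is strict enough that non-UTF-8 bytes usually fail fast, whereas single-byte charsets will happily decode almost anything into the wrong characters.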