The Cache: Technology Expert's Forum
 


Topic: Do I need to install a dozen different character sets on Windows? (Read 554 times)
Bompa
Administrator
Lifer

Posts: 524


« on: October 02, 2010, 01:11:11 AM »

I really did spend some time looking for a good board for this, but this is the
best I could find. My issue is how Windows handles pages from foreign-language
websites after I scrape them.

My code attempts to join a site, and the site returns an error message in Polish
(or whatever language). In the HTML the Polish characters are intact, but when my
code saves that same HTML source to a file on my hard drive, the characters turn
into gibberish; they are no longer Polish.

Here is an example in Russian:

Символы, набираемые Вами, не совпадают с символами на изображении

Translation: "The characters you typed do not match the characters in the image."

The characters in Firefox's view of the HTML source are the same, so that's nice, but in
the file saved on my hard drive it is: Ошибка!
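A short illustration of what likely happened, sketched in Python rather than the poster's Perl. It assumes the site served UTF-8 and that the saved bytes were later read as Windows-1252, the default ANSI code page on Western-European Windows. Decoding bytes with the wrong code page produces exactly this kind of gibberish (often called mojibake), and because no bytes are lost, the text can be recovered by reversing the mistake.

```python
# Sketch: UTF-8 text turning into "gibberish" when its bytes are read
# back with the wrong code page. Assumption: the site sent UTF-8 and
# the file was later interpreted as Windows-1252 (cp1252).
original = "Ошибка!"

# Bytes as they would arrive over the wire from a UTF-8 page.
raw = original.encode("utf-8")

# Decoding those same bytes with the wrong code page garbles them.
garbled = raw.decode("cp1252")
print(garbled)  # ÐžÑˆÐ¸Ð±ÐºÐ°!

# The damage is reversible as long as no bytes were lost:
# re-encode with the wrong code page, then decode as UTF-8.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)  # True
```

Each Cyrillic letter becomes two Latin-looking characters because UTF-8 stores it as two bytes, and cp1252 maps each byte to its own character.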

How can my Perl code detect what the error is if it doesn't save the correct characters?
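One way to make the check work, sketched in Python (in Perl, the Encode module's decode plays the same role): decode the raw response bytes with the page's charset first, then search for the known message as Unicode text rather than comparing raw bytes. The helper name page_contains and the Polish page served as windows-1250 are hypothetical, chosen for illustration.

```python
def page_contains(body: bytes, charset: str, phrase: str) -> bool:
    """Hypothetical helper: decode raw response bytes with the page's
    charset, then search for a known message as Unicode text."""
    # errors="replace" keeps one undecodable byte from aborting the check;
    # bad bytes become U+FFFD, so a wrong charset simply fails to match.
    text = body.decode(charset, errors="replace")
    return phrase in text


# Assumed example: a Polish error page ("Error: invalid code") in windows-1250.
body = "<p>Błąd: nieprawidłowy kod</p>".encode("windows-1250")
print(page_contains(body, "windows-1250", "Błąd"))  # True
print(page_contains(body, "utf-8", "Błąd"))         # False
```

The second call shows why the charset matters: the same bytes decoded as UTF-8 no longer contain the Polish word, so a naive check would miss the error.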

Do I need to install a dozen different character sets on Windows?

And even if I do that, how do I know the characters will display correctly in my editor (UltraEdit)?
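On the "dozen character sets" question: decoding and re-encoding text does not depend on installing anything in Windows, because language runtimes such as Python ship their own codecs (displaying the text needs a font that covers the script, which is a separate matter). A sketch, assuming the page has already been decoded to a Unicode string: save it as UTF-8, which holds Polish, Russian, and everything else in one file. The utf-8-sig codec prepends a byte order mark, a common hint Windows editors use to recognize UTF-8; whether UltraEdit needs it in your setup is an assumption to check. save_html is a hypothetical helper name.

```python
import tempfile
from pathlib import Path


def save_html(path: Path, html: str) -> None:
    """Hypothetical helper: save already-decoded HTML as UTF-8.

    "utf-8-sig" writes a byte order mark (EF BB BF) before the text,
    which editors can use to recognise the file as UTF-8.
    """
    path.write_text(html, encoding="utf-8-sig")


# Usage: write the page, then inspect the raw bytes on disk.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "error_page.html"
    save_html(target, "<p>Ошибка!</p>")
    data = target.read_bytes()

print(data[:3])  # b'\xef\xbb\xbf'
print(data[3:].decode("utf-8") == "<p>Ошибка!</p>")  # True
```

Plain encoding="utf-8" works too if the editor is told to open files as UTF-8; the BOM only saves that manual step.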

Bompa

"Everything that can be counted does not necessarily count; everything that counts cannot necessarily be counted." -- Albert Einstein
Phaëton
Lifer

Posts: 507


⎝⏠⏝⏠⎠


« Reply #1 on: October 02, 2010, 05:04:57 AM »

popcorn.gif

When I was your age we used to walk to the TV to change the channel....  _̴ı̴̴̡̡̡ ̡͌l̡̡̡ ̡͌l̡*̡̡ ̴̡ı̴̴̡ ̡̡͡|̲̲̲͡͡͡ ̲▫̲͡ ̲̲̲͡͡π̲̲͡͡ ̲̲͡▫̲̲͡͡ ̲|̡̡̡ ̡ ̴̡ı̴̡̡
nutballs
Administrator
Lifer

Posts: 5627


Back in my day we had 9 planets


« Reply #2 on: October 02, 2010, 10:28:57 PM »

The page you are hitting "should" send a header stating which code page (character encoding) it uses, i.e. the charset in the Content-Type header, such as "Content-Type: text/html; charset=windows-1250".
If it doesn't, the site is "assuming" its visitors already have a matching default code page, i.e. Polish folks hitting a Polish site.
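A sketch of that check in Python, under stated assumptions: read the charset from the Content-Type header first, fall back to a <meta> charset declaration near the top of the HTML, and otherwise use a default. The helper name detect_charset, the 4096-byte scan window, and the utf-8 fallback are assumptions, not a standard algorithm; browsers use a more elaborate sniffing procedure.

```python
import re
from email.message import Message
from typing import Optional

# Matches <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
META_CHARSET = re.compile(rb"""<meta[^>]+charset=["']?([A-Za-z0-9_\-]+)""", re.IGNORECASE)


def detect_charset(content_type: Optional[str], body: bytes, default: str = "utf-8") -> str:
    """Hypothetical helper: choose the charset to decode a page with.

    Order: HTTP Content-Type header, then a <meta> tag in the body,
    then the caller's default (an assumption, not a guarantee).
    """
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()  # lower-cased, or None if absent
        if charset:
            return charset
    match = META_CHARSET.search(body[:4096])  # meta tags normally sit near the top
    if match:
        return match.group(1).decode("ascii").lower()
    return default


print(detect_charset("text/html; charset=Windows-1250", b""))  # windows-1250
print(detect_charset(None, b'<meta charset="UTF-8">'))          # utf-8
print(detect_charset("text/html", b"<p>no hint</p>"))           # utf-8
```

When fetching with urllib.request.urlopen, the response's headers object is a subclass of email.message.Message, so resp.headers.get_content_charset() returns the header charset directly.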


I could eat a bowl of Alphabet Soup and shit a better argument than that.