SPSSX Discussion

unicode vs local locale. where would read errors be found?

Maguin, Eugene

unicode vs local locale. where would read errors be found?

I got this message upon opening a file that was written in Unicode and opened on my computer with Unicode turned off (locale=local).

Warning. Command name: GET FILE

SPSS Statistics data file "U:\DAP\ERIN\RESPONSES_CLEANED.SAV" is written in a character encoding (ISO_8859-1:1987) incompatible with the current LOCALE setting. It may not be readable.

Consider changing LOCALE or setting UNICODE on. (DATA 1721)

I understand that to get rid of the warning all I need to do is to change the character encoding to Unicode in the Edit->Options menu.

What I’m curious about is whether any statements can be made about where read errors will occur given that the file almost certainly contains only floating point numbers and US standard English characters and numerals.

Thanks, Gene Maguin

Jon K Peck

Re: unicode vs local locale. where would read errors be found?

The warning is based on the encoding recorded in the file versus the current Statistics setting. In many cases, everything will be fine. When there is a problem, say a Japanese source opened in a Western locale, you will see all sorts of garbage in variable names, labels, string data, etc. wherever there are Japanese characters (there might not be any). The Japanese have a name for this: mojibake, literally "character transformation", i.e. garbled text.
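To make the garbage effect concrete, here is a minimal Python sketch (not SPSS code). It assumes the Japanese text was stored as Shift_JIS and is then read as Latin-1; the sample string and both codec choices are assumptions picked for illustration.

```python
# Hypothetical label text; any non-ASCII Japanese string would do.
original = "日本語"

# Bytes as a Japanese-locale writer might store them (Shift_JIS assumed).
stored = original.encode("shift_jis")

# A reader that assumes a Western single-byte encoding misreads the bytes.
garbled = stored.decode("latin-1")

# The bytes themselves are intact: undoing the wrong decode recovers the text.
recovered = garbled.encode("latin-1").decode("shift_jis")

print(repr(garbled))
print(recovered)
```

The point of the round trip is that nothing is lost in the file itself; the damage comes from interpreting the bytes with the wrong encoding.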

More subtle problems might occur with particular characters such as the Euro sign, whose encoding varies from one code page to another. For a few characters, there are also differences between the Windows and Mac versions of the same code page.
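As a rough illustration of the Euro sign point, the following Python sketch prints the bytes that a few Western encodings use for that one character. The list of codecs is an assumption chosen for the example; it says nothing about which encodings SPSS Statistics itself supports.

```python
EURO = "\u20ac"  # the Euro sign

# Codecs picked for illustration only.
codecs_to_try = ("cp1252", "iso8859_15", "mac_roman", "utf-8")
euro_bytes = {name: EURO.encode(name) for name in codecs_to_try}

for name, raw in euro_bytes.items():
    print(f"{name:12s} {raw.hex()}")

# ISO 8859-1 (Latin-1) has no Euro sign at all, so encoding fails.
try:
    EURO.encode("latin-1")
    latin1_has_euro = True
except UnicodeEncodeError:
    latin1_has_euro = False

print("latin-1 can encode the Euro sign:", latin1_has_euro)
```

A file that stores the Euro sign under one of these encodings and is read under another will show a different character, or none, in that position.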

One thing you can count on, though, is that if all the text is plain 7-bit ASCII, it will work regardless of any locale or Unicode settings.
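The ASCII guarantee can be checked outside SPSS with a short Python sketch. It compares how a handful of ASCII-compatible encodings decode the same bytes; the codec list and the helper name `decodes_identically` are assumptions made for this example.

```python
# ASCII-compatible codecs chosen for illustration.
CODECS = ("ascii", "latin-1", "cp1252", "mac_roman", "cp437", "utf-8")


def decodes_identically(raw: bytes, codecs=CODECS) -> bool:
    """Return True if every codec in `codecs` decodes `raw` to the same text."""
    # errors="replace" substitutes U+FFFD for bytes a codec cannot decode.
    results = {raw.decode(name, errors="replace") for name in codecs}
    return len(results) == 1


ascii_bytes = bytes(range(0x20, 0x7F))  # all printable 7-bit ASCII
accented = "café".encode("latin-1")     # contains one byte above 0x7F

print(decodes_identically(ascii_bytes))
print(decodes_identically(accented))
```

Only bytes above 0x7F give these codecs anything to disagree about, which is why a file holding just numbers and plain English text reads the same under each of them.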

And once you have converted to Unicode, you can forget about all these annoying encoding problems forever after.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
phone: 720-342-5621

From: "Maguin, Eugene"
Date: 07/31/2013 07:44 AM
Subject: [SPSSX-L] unicode vs local locale. where would read errors be found?
Sent by: "SPSSX(r) Discussion"

Rick Oliver-3

Re: unicode vs local locale. where would read errors be found?

In reply to this post by Maguin, Eugene

If it contains nothing but numbers and 7-bit ASCII characters, there shouldn't be a problem.

According to Wikipedia, ISO-8859-1 "is generally intended for “Western European” languages (see below for a list). It is the basis for most popular 8-bit character sets, including Windows-1252." So if you're running in the Windows-1252 code page, I wouldn't think there would be any data loss.
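For anyone who wants to see exactly where the two encodings part ways, this Python sketch lists the byte values that Latin-1 (ISO 8859-1) and Windows-1252 decode differently. The helper name `differing_bytes` is hypothetical, and Python's codec tables stand in here for whatever conversion SPSS Statistics performs.

```python
def differing_bytes(codec_a: str, codec_b: str) -> list:
    """List the byte values (0-255) that the two codecs decode differently."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        # errors="replace" turns bytes undefined in a codec into U+FFFD.
        text_a = raw.decode(codec_a, errors="replace")
        text_b = raw.decode(codec_b, errors="replace")
        if text_a != text_b:
            diffs.append(value)
    return diffs


diffs = differing_bytes("latin-1", "cp1252")
print(f"{len(diffs)} differing bytes, from {diffs[0]:#x} to {diffs[-1]:#x}")
```

The differences fall in the 0x80 to 0x9F range, which ISO 8859-1 reserves for control codes and Windows-1252 uses for characters such as curly quotes and the Euro sign. Accented Western European letters sit above that range and decode the same way in both.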

Rick Oliver
Senior Information Developer
IBM Business Analytics (SPSS)

From: "Maguin, Eugene"
Date: 07/31/2013 08:47 AM
Subject: unicode vs local locale. where would read errors be found?
Sent by: "SPSSX(r) Discussion"