SPSSX Discussion

Foreign Character & Question mark ???

alia

Foreign Character & Question mark ???

Hello everyone,

I have SPSS 16 and I tried to open a data file that contains Arabic characters, but SPSS displays them as question marks (???) :(

How can I fix this?

Thank you in advance :)

Jon K Peck

Re: Foreign Character & Question mark ???

I suspect that your Windows system is not running in an Arabic locale, so text is expected to fit in some other code page.

Execute this from a syntax window.
SHOW LOCALE.
SET LOCALE=arabic.
SHOW LOCALE.

You will then see which locale SPSS was running in and whether the Arabic locale setting worked. You can't have any dataset open when you do this.

You might also want to run in Unicode mode:
SET UNICODE ON.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621

From: alia <[hidden email]>
To: [hidden email],
Date: 11/03/2012 04:31 PM
Subject: [SPSSX-L] Foreign Character & Question mark ???
Sent by: "SPSSX(r) Discussion" <[hidden email]>

Albert-Jan Roskam

Re: Foreign Character & Question mark ???

I thought that the backend had its own locale? From the I/O module book: "The I/O Module's locale is separate from that of the client application".
Or does the client application determine the set of locales that can be selected? For example, with an English application locale one would have to switch to Unicode mode, whereas for somebody from Saudi Arabia, code page mode plus SET LOCALE will do the trick?
>>> import locale
>>> locale.setlocale(locale.LC_ALL, "arabic")
Traceback (most recent call last):

....
....
Error: unsupported locale setting
>>> locale.setlocale(locale.LC_ALL, "dutch")
'Dutch_Netherlands.1252'
>>> locale.setlocale(locale.LC_ALL, "english")
'English_United States.1252'
>>>
set locale = "arabic".
803 M> set locale = "arabic".
>Warning # 849 in column 14. Text: arabic
>The LOCALE subcommand of the SET command has an invalid parameter. It
>could not be mapped to a valid backend locale.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Jon K Peck

Re: Foreign Character & Question mark ???

The i/o module is not involved in regular Statistics usage. The SPSS locale governs both backend and frontend and is separate from the OS locale, but it defaults to that locale if the user has never set it otherwise via SET LOCALE. In this particular case, IIRC, the OS locale was English.

While Unicode mode governs how characters will be represented internally to Statistics (code page or Unicode), the SPSS locale setting determines how data and text are interpreted when read in, and, if in code page mode, how the character codes are understood in Statistics. Sav files created by SPSS 15 or later have their character encoding marked, but sav files created by third parties, as in this case, might not be correctly marked.
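
A rough illustration of the "interpreted when read in" point, using plain Python codecs rather than SPSS itself. It assumes, purely for the example, that the text was stored in the Arabic Windows code page (cp1256) and that the reading side assumes a Western code page (cp1252). The sample word and the helper name read_as are made up. It shows two different symptoms: wrong letters when the bytes are decoded with the wrong code page, and literal question marks when the text is forced into a code page that has no Arabic letters at all.

```python
# Illustration only: plain Python codecs, not SPSS.
# Assumption: the file's text was written as cp1256 (Arabic Windows code page)
# while the reader assumes cp1252 (Western European).

def read_as(raw: bytes, assumed_encoding: str) -> str:
    """Decode raw bytes the way a reader assuming `assumed_encoding` would."""
    return raw.decode(assumed_encoding, errors="replace")

# Hypothetical sample value: a five-letter Arabic word, written with escapes.
original = "\u0645\u0631\u062d\u0628\u0627"
stored = original.encode("cp1256")          # bytes as they would sit in the file

misread = read_as(stored, "cp1252")         # wrong code page: Western letters, not Arabic
correct = read_as(stored, "cp1256")         # matching code page: text survives

# Forcing Arabic text into a code page without Arabic letters
# replaces every character with "?".
forced = original.encode("cp1252", errors="replace").decode("cp1252")

print(repr(misread))
print(correct == original)
print(forced)
```

The last line is the closest analogue to the symptom in the original question: once characters have been pushed through a code page that cannot hold them, only question marks remain.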

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621

From: Albert-Jan Roskam <[hidden email]>
To: Jon K Peck/Chicago/IBM@IBMUS, "[hidden email]" <[hidden email]>,
Date: 11/07/2012 02:54 AM
Subject: Re: [SPSSX-L] Foreign Character & Question mark ???

Albert-Jan Roskam

Re: Foreign Character & Question mark ???

Ahh. Sorry for dragging on about this but... The CSR (Command Syntax Reference) seems to say that SET UNICODE=ON only affects the LC_CTYPE locale category, whereas SET LOCALE affects all locale categories (code page mode) or all locale categories *except* LC_CTYPE (Unicode mode). I would expect to see e.g. a comma as a decimal separator in a Dutch locale, in both code page and Unicode mode, but nope (neither in the Data Editor nor in the output).
LC_ALL = 0
LC_COLLATE = 1
LC_CTYPE = 2
LC_MONETARY = 3
LC_NUMERIC = 4
LC_TIME = 5
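
For readers who want to poke at the categories listed above, here is a minimal sketch with Python's standard locale module. It exercises the C runtime's locale, which per the earlier reply is separate from the SPSS locale, so it says nothing definitive about what SET LOCALE does. The helper name decimal_point_for and the list of candidate names are assumptions for the example; which names are accepted depends on the platform, as the "unsupported locale setting" error earlier in the thread shows.

```python
import locale

def decimal_point_for(name: str):
    """Try `name` for LC_NUMERIC only.

    Returns (accepted locale string, decimal point) or None if the
    platform does not know the name. Other categories are not touched.
    """
    previous = locale.setlocale(locale.LC_NUMERIC)       # query current setting
    try:
        accepted = locale.setlocale(locale.LC_NUMERIC, name)
        return accepted, locale.localeconv()["decimal_point"]
    except locale.Error:
        return None                                       # name refused by the C runtime
    finally:
        locale.setlocale(locale.LC_NUMERIC, previous)     # restore what was there before

# "C" is always available; the other names are platform-specific guesses.
for name in ["C", "dutch", "nl_NL", "nl_NL.UTF-8", "arabic"]:
    print(name, "->", decimal_point_for(name))
```

Where a Dutch locale name is accepted, the decimal point would be expected to come back as a comma, while LC_CTYPE stays whatever it was.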

Jon K Peck

Re: Foreign Character & Question mark ???

See below.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621

From: Albert-Jan Roskam <[hidden email]>
To: Jon K Peck/Chicago/IBM@IBMUS,
Cc: "[hidden email]" <[hidden email]>
Date: 11/07/2012 08:27 AM
Subject: Re: [SPSSX-L] Foreign Character & Question mark ???

>[snip]
>I would expect to see e.g. a comma as a decimal separator in a Dutch locale, in both codepage and unicode mode, but nope (neither in the data editor or the output)

>>>If I set a Dutch locale (set locale=nl_NL), I see, e.g., a comma as the decimal separator in pivot tables, dialog boxes, and the Data Editor. It also sets the default code page to cp1252 for data sources where the encoding is not already indicated, and for saving, say, syntax files. (Some pivot table and Data Editor settings are determined by specific data formats such as DOT that are not locale sensitive.)

SET UNICODE sets how characters are handled within Statistics and default assumptions for some external files such as text and syntax, although these can mostly be overridden.

Fundamentally when in Unicode mode, everything inside Statistics is in Unicode, and mixed character sets can be handled, but the Unicode setting has no effect on locale parameters unrelated to the character set.

You could, if you wanted to be perverse, do this:
set locale="nl_NL.windows-932".
That means a Dutch locale but a Japanese code page. If you are in Unicode mode, that would mean input of appropriate sorts would be converted to Unicode assuming that the input is in a Japanese character set. In code page mode, it would be kept in cp 932 and characters such as e with acute accent could not be represented.
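
The "could not be represented" part can be checked outside SPSS with a small Python sketch. It assumes that Python's cp932 and cp1252 codecs correspond to the code pages named windows-932 and cp1252 above; the helper name representable and the sample strings are made up for the example.

```python
def representable(text: str, code_page: str) -> bool:
    """True if every character of `text` exists in `code_page`."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False

# Hypothetical sample values, written with escapes to keep the source ASCII.
samples = {
    "e with acute accent": "\u00e9",
    "Japanese word": "\u65e5\u672c\u8a9e",
}

for label, text in samples.items():
    for code_page in ("cp1252", "cp932", "utf-8"):
        print(f"{label:20} {code_page:7} {representable(text, code_page)}")
```

Each single code page handles only one of the two samples, while UTF-8 handles both, which is the practical argument for Unicode mode when character sets are mixed.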

HTH (probably not)
-Jon