get file problem involving codepage/locale unicode


Maguin, Eugene

Here’s the message. Background: the file originated in the US, was opened and resaved on a machine in Taiwan, then sent back to the US and read on my machine.

So: how do I understand what each of the two warnings is saying?

Can I conclude that the data, both text and numeric, are intact?

If not, what would need to be done to get an intact version from the person in Taiwan?

 

Thanks, Gene Maguin

get file='C:\Users\...'.

Warning.  Command name: get file
At least one character in the the dictionary could not be interpreted in the
current code page (Big5).  LOCALE may be set incorrectly for this data file.
(DATA 1701)

Warning # 5281.  Command name: get file
SPSS Statistics is running in Unicode encoding mode.  This file is encoded in
a locale-specific (code page) encoding.  The defined width of any string
variables are automatically tripled in order to avoid possible data loss.  You
can use ALTER TYPE to set the width of string variables to the width of the
longest observed value for each string variable.

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Re: get file problem involving codepage/locale unicode

Jon Peck
The second one can be ignored.  The first one probably means that there are extended characters in the file that don't fit the locale setting.  There are several ways to override that depending on your Statistics version.

The first thing to try is SET UNICODE OFF and then open the file, which avoids the need for character conversion.  If that doesn't work, get back in touch.
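To see why the second warning talks about tripling string widths: as far as I know, SPSS string widths are defined in bytes, and a character that takes one or two bytes in a code page can take up to three bytes in UTF-8. Here is a minimal sketch in Python (not SPSS syntax, and not how Statistics does the conversion internally). The sample characters and the `byte_widths` helper are made up for illustration.

```python
# Compare how many bytes the same character needs in a code page and in UTF-8.
# The characters below are arbitrary examples, not taken from any data file.

samples = [
    ("A", "cp1252"),   # plain ASCII letter
    ("é", "cp1252"),   # accented Latin letter
    ("€", "cp1252"),   # euro sign
    ("中", "big5"),     # traditional Chinese character
]


def byte_widths(text, codepage):
    """Return (bytes needed in the code page, bytes needed in UTF-8)."""
    return len(text.encode(codepage)), len(text.encode("utf-8"))


for text, codepage in samples:
    narrow, wide = byte_widths(text, codepage)
    print(f"{text!r}: {narrow} byte(s) in {codepage}, {wide} byte(s) in UTF-8")
```

Tripling is a worst-case allowance, which is why the warning suggests ALTER TYPE to shrink the widths afterwards.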

--
Jon K Peck
[hidden email]



Re: get file problem involving codepage/locale unicode

Maguin, Eugene

Jon, thank you so much for your quick reply.

 

When I started again this morning, I selected Locale writing system on the Edit->Options->Language tab. That got rid of both messages and produced a new one. If you would please: what is “Big5”, and what is the impact of changing the locale setting?

 

 

 

* Encoding: UTF-8.
get file='C:\Users\...'.

Warning.  Command name: get file
SPSS Statistics data file "C:\Users\..." is written in a character encoding (Big5)
incompatible with the current LOCALE setting.  It may not be readable.
Consider changing LOCALE or setting UNICODE on.  (DATA 1721)


Re: get file problem involving codepage/locale unicode

Jon Peck
Big5 is one of the Chinese code page encodings. (There are two main ones: Big5 for traditional characters and another for simplified.)  The message is telling you that the encoding marked in the file is Big5 but you are reading it as if it were in a Western encoding, presumably cp1252, so any Chinese characters would not display correctly.

If you want to send me the file, I can figure out exactly how it is actually encoded and whether there is any problem.
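A small Python illustration (outside SPSS) of what reading Big5 bytes as cp1252 does. The sample text is made up, and nothing here inspects an actual .sav file; it only shows that the stored bytes are the same and the interpretation differs.

```python
# Encode two traditional Chinese characters as Big5, then decode the same
# bytes two ways: with the right code page and with a Western one.

text = "中文"
raw = text.encode("big5")  # bytes as a Big5-encoded file would store them

# Decoding with the encoding the bytes were written in recovers the text.
as_big5 = raw.decode("big5")

# Decoding as cp1252 treats every byte as a separate Western character.
# errors="replace" covers byte values that cp1252 leaves undefined.
as_cp1252 = raw.decode("cp1252", errors="replace")

print(len(raw), "bytes stored")
print("decoded as Big5:  ", as_big5)
print("decoded as cp1252:", as_cp1252)
```

Plain ASCII text (typical of a file that originated on a US machine) has the same bytes in both code pages, so it would read the same either way.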

--
Jon K Peck
[hidden email]


Re: get file problem involving codepage/locale unicode

Rick Oliver-3
In reply to this post by Maguin, Eugene
Big5 is the traditional Chinese code page.

I think this corresponds to the zh_tw locale, so
SET LOCALE zh_tw

should set the locale correctly.

You might not notice any immediate or significant differences running in this locale, depending on what you are doing. It looks like traditional Chinese uses a period as the decimal indicator, so by default numeric values in both Viewer output and the Data Editor would look the same as the English locale. If you changed the locale to something like German or French, however, a comma is used as the decimal indicator. When reading and writing text data files, the default decimal indicator is the locale decimal indicator (although you can change this). When entering data in the Data Editor, the decimal indicator is always the locale decimal indicator. In syntax, the decimal indicator is always a period (except in inline data defined with BEGIN DATA-END DATA, which is treated like reading a text data file).
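The decimal-indicator rule for text data can be sketched in Python. `parse_number` is a hypothetical helper written for this illustration, not an SPSS function. It only shows that the same digits read differently depending on which decimal indicator the reader assumes.

```python
def parse_number(text, decimal_mark="."):
    """Convert text to a float, given the decimal indicator used in the text."""
    if decimal_mark not in (".", ","):
        raise ValueError("decimal_mark must be '.' or ','")
    cleaned = text.strip()
    if decimal_mark == ",":
        # A period is not expected here; refuse rather than guess.
        if "." in cleaned:
            raise ValueError(f"unexpected '.' in {text!r}")
        cleaned = cleaned.replace(",", ".")
    elif "," in cleaned:
        # A comma is not expected when the period is the decimal indicator.
        raise ValueError(f"unexpected ',' in {text!r}")
    return float(cleaned)


print(parse_number("3,14", decimal_mark=","))  # text written German/French style
print(parse_number("3.14", decimal_mark="."))  # text written English/Taiwan style
```

Reading comma-decimal text while assuming a period (or the reverse) fails in this sketch instead of silently producing a different number.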

Rick Oliver
Senior Information Developer
IBM Business Analytics (SPSS)
E-mail: [hidden email]





Re: get file problem involving codepage/locale unicode

Maguin, Eugene

I think the problem is now understandable and no longer a problem. After seeing your message I looked this morning, and there is no zh_tw locale, but there is a PRC locale and a Taiwan locale. And what’s kind of curious to me (I’m not bitching or wanting a reason) is that the output and user interface settings offer traditional Chinese and simplified Chinese. No response needed. Gene Maguin

 

 

 


Re: get file problem involving codepage/locale unicode

Rick Oliver-3
zh_TW is traditional Chinese, and it is a perfectly valid locale setting.

Chinese-PRC in the UI is simplified Chinese. Chinese-Taiwan in the UI is traditional (Big5) Chinese.

Try the following in syntax:

dataset close all.
new file.
set locale zh_tw.
show locale.
set locale "Chinese-Taiwan".
show locale.
set locale "Chinese-PRC".
show locale.

The first two locale settings are essentially the same.

But you have a valid point about the inconsistency in wording between the locale options and the other language options.
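For comparison, Python's standard `locale.normalize` expands short locale names using its own alias table. This is Python's table, not the one Statistics uses, so take it only as a hint about which code page a locale name conventionally implies.

```python
import locale

# Expand short locale names into "language_TERRITORY.encoding" form using
# Python's built-in alias table. No system locale is changed by this call.
for name in ("zh_TW", "zh_CN", "en_US"):
    print(name, "->", locale.normalize(name))
```

The expanded names show the encoding conventionally paired with each locale, which is the link between a locale name such as zh_TW and a code page such as Big5.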


Rick Oliver
Senior Information Developer
IBM Business Analytics (SPSS)
E-mail: [hidden email]





Re: get file problem involving codepage/locale unicode

Jon Peck
The trouble is that locale names vary by operating system.  The ones in the Options list were chosen to be as generic and intelligible to nonspecialists as possible.

--
Jon K Peck
[hidden email]
