Here’s the message. Background: the file originated in the US, was opened and resaved on a Taiwan machine, then sent back to the US and read on my machine.
So: how do I understand what each of the two warnings is saying?
Can I conclude that the data, both text and numeric, are intact?
If not, what would need to be done to get an intact version from the person in Taiwan?
Thanks, Gene Maguin

get file='C:\Users\...'.

Warning. Command name: get file
At least one character in the the dictionary could not be interpreted in the current code page (Big5). LOCALE may be set incorrectly for this data file. (DATA 1701)

Warning # 5281. Command name: get file
SPSS Statistics is running in Unicode encoding mode. This file is encoded in a locale-specific (code page) encoding. The defined width of any string variables are automatically tripled in order to avoid possible data loss. You can use ALTER TYPE to set the width of string variables to the width of the longest observed value for each string variable.
|
The second one can be ignored. The first one probably means that there are extended characters in the file that don't fit the locale setting. There are several ways to override that, depending on your Statistics version.
The first thing to try is SET UNICODE OFF and then open the file, which avoids the need for character conversion. If that doesn't work, get back in touch.
On Tuesday, July 26, 2016, Maguin, Eugene <[hidden email]> wrote:
-- ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Jon, thank you so much for your quick reply. When I started again this morning, I selected "Locale writing system" on the Edit->Options->Language tab. That got rid of both messages and produced a new one. If you would, please: what is “Big5”, and what is the impact of changing the locale setting?

* Encoding: UTF-8.
get file='C:\Users\...'.

Warning. Command name: get file
SPSS Statistics data file "C:\Users\..." is written in a character encoding (Big5) incompatible with the current LOCALE setting. It may not be readable. Consider changing LOCALE or setting UNICODE on. (DATA 1721)

From: Jon Peck [mailto:[hidden email]]
Jon K Peck |
Big5 is one of the Chinese code page encoding schemes (there are two: one for traditional characters and one for simplified). The message is telling you that the encoding marked in the file is Big5, but you are reading it as if it were in a Western encoding (presumably cp1252), so Chinese characters would not display correctly. If you want to send me the file, I can figure out exactly how it is actually encoded and whether there is any problem.
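As a rough sketch of the two routes the warnings point to (matching the locale in code page mode, or reading the file in Unicode mode), something like the following could be tried. The path is left elided as in the original post. The locale name zh_TW is an assumption and may need a different spelling on a given operating system. The DATASET CLOSE ALL and NEW FILE lines assume that these settings should only be changed while no data are open.

```spss
* Route 1: code page mode with a locale that matches the file (Big5).
DATASET CLOSE ALL.
NEW FILE.
SET UNICODE=OFF.
SET LOCALE='zh_TW'.
SHOW LOCALE.
GET FILE='C:\Users\...'.

* Route 2: Unicode mode, where string widths are tripled on open.
DATASET CLOSE ALL.
NEW FILE.
SET UNICODE=ON.
GET FILE='C:\Users\...'.
* Shrink each string variable back to its longest observed value.
ALTER TYPE ALL (A=AMIN).
```

Route 2 is the configuration that produced the first pair of warnings, so if the dictionary text still looks wrong there, Route 1 is probably the one to compare against.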
|
In reply to this post by Maguin, Eugene
Big5 is a traditional Chinese locale. I think this is essentially the same as zh_tw, so SET LOCALE zh_tw should set the locale correctly.

You might not notice any immediate or significant differences running in this locale, depending on what you are doing. It looks like traditional Chinese uses a period as the decimal indicator, so by default numeric values in both Viewer output and the Data Editor would look the same as in the English locale. If you changed the locale to something like German or French, however, a comma would be used as the decimal indicator.

When reading and writing text data files, the default decimal indicator is the locale decimal indicator (although you can change this). When entering data in the Data Editor, the decimal indicator is always the locale decimal indicator. In syntax, the decimal indicator is always a period (except in inline data defined with BEGIN DATA-END DATA, which is treated like reading a text data file).

Rick Oliver
Senior Information Developer
IBM Business Analytics (SPSS)
E-mail: [hidden email]
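To see which decimal indicator is in effect after a locale change, and to override it for text data, a minimal sketch along these lines may help. It assumes that SHOW DECIMAL and SET DECIMAL behave as described in the SET and SHOW command documentation for your Statistics version; none of this comes from the original posts.

```spss
* Check which locale and decimal indicator are currently in effect.
SHOW LOCALE.
SHOW DECIMAL.
* Override the decimal indicator used for reading and writing text data.
SET DECIMAL=COMMA.
SHOW DECIMAL.
* Restore the period as the decimal indicator.
SET DECIMAL=DOT.
SHOW DECIMAL.
```

Numeric literals written in syntax still use a period regardless of this setting, as noted above.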
|
I think the problem is now understandable and no longer a problem. After seeing your message I looked this morning, and there is no Zh_tw locale, but there is a PRC locale and a Taiwan locale. And what’s kind of curious to me (I’m not bitching or wanting a reason) is that the output and user interface settings offer traditional Chinese and simplified Chinese. No response needed.
Gene Maguin
|
zh_TW is traditional Chinese, and it is a perfectly valid locale setting. Chinese_PRC in the UI is simplified Chinese. Chinese-Taiwan in the UI is traditional (Big5) Chinese.

Try the following in syntax:

dataset close all.
new file.
set locale zh_tw.
show locale.
set locale "Chinese-Taiwan".
show locale.
set locale "Chinese-PRC".
show locale.

The first two locale settings are essentially the same. But you have a valid point about the inconsistency in wording between the locale options and the other language options.

Rick Oliver
Senior Information Developer
IBM Business Analytics (SPSS)
E-mail: [hidden email]

From: "Maguin, Eugene" <[hidden email]>
To: [hidden email]
Date: 07/28/2016 11:16 AM
Subject: Re: get file problem involving codepage/locale unicode
Sent by: "SPSSX(r) Discussion" <[hidden email]>
|
The trouble is that locale names vary by operating system. The ones in the Options list were chosen to be as generic and intelligible to nonspecialists as possible.
On Thu, Jul 28, 2016 at 10:56 AM, Rick Oliver <[hidden email]> wrote:
zh_TW is traditional Chinese, and it is a perfectly valid locale setting.
|