Is this a Unicode issue?

Is this a Unicode issue?

Ron0z
I have been using SPSS ver 22 for a few years, and have saved a lot of system
files that are used as reference for reporting purposes. ADD FILES and MATCH
FILES are used commonly in my day-to-day activities. Now I have ver 25
installed, and when I try to run various command files that use data created
under both ver 22 and ver 25, I’m getting errors.

The data spec (data types, variable widths etc) has undergone no change over
the years.

For example, I know I created the variable ClientID as A10 and it has been that
way for years, but when I attempt to bring old and new data together I get
an error message indicating a data conflict, typically an attempt
to match an A10 variable to an A30 variable, both of which are ClientID. It seems like
SPSS is tripling the field widths of the old data.

My ver 25 shows Unicode:ON in the “Information Area” at the bottom of the
window. I suspect my old data is not Unicode. (It may be risky to attempt to
recreate all of the system files using ver 25.)

How can I fix this?




--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Is this a Unicode issue?

John F Hall
I'm on SPSS 24 and regularly use files created with SPSS-X 8 (Vax cluster) and SPSS 12 or 19 downloaded from the UK Data Service.  There is always a warning message saying Unicode will triple the width for strings.  I always click Yes.  If I have made any modifications to the file I always save it with a new name.  When exiting SPSS I click No when asked if I want to save changes to the original file and have never had problems.

On incompatible metadata, see https://surveyresearch.weebly.com/british-social-attitudes-making-files-from-different-years-compatible.html for a detailed account of the anomalies and incompatibilities that had to be resolved before the SPSS files for the British Social Attitudes Survey from 1983 to 2016 could be made mutually compatible and combined.

John F Hall  MA (Cantab) Dip Ed (Dunelm)
[Retired academic survey researcher]

Email:          [hidden email]
Website:     Journeys in Survey Research
Course:       Survey Analysis Workshop (SPSS)
Research:   Subjective Social Indicators (Quality of Life)

-----Original Message-----
From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Ron0z
Sent: 25 October 2018 07:27
To: [hidden email]
Subject: Is this a Unicode issue?


Re: Is this a Unicode issue?

Jon Peck
In reply to this post by Ron0z
In a Unicode conversion, string widths are tripled to ensure that no data are lost due to the theoretical increase in the number of bytes required to hold the text.  If there are, in fact, no extended characters in the strings, the original widths would be fine.
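
The factor of three can be illustrated outside SPSS. The Python sketch below is only an illustration, not anything Statistics runs, and the sample strings are made up. It assumes the Unicode encoding is UTF-8 and uses cp1252 as a stand-in for the code page. Every character of a single-byte code page needs at most three bytes in UTF-8, so a ten-byte field could in the worst case need thirty.

```python
# Bytes needed per string in a single-byte code page (cp1252) versus UTF-8.
# The sample values are hypothetical; SPSS itself is not involved.
samples = ["ClientID01", "Åsa Öberg", "€" * 10]


def widths(text):
    """Return (code page bytes, UTF-8 bytes) for one string."""
    return len(text.encode("cp1252")), len(text.encode("utf-8"))


for text in samples:
    cp_bytes, utf8_bytes = widths(text)
    print(f"{text!r}: cp1252={cp_bytes} bytes, UTF-8={utf8_bytes} bytes")
```

Plain ASCII values such as a typical ClientID keep their byte count, which is why the original width usually remains sufficient after conversion.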

So it is best to keep the sav files consistently in one encoding, either Unicode or code page.  Since ADD FILES and MATCH FILES require the string width of each variable to be the same across the files being combined, you can either turn off Unicode mode or convert all the sav files to Unicode.  In the long run, getting everything to Unicode is the way to go.

Converting all the files to Unicode manually would require each one to be opened in Statistics and resaved while in Unicode mode.  However, there is an easier way.  The STATS ADJUST WIDTHS extension command, which can be installed from the Extensions > Extension Hub menu, can synchronize all the files that go together at once.  You can specify a wildcard to refer to a batch of files.  From the help:
The FILES subcommand lists the datasets or file specifications to modify. The list can contain file names, file name wildcards such as "c:/data/x*.sav", and dataset names that are assigned to data. However, * is interpreted as a reference to the active dataset, not as a wildcard.  

The command provides three ways to adjust the widths: MAX, MIN, and FIRST.  MAX, which is the default, sets the width for each variable to the maximum width across all the files.  MIN sets it to the minimum, and FIRST sets it to the width in the first file referenced.  There are some other controls, too.  See the help for details: press F1 on an instance of the command in the syntax editor, or run STATS ADJUST WIDTHS /HELP.  You can specify a different directory and/or a suffix for the file names to manage how the files are saved.
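
The three rules amount to a simple choice among the widths found for a variable. The Python sketch below shows only that choice; it is not the extension's implementation, and the function name `reconcile` and the sample widths are hypothetical.

```python
# Hypothetical widths (in bytes) of one string variable across several files,
# listed in the order the files are referenced.
def reconcile(widths_in_file_order, rule="MAX"):
    """Pick one common width according to the named rule."""
    if rule == "MAX":
        return max(widths_in_file_order)
    if rule == "MIN":
        return min(widths_in_file_order)
    if rule == "FIRST":
        return widths_in_file_order[0]
    raise ValueError(f"unknown rule: {rule}")


client_id_widths = [30, 10, 10]  # e.g. one tripled file and two newer ones
for rule in ("MAX", "MIN", "FIRST"):
    print(rule, reconcile(client_id_widths, rule))
```

Which rule is safe depends on the data: a narrower width can hold the values only if none of them is longer than it.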






Re: Is this a Unicode issue?

Rick Oliver
I'll just add this:

If you want to run in code page mode, you can turn Unicode mode off:

set unicode no.

This setting persists across sessions. So you only have to set it once and forget it.


On Thu, Oct 25, 2018 at 8:50 AM Jon Peck <[hidden email]> wrote:

Re: Is this a Unicode issue?

Jon Peck
True, but you have to do this for everyone in the group who might modify files.

Note also that upconverting to Unicode is lossless (as long as the input code page is correct and the input string lengths are less than 32K/3), but downconverting to code page will cause some data loss if the data contain any characters that do not exist in the target code page.
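
The asymmetry can be seen with ordinary text encodings. The Python sketch below is an illustration under assumptions, not SPSS behavior: cp1252 stands in for the code page, UTF-8 for Unicode, the sample names are made up, and the 32767-byte maximum string width is assumed in order to show where a limit of about 32K/3 comes from.

```python
# Hypothetical value as stored in a code page file.
text_cp1252 = "Åsa Öberg".encode("cp1252")

# Up-conversion: decode with the correct code page, re-encode as UTF-8,
# then convert back to check that nothing changed.
as_unicode = text_cp1252.decode("cp1252")
round_trip = as_unicode.encode("utf-8").decode("utf-8").encode("cp1252")
print(round_trip == text_cp1252)

# Down-conversion: characters missing from the target code page are replaced.
mixed = "Åsa Öberg, Łódź"
down = mixed.encode("cp1252", errors="replace").decode("cp1252")
print(down)

# Assumed maximum string width in bytes; a tripled width must still fit in it.
MAX_STRING_BYTES = 32767
print(MAX_STRING_BYTES // 3)
```

The round trip through Unicode preserves the bytes, while the down-converted name no longer matches the original.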

On Thu, Oct 25, 2018 at 8:38 AM Rick Oliver <[hidden email]> wrote:


--
Jon K Peck
[hidden email]


Re: Is this a Unicode issue?

Robert L
In reply to this post by Ron0z

A more general question, but still on the Unicode thread: for SPSS setups where people do not work only in English, as here in Sweden, which is the preferred choice: Unicode or local encoding?

 

Robert

 

Robert Lundqvist

Re: Is this a Unicode issue?

Jon Peck
Unicode.  Swedish can be written in the Windows Baltic code page (1257) or in cp 1252, which already adds confusion, but nearby languages may need a different code page.  And there are sometimes small differences between Mac and Windows code pages.  Of course, for international data that can contain many different scripts, Unicode is the only choice.

Unicode solves all these issues by supporting all orthographies. That is why it has been the default in Statistics since about v22.  There can be minor issues in conversions from code page to Unicode, especially if the code page is mismarked, which sometimes happens with sav files created by third-party software.  While most modern software supports Unicode, and Statistics can convert in both directions, it’s something to watch out for.
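
What a mismarked code page does to text can be shown with the two code pages mentioned above. This Python sketch is an illustration only; the stored value is hypothetical and no sav file is read.

```python
# The same stored bytes read under two different code page labels.
stored = "Åsa Öberg è".encode("cp1252")  # hypothetical value written as cp1252

correct = stored.decode("cp1252")    # file correctly marked as cp1252
mismarked = stored.decode("cp1257")  # same bytes, file wrongly marked as cp1257

print(correct)
print(mismarked)
print(correct == mismarked)
```

The Swedish letters sit at the same byte values in both code pages, so the damage is limited to the characters on which the two code pages differ, which makes this kind of error easy to overlook.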

Also, the old string functions, which are byte oriented, should be replaced with the char.* functions, which are character oriented and support both code page and Unicode encodings transparently.
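
The difference between counting bytes and counting characters is easy to see in any language. The Python sketch below is an analogy rather than SPSS syntax; it assumes UTF-8 as the Unicode encoding and uses a made-up name.

```python
name = "Åsa Öberg"  # hypothetical value

chars = len(name)                       # character-oriented count
utf8_bytes = len(name.encode("utf-8"))  # byte-oriented count in UTF-8

# Taking the first three bytes versus the first three characters.
first_three_bytes = name.encode("utf-8")[:3].decode("utf-8")
first_three_chars = name[:3]

print(chars, utf8_bytes)
print(first_three_bytes, first_three_chars)
```

A byte-oriented substring returns fewer characters as soon as the text contains anything outside plain ASCII, and a cut at another position could land in the middle of a character.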

On Tue, Oct 30, 2018 at 1:38 AM Robert Lundqvist <[hidden email]> wrote:


Re: Is this a Unicode issue?

Robert L
In reply to this post by Ron0z

Many thanks for the reply. It seems reasonable, especially since there are people within my organization who also work elsewhere, where the SPSS setups are usually Unicode. But I am somewhat concerned about a possible switch. Do you happen to know 1) whether it is possible to change the setup to Unicode under our concurrent network license right now, rather than waiting until there is an upgrade; and 2) what kinds of problems, if any, we could expect to follow from such a switch?

 

 

From: Jon Peck [mailto:[hidden email]]
Sent: 30 October 2018 14:56
To: Robert Lundqvist <[hidden email]>; [hidden email]
Subject: Re: [SPSSX-L] Is this a Unicode issue?

 

Unicode.  Swedish can be written in the Windows Baltic code page (1257) or in cp 1252, which already adds confusion, but nearby languages may need a different code page.  And there are sometimes small differences between Mac and Windows code pages.  Of course, for international data that can contain many different scripts, Unicode is the only choice.

 

Unicode solves all these issues by supporting all orthographies. That is why it has been the default in  Statistics since about v22.  There can be minor issues in conversions from code page to Unicode, especially if the code page is mismarked, which sometimes happens with sav files created by third party software.  While most modern software supports Unicode, and Statistics can convert in both directions, it’s something to watch out for.

 

Also, the old string functions, which are byte oriented, should be replaced with the char.* functions, which are character oriented and support both code page and Unicode encodings transparently.

 

On Tue, Oct 30, 2018 at 1:38 AM Robert Lundqvist <[hidden email]> wrote:

A question with more general connections but still on the Unicode thread: for SPSS setups where people do not rely only on English, like here in Sweden, what is the preferred choice? Unicode or Local encoding?

 

Robert

 

Från: SPSSX(r) Discussion [mailto:[hidden email]] För Jon Peck
Skickat: den 25 oktober 2018 15:50
Till: [hidden email]
Ämne: Re: Is this a Unicode issue?

 

In a Unicode conversion, strings widths are tripled to ensure that no data are lost due to the theoretical increase in the number of bytes required to hold the text.  If there are, in fact, no extended characters in the strings, the original widths would be fine.

 

So, it is best to keep the sav files consistently as Unicode or code page.  Since ADD/MATCH require string sizes for each variable to be the same across the files being matched, you can either turn off Unicode mode or convert all the sav files to Unicode.  In the long run, getting everything to Unicode is the way to go.

 

Converting all the files to Unicode manually would require each one to be opened in Statistics and resaved while in Unicode mode.  However, there is an easier way.  The STATS ADJUST WIDTHS extension command, which can be installed from the Extensions > Extension Hub menu, can synchronize all the files that go together at once.  You can specify a wildcard to refer to a batch of files.  From the help:

The FILES subcommand lists the datasets or file specifications to modify. The list can contain file names, file name wildcards such as "c:/data/x*.sav", and dataset names that are assigned to data. However, * is interpreted as a reference to the active dataset, not as a wildcard.  

 

The command provides three ways to adjust the widths: MAX, MIN, and FIRST.  MAX, which is the default, sets the width for each variable to the maximum width across all the files.  MIN sets to the minimum, and FIRST sets to the width in the first file referenced.  There are some other controls, too.  See the help for details.  Pressing F1 on an instance in the syntax editor shows the help or use STATS ADJUST WIDTH /HELP.  You can specify a different directory and/or a suffix for the file names to manage how the files are saved.

 

 

 

 

 

On Wed, Oct 24, 2018 at 11:26 PM Ron0z <[hidden email]> wrote:

I have been using spss ver 22 for a few years, and saved a lot of system
files that are used as reference for reporting purposes. ADD FILES or MATCH
FILES are used commonly in my day to day activates. So, now I have ver 25
installed and when I try to run various command files that use data created
under both ver 22 and ver 25 I’m getting errors.

The data spec (data types, variable widths etc) has undergone no change over
the years.

For example, I know I created variable ClientID as A10 and it has been that
way for years, but when I attempt to bring old and new data together I get
an error message indicating a conflicting data issue, typically attempting
to match an A10 var to an A30 var both of which are ClientID. It seems like
spss is tripling the field widths of the old data.

My ver 25 shows Unicode:ON in the “Information Area” at the bottom of the
window. I suspect my old data is not Unicode. (It may be risky to attempt to
recreate all of the system files using ver 25.)

How can I fix this?




--

Jon K Peck
[hidden email]

Robert Lundqvist

Re: Is this a Unicode issue?

Jon Peck
Unicode mode is a preference setting, and the default is to use it.  Run SHOW UNICODE to check what you are currently using.  SYSFILE INFO shows the encoding for sav files.

When you open a code page sav file in Unicode mode, string widths are automatically tripled in order to guarantee that there is no data loss.  However, this is almost always far too conservative.  You can use ALTER TYPE to reset each string to the minimum required width, which may be the original size.
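
For anyone wondering where the factor of three comes from, the following Python sketch shows the byte arithmetic. It assumes the Unicode encoding in question is UTF-8, where a character that takes one byte in a code page such as cp1252 can take up to three bytes. The sample IDs and names are made up, and utf8_width is a hypothetical helper that computes the kind of minimum width ALTER TYPE could shrink to (if I recall correctly, the A=AMIN format on ALTER TYPE does this in syntax).

```python
def utf8_width(values):
    """Smallest byte width that holds every value once encoded as UTF-8."""
    return max((len(v.encode("utf-8")) for v in values), default=0)


declared_width = 10                  # ClientID was defined as A10
tripled_width = 3 * declared_width   # what a Unicode-mode open reports: A30

client_ids = ["AB12345678", "ZX00000001"]   # made-up IDs, plain ASCII
names = ["Åsa", "Björn", "Zoë"]             # made-up names with accented letters

# Pure ASCII data needs no extra room at all.
print(tripled_width, utf8_width(client_ids))

# The worst case behind the tripling: one code page byte, three UTF-8 bytes.
print(len("€".encode("cp1252")), len("€".encode("utf-8")))

# Accented Latin letters take two bytes each, so the need grows only a little.
print(max(len(n) for n in names), utf8_width(names))
```

For an ID field that holds only letters and digits, the minimum width comes out equal to the original declared width.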

If you have to merge a set of sav files with mixed encodings, you might have to adjust the string widths, but, as previously discussed, the STATS ADJUST WIDTHS extension command can fix this.

Syntax files themselves can be in code page or Unicode encodings.  Statistics will figure this out automatically.

Text files that are encoded in Unicode (UTF-8) cannot be fixed format, because the number of bytes per character varies while field widths are measured in bytes.
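
A short Python sketch of why byte-based columns break: the record layout below (a name in columns 1-5, a code in columns 6-8) and the sample record are invented for illustration.

```python
record = "Björn123"   # name in columns 1-5, code in columns 6-8 (made-up layout)

raw_cp = record.encode("cp1252")    # code page: one byte per character
raw_utf8 = record.encode("utf-8")   # UTF-8: the letter ö takes two bytes

# Cutting by byte position works for the code page file...
name_cp = raw_cp[0:5].decode("cp1252")
code_cp = raw_cp[5:8].decode("cp1252")

# ...but the same byte positions land in the wrong place in the UTF-8 file.
name_utf8 = raw_utf8[0:5].decode("utf-8")
code_utf8 = raw_utf8[5:8].decode("utf-8")

print(len(raw_cp), name_cp, code_cp)
print(len(raw_utf8), name_utf8, code_utf8)
```

Every accented character before a field shifts that field by one or more bytes, so the shift differs from record to record.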

The long deprecated portable file format does not support Unicode.

The old string functions disappeared from the GUI years ago but may still be present in syntax files.  They may well still work, but it would be good to change to the char.* versions.  Some do not have char.* versions because they are not affected.

The sort order may differ in some cases, so you might need to re-sort files where that matters.

On Tue, Oct 30, 2018 at 8:14 AM Robert Lundqvist <[hidden email]> wrote:

Many thanks for the reply. It seems reasonable, especially since there are people within my organization who also work elsewhere, where SPSS is usually set up as Unicode. But I am somewhat concerned about a possible switch. Do you happen to know 1) whether it is possible to change the setup to Unicode under our concurrent network license right now rather than waiting until the next upgrade; and 2) whether, and what kind of, problems we could expect to follow from such a switch?

From: Jon Peck [mailto:[hidden email]]
Sent: 30 October 2018 14:56
To: Robert Lundqvist <[hidden email]>; [hidden email]
Subject: Re: [SPSSX-L] Is this a Unicode issue?

Unicode.  Swedish can be written in the Windows Baltic code page (1257) or in cp 1252, which already adds confusion, but nearby languages may need a different code page.  And there are sometimes small differences between Mac and Windows code pages.  Of course, for international data that can contain many different scripts, Unicode is the only choice.
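
To illustrate the code page confusion, this Python sketch decodes the same four bytes under cp1252 and cp1257, using Python's built-in codecs of those names. The byte values are chosen for the example: the Swedish letters agree in both code pages, while the Danish/Norwegian ø from cp1252 turns into a different letter when the file is read as cp1257.

```python
raw = bytes([0xE5, 0xE4, 0xF6, 0xF8])   # four single-byte characters

as_1252 = raw.decode("cp1252")   # Western European code page
as_1257 = raw.decode("cp1257")   # Baltic code page

# The first three bytes are å, ä, ö in both code pages.
print(as_1252[:3], as_1257[:3])

# The fourth byte is ø in cp1252 but another letter in cp1257.
print(as_1252[3], as_1257[3], as_1252[3] == as_1257[3])

# In Unicode there is no such ambiguity: the text round-trips unchanged.
print(as_1252.encode("utf-8").decode("utf-8") == as_1252)
```

A file that is marked with the wrong code page produces exactly this kind of silent substitution during conversion.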

Unicode solves all these issues by supporting all orthographies. That is why it has been the default in Statistics since about v22.  There can be minor issues in conversions from code page to Unicode, especially if the code page is mismarked, which sometimes happens with sav files created by third-party software.  While most modern software supports Unicode, and Statistics can convert in both directions, it’s something to watch out for.

Also, the old string functions, which are byte-oriented, should be replaced with the char.* functions, which are character-oriented and support both code page and Unicode encodings transparently.

On Tue, Oct 30, 2018 at 1:38 AM Robert Lundqvist <[hidden email]> wrote:

A question with more general connections but still on the Unicode thread: for SPSS setups where people do not rely only on English, like here in Sweden, what is the preferred choice? Unicode or Local encoding?

Robert

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Jon Peck
Sent: 25 October 2018 15:50
To: [hidden email]
Subject: Re: Is this a Unicode issue?

In a Unicode conversion, string widths are tripled to ensure that no data are lost due to the theoretical increase in the number of bytes required to hold the text.  If there are, in fact, no extended characters in the strings, the original widths would be fine.

So, it is best to keep the sav files consistently as Unicode or code page.  Since ADD FILES and MATCH FILES require the string size of each variable to be the same across the files being combined, you can either turn off Unicode mode or convert all the sav files to Unicode.  In the long run, getting everything to Unicode is the way to go.

 


--
Jon K Peck
[hidden email]
