|
Dear list,
I recently noticed that the sort order in SPSS 15 differs from that in SPSS 16 (at least on *my* installations of these versions). The following syntax produces different output for the non-alphabetic characters (, * and ^.

DATA LIST FIXED /txt 1-6 (A).
BEGIN DATA.
* test
(test)
test
^ test
END DATA.
SORT CASES BY txt.
echo "version 16".
LIST.

In version 15 it results in:

(test)
* test
^ test
test

while in version 16 it gives:

^ test
(test)
* test
test

It seems to affect non-alphabetic characters. The same is true for AUTORECODE, which assigns different numbers to the values in the two versions. Is there a setting I overlooked?

Thanks in advance,
Antoon Smulders
Advies- en Onderzoeksgroep Beke
026-4438619
www.beke.nl

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD |
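As a quick check outside SPSS, the version 15 order reported above coincides with plain code point order. The short Python sketch below only illustrates that observation; it assumes nothing about SPSS internals, and the variable names are made up for the example.

```python
# The four values from the DATA LIST example in the post.
values = ["* test", "(test)", "test", "^ test"]

# Python compares str values code point by code point, so sorted()
# yields plain code point order with no collation rules applied.
by_code_point = sorted(values)

for value in by_code_point:
    print(value)
```

Here "(" (U+0028) sorts before "*" (U+002A), which sorts before "^" (U+005E), and all three sort before "t" (U+0074), which matches the version 15 listing.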
|
SPSS 16 supports both Unicode and the traditional code page character sets (SET UNICODE ON/OFF). As part of that, SPSS 16 follows the Unicode collation algorithm. We felt it was important to give the same sort order in both modes as long as the characters can be represented in the code page.

As you can imagine, Unicode, with around 100,000 characters defined, had to put some serious research into what collation means, especially since the order is affected by the locale/language in use. You can see the default collation tables for Unicode at
http://www.unicode.org/Public/UCA/latest/allkeys.txt
For simple characters, only the first weight following the "[" is necessary. However, sorting words gets quite a lot more complicated when you consider French and other sort orders where you can't simply proceed left to right, character by character, so there are other weights that affect the collation algorithm. Accented characters are treated in different ways in different locales, and multi-script combinations (Japanese with Russian, e.g.) introduce other complexities. The Unicode collation algorithm itself is explained at
http://www.unicode.org/unicode/reports/tr10/#AllKeys
which gives some insight into the complexities involved.

If you need to get an Autorecode result that is consistent between SPSS 15 and 16, use one or the other to produce a template (AUTORECODE ... /SAVE TEMPLATE=) and apply that template (AUTORECODE ... /APPLY TEMPLATE) in the other version.

Using a template with Autorecode when you need a stable recoding is always a good idea, because otherwise new values that occur (or values that disappear) will cause the recode for the other values to change.

HTH,
Jon Peck

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Antoon Smulders
Sent: Tuesday, March 04, 2008 3:04 AM
To: [hidden email]
Subject: [SPSSX-L] sorting in version 15 and 16 |
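The point about templates can be sketched in Python. This is a toy model only, not SPSS's implementation: the function `autorecode`, the dict used as a "template", and the sample values are all hypothetical, and the assumption that unseen values receive codes above the highest template code is part of the sketch. It shows why numbering by sort order shifts when values are added, and how a saved mapping keeps existing codes stable.

```python
def autorecode(values, template=None):
    """Assign integer codes to string values in sorted order.

    With a template (a mapping saved from an earlier run), known values
    keep their old codes and unseen values get codes above the highest
    code in the template.
    """
    codes = dict(template) if template else {}
    next_code = max(codes.values(), default=0) + 1
    for value in sorted(set(values)):
        if value not in codes:
            codes[value] = next_code
            next_code += 1
    return codes


# First run: codes follow the sort order of the values present.
first = autorecode(["pear", "apple", "plum"])

# A later batch contains one new value.
later = ["pear", "apple", "plum", "banana"]
fresh = autorecode(later)                    # recoded from scratch
stable = autorecode(later, template=first)   # recoded with the saved mapping

print(first)
print(fresh)
print(stable)
```

Without the template, "pear" moves from code 2 to code 3 once "banana" appears; with the template it stays at 2 and "banana" is appended as 4. The same reasoning applies when the sort order itself changes between versions: a mapping saved in one version does not depend on how the other version sorts.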
|
Hello Jon,
It makes no difference whether UNICODE is SET ON or OFF in version 16, at least for the simple example I gave. The resulting order is the same in both modes, and thus different from version 15.

Greetings,
Antoon

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Peck, Jon
Sent: Tuesday, March 4, 2008 22:14
To: [hidden email]
Subject: Re: sorting in version 15 and 16 |
|
That is what I said. We need to be consistent between Unicode and code page modes. It would be terrible if the Unicode mode affected the sorting. We always use the Unicode standard in SPSS 16 regardless of the mode.
HTH,
Jon

-----Original Message-----
From: Antoon Smulders [mailto:[hidden email]]
Sent: Wednesday, March 05, 2008 2:27 AM
To: Peck, Jon; [hidden email]
Subject: RE: sorting in version 15 and 16 |
