AHEX32 as ID in Aggregate

classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

AHEX32 as ID in Aggregate

djhurio
Hi,

I have two data files from MS SQL server where variable ID is stored
in AHEX32 format. I have to aggregate one file by ID and match with
another one. The problem is that for some cases IDs are changing after
aggregate. Why some IDs are changing?

Here is an example:

****** Problem with AHEX32.

data list free
 /id_a (ahex32).
begin data
0AEFF930E6925445A3ADCF60CF3F3F3D
407852ED1BF3BC4880CDE0D95DE1F3FF
756B0AA6B2C43F449B05C9E704ABF2FF
end data.

string id_ahex (a16).
comp id_ahex=id_a.

exe.

formats id_ahex (ahex32).

list var=id_ahex.

agg out *
 /break all
 /n=n.

list var=id_ahex.


************* Output what I got:


id_ahex

0AEFF930E6925445A3ADCF60CF3F3F3D
407852ED1BF3BC4880CDE0D95DE1F3FF
756B0AA6B2C43F449B05C9E704ABF2FF

Number of cases read:  3    Number of cases listed:  3


id_ahex

0AEFF930E6925445A3ADCF60CF3F3F3D
407852ED1BF3BC4880CDE0D95DE1FBFF
756B0AA6B2C43F449B05C9E704ABFAFF

Number of cases read:  3    Number of cases listed:  3


We can see that the first ID has not changed but two others are. The
ending of send ID has changed from F3FF to FBFF and similary for the
third one F2FF to FAFF.


I am using PASW Statistics 17.0.3


Happy New Year!
Martins




--
Mārtiņš Liberts
+371 26 411 543 (mobile)
+371 67 366 618 (office)
http://djhurio.wordpress.com/
http://twitter.com/djhurio

Get informed about The Open Document Format if there are files
attached with extensions *.odt, *.ods, *.odp, *.od?
http://en.wikipedia.org/wiki/OpenDocument

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: AHEX32 as ID in Aggregate

djhurio
Changing "Character Encoding for Data and Syntax" from Locale to Unicode solved the problem with data sets. Although the example I have provided still gives strange results.

M.
Reply | Threaded
Open this post in threaded view
|

Re: AHEX32 as ID in Aggregate

Art Kendall
In reply to this post by djhurio
Maybe some of this will help move toward a solution.

Note that the AHEX format is for standard characters. Many of the character pairs seems to be outside the printable range.

In help search for AHEX . click on the "IMPORT/EXPORT Character Sets" link. Note what position 16 (0A) and 255 (FF) mean in the different character sets.

perhaps you can the id to a new field with a different format before you retrieve the data from the data base.

try the syntax snippets below on the originating and target platforms.
The first shows what most of the standard character are on your system

Please explain the context in which this arises.
16 alpha characters is a long id string in many contexts.

The AHEX format is used to read the hexadecimal representation of standard characters. Each set of two hexadecimal characters represents one standard character. For codes used on different operating systems, see IMPORT/EXPORT Character Sets.

•� The w specification refers to columns of the hexadecimal representation and must be an even number. Leading, trailing, and embedded blanks are not allowed, and only valid hexadecimal characters can be used in input values.

•� For some operating systems (e.g., IBM CMS), letters in hexadecimal values must be upper case.

•� The default output format for variables read with the AHEX input format is the A format. The default width is half the specified input width. For example, an input format of AHEX14 generates an output format of A7.

•� Used as an output format, the AHEX format displays the printable characters in the hexadecimal characters specific to your system. The following commands run on a UNIX system--where A=41 (decimal 65), a=61 (decimal 97), and so on--produce the output shown below:

DATA LIST FIXED
   /A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z 1-26 (A).
FORMATS ALL (AHEX2).
BEGIN DATA
ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
END DATA.
LIST.
DATA LIST FIXED
� � /A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z 1-26 (A).
FORMATS ALL (AHEX2).
BEGIN DATA
ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
01234567890.,?+-=!@#$%^&*(
END DATA.
LIST.


new file.
data list fixed
� /id_ahex1 to id_ahex16 (16ahex2).
begin data
0AEFF930E6925445A3ADCF60CF3F3F3D
407852ED1BF3BC4880CDE0D95DE1F3FF
756B0AA6B2C43F449B05C9E704ABF2FF
end data.
list.


Art Kendall
Social Research Consultants



On 12/30/2010 10:29 AM, Mārtiņš Liberts wrote:
Hi,

I have two data files from MS SQL server where variable ID is stored
in AHEX32 format. I have to aggregate one file by ID and match with
another one. The problem is that for some cases IDs are changing after
aggregate. Why some IDs are changing?

Here is an example:

****** Problem with AHEX32.

data list free
 /id_a (ahex32).
begin data
0AEFF930E6925445A3ADCF60CF3F3F3D
407852ED1BF3BC4880CDE0D95DE1F3FF
756B0AA6B2C43F449B05C9E704ABF2FF
end data.

string id_ahex (a16).
comp id_ahex=id_a.

exe.

formats id_ahex (ahex32).

list var=id_ahex.

agg out *
 /break all
 /n=n.

list var=id_ahex.


************* Output what I got:


id_ahex

0AEFF930E6925445A3ADCF60CF3F3F3D
407852ED1BF3BC4880CDE0D95DE1F3FF
756B0AA6B2C43F449B05C9E704ABF2FF

Number of cases read:  3    Number of cases listed:  3


id_ahex

0AEFF930E6925445A3ADCF60CF3F3F3D
407852ED1BF3BC4880CDE0D95DE1FBFF
756B0AA6B2C43F449B05C9E704ABFAFF

Number of cases read:  3    Number of cases listed:  3


We can see that the first ID has not changed but two others are. The
ending of send ID has changed from F3FF to FBFF and similary for the
third one F2FF to FAFF.


I am using PASW Statistics 17.0.3


Happy New Year!
Martins




--
Mārtiņš Liberts
+371 26 411 543 (mobile)
+371 67 366 618 (office)
http://djhurio.wordpress.com/
http://twitter.com/djhurio

Get informed about The Open Document Format if there are files
attached with extensions *.odt, *.ods, *.odp, *.od?
http://en.wikipedia.org/wiki/OpenDocument

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Art Kendall
Social Research Consultants
Reply | Threaded
Open this post in threaded view
|

Re: AHEX32 as ID in Aggregate

Jon K Peck

Jon Peck
Senior Software Engineer, IBM
[hidden email]
312-651-3435




From:        Art Kendall <[hidden email]>
To:        [hidden email]
Date:        01/03/2011 04:31 PM
Subject:        Re: [SPSSX-L] AHEX32 as ID in Aggregate
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




Maybe some of this will help move toward a solution.

Note that the AHEX format is for standard characters. Many of the character pairs seems to be outside the printable range.


>>>Actually, I find AHEX most useful when trying to track down nonprinting characters.  AHEX is valid for any characters.  In Unicode mode, it would show the 1-4 byte codes for the characters.

However, in this particular case, it seems that the database field was probably encoded in Unicode and Statistics was running in code page mode (cp 1257) , since switching to Unicode mode seems to have mostly cleared things up.







===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: AHEX32 as ID in Aggregate

djhurio
In reply to this post by Art Kendall
It is getting more and more interesting.

Jon.
I looked on IDs directly on server. For example. The ID on server is
"ED527840-F31B-48BC-80CD-E0D95DE1F3FF". It is translated to
"407852EFBFBD1BEFBFBD48EFBFBDEFBF" in SPSS (under Unicode).

Another example "D92E2998-1AC7-41FC-B9CF-8F725FE6E1C9" (MS SQL) -->
"EFBFBD292EEFBFBDEFBFBD1AEFBFBD41" (SPSS).

It is OK for my while the translation gives unique ID and I do not
have to import data back to server (I could try to do that).


Art.
I can not run your code on a server because it is MS SQL server.


Martins




2011/1/4 Art Kendall <[hidden email]>:

> Maybe some of this will help move toward a solution.
>
> Note that the AHEX format is for standard characters. Many of the character
> pairs seems to be outside the printable range.
>
> In help search for AHEX . click on the "IMPORT/EXPORT Character Sets" link.
> Note what position 16 (0A) and 255 (FF) mean in the different character
> sets.
>
> perhaps you can the id to a new field with a different format before you
> retrieve the data from the data base.
>
> try the syntax snippets below on the originating and target platforms.
> The first shows what most of the standard character are on your system
>
> Please explain the context in which this arises.
> 16 alpha characters is a long id string in many contexts.
>
> The AHEX format is used to read the hexadecimal representation of standard
> characters. Each set of two hexadecimal characters represents one standard
> character. For codes used on different operating systems, see IMPORT/EXPORT
> Character Sets.
>
> •� The w specification refers to columns of the hexadecimal representation
> and must be an even number. Leading, trailing, and embedded blanks are not
> allowed, and only valid hexadecimal characters can be used in input values.
>
> •� For some operating systems (e.g., IBM CMS), letters in hexadecimal values
> must be upper case.
>
> •� The default output format for variables read with the AHEX input format is
> the A format. The default width is half the specified input width. For
> example, an input format of AHEX14 generates an output format of A7.
>
> •� Used as an output format, the AHEX format displays the printable
> characters in the hexadecimal characters specific to your system. The
> following commands run on a UNIX system--where A=41 (decimal 65), a=61
> (decimal 97), and so on--produce the output shown below:
>
> DATA LIST FIXED
>    /A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z 1-26 (A).
> FORMATS ALL (AHEX2).
> BEGIN DATA
> ABCDEFGHIJKLMNOPQRSTUVWXYZ
> abcdefghijklmnopqrstuvwxyz
> END DATA.
> LIST.
>
> DATA LIST FIXED
> � �  /A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z 1-26 (A).
> FORMATS ALL (AHEX2).
> BEGIN DATA
> ABCDEFGHIJKLMNOPQRSTUVWXYZ
> abcdefghijklmnopqrstuvwxyz
> 01234567890.,?+-=!@#$%^&*(
> END DATA.
> LIST.
>
> new file.
> data list fixed
> � /id_ahex1 to id_ahex16 (16ahex2).
> begin data
> 0AEFF930E6925445A3ADCF60CF3F3F3D
> 407852ED1BF3BC4880CDE0D95DE1F3FF
> 756B0AA6B2C43F449B05C9E704ABF2FF
> end data.
> list.
>
> Art Kendall
> Social Research Consultants
>
>
>
> On 12/30/2010 10:29 AM, Mārtiņš Liberts wrote:
>
> Hi,
>
> I have two data files from MS SQL server where variable ID is stored
> in AHEX32 format. I have to aggregate one file by ID and match with
> another one. The problem is that for some cases IDs are changing after
> aggregate. Why some IDs are changing?
>
> Here is an example:
>
> ****** Problem with AHEX32.
>
> data list free
>  /id_a (ahex32).
> begin data
> 0AEFF930E6925445A3ADCF60CF3F3F3D
> 407852ED1BF3BC4880CDE0D95DE1F3FF
> 756B0AA6B2C43F449B05C9E704ABF2FF
> end data.
>
> string id_ahex (a16).
> comp id_ahex=id_a.
>
> exe.
>
> formats id_ahex (ahex32).
>
> list var=id_ahex.
>
> agg out *
>  /break all
>  /n=n.
>
> list var=id_ahex.
>
>
> ************* Output what I got:
>
>
> id_ahex
>
> 0AEFF930E6925445A3ADCF60CF3F3F3D
> 407852ED1BF3BC4880CDE0D95DE1F3FF
> 756B0AA6B2C43F449B05C9E704ABF2FF
>
> Number of cases read:  3    Number of cases listed:  3
>
>
> id_ahex
>
> 0AEFF930E6925445A3ADCF60CF3F3F3D
> 407852ED1BF3BC4880CDE0D95DE1FBFF
> 756B0AA6B2C43F449B05C9E704ABFAFF
>
> Number of cases read:  3    Number of cases listed:  3
>
>
> We can see that the first ID has not changed but two others are. The
> ending of send ID has changed from F3FF to FBFF and similary for the
> third one F2FF to FAFF.
>
>
> I am using PASW Statistics 17.0.3
>
>
> Happy New Year!
> Martins
>
>
>
>
> --
> Mārtiņš Liberts
> +371 26 411 543 (mobile)
> +371 67 366 618 (office)
> http://djhurio.wordpress.com/
> http://twitter.com/djhurio
>
> Get informed about The Open Document Format if there are files
> attached with extensions *.odt, *.ods, *.odp, *.od?
> http://en.wikipedia.org/wiki/OpenDocument
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD
>
>



--
Mārtiņš Liberts
+371 26 411 543 (mobile)
+371 67 366 618 (office)
http://djhurio.wordpress.com/
http://twitter.com/djhurio

Get informed about The Open Document Format if there are files
attached with extensions *.odt, *.ods, *.odp, *.od?
http://en.wikipedia.org/wiki/OpenDocument

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: AHEX32 as ID in Aggregate

Art Kendall
Sorry. Since this is an SPSS list I thought you had SPSS.

Art

On 1/4/2011 6:26 AM, Mārtiņš Liberts wrote:

> It is getting more and more interesting.
>
> Jon.
> I looked on IDs directly on server. For example. The ID on server is
> "ED527840-F31B-48BC-80CD-E0D95DE1F3FF". It is translated to
> "407852EFBFBD1BEFBFBD48EFBFBDEFBF" in SPSS (under Unicode).
>
> Another example "D92E2998-1AC7-41FC-B9CF-8F725FE6E1C9" (MS SQL) -->
> "EFBFBD292EEFBFBDEFBFBD1AEFBFBD41" (SPSS).
>
> It is OK for my while the translation gives unique ID and I do not
> have to import data back to server (I could try to do that).
>
>
> Art.
> I can not run your code on a server because it is MS SQL server.
>
>
> Martins
>
>
>
>
> 2011/1/4 Art Kendall<[hidden email]>:
>> Maybe some of this will help move toward a solution.
>>
>> Note that the AHEX format is for standard characters. Many of the character
>> pairs seems to be outside the printable range.
>>
>> In help search for AHEX . click on the "IMPORT/EXPORT Character Sets" link.
>> Note what position 16 (0A) and 255 (FF) mean in the different character
>> sets.
>>
>> perhaps you can the id to a new field with a different format before you
>> retrieve the data from the data base.
>>
>> try the syntax snippets below on the originating and target platforms.
>> The first shows what most of the standard character are on your system
>>
>> Please explain the context in which this arises.
>> 16 alpha characters is a long id string in many contexts.
>>
>> The AHEX format is used to read the hexadecimal representation of standard
>> characters. Each set of two hexadecimal characters represents one standard
>> character. For codes used on different operating systems, see IMPORT/EXPORT
>> Character Sets.
>>
>> •� The w specification refers to columns of the hexadecimal representation
>> and must be an even number. Leading, trailing, and embedded blanks are not
>> allowed, and only valid hexadecimal characters can be used in input values.
>>
>> •� For some operating systems (e.g., IBM CMS), letters in hexadecimal values
>> must be upper case.
>>
>> •� The default output format for variables read with the AHEX input format is
>> the A format. The default width is half the specified input width. For
>> example, an input format of AHEX14 generates an output format of A7.
>>
>> •� Used as an output format, the AHEX format displays the printable
>> characters in the hexadecimal characters specific to your system. The
>> following commands run on a UNIX system--where A=41 (decimal 65), a=61
>> (decimal 97), and so on--produce the output shown below:
>>
>> DATA LIST FIXED
>>     /A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z 1-26 (A).
>> FORMATS ALL (AHEX2).
>> BEGIN DATA
>> ABCDEFGHIJKLMNOPQRSTUVWXYZ
>> abcdefghijklmnopqrstuvwxyz
>> END DATA.
>> LIST.
>>
>> DATA LIST FIXED
>> � �  /A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z 1-26 (A).
>> FORMATS ALL (AHEX2).
>> BEGIN DATA
>> ABCDEFGHIJKLMNOPQRSTUVWXYZ
>> abcdefghijklmnopqrstuvwxyz
>> 01234567890.,?+-=!@#$%^&*(
>> END DATA.
>> LIST.
>>
>> new file.
>> data list fixed
>> � /id_ahex1 to id_ahex16 (16ahex2).
>> begin data
>> 0AEFF930E6925445A3ADCF60CF3F3F3D
>> 407852ED1BF3BC4880CDE0D95DE1F3FF
>> 756B0AA6B2C43F449B05C9E704ABF2FF
>> end data.
>> list.
>>
>> Art Kendall
>> Social Research Consultants
>>
>>
>>
>> On 12/30/2010 10:29 AM, Mārtiņš Liberts wrote:
>>
>> Hi,
>>
>> I have two data files from MS SQL server where variable ID is stored
>> in AHEX32 format. I have to aggregate one file by ID and match with
>> another one. The problem is that for some cases IDs are changing after
>> aggregate. Why some IDs are changing?
>>
>> Here is an example:
>>
>> ****** Problem with AHEX32.
>>
>> data list free
>>   /id_a (ahex32).
>> begin data
>> 0AEFF930E6925445A3ADCF60CF3F3F3D
>> 407852ED1BF3BC4880CDE0D95DE1F3FF
>> 756B0AA6B2C43F449B05C9E704ABF2FF
>> end data.
>>
>> string id_ahex (a16).
>> comp id_ahex=id_a.
>>
>> exe.
>>
>> formats id_ahex (ahex32).
>>
>> list var=id_ahex.
>>
>> agg out *
>>   /break all
>>   /n=n.
>>
>> list var=id_ahex.
>>
>>
>> ************* Output what I got:
>>
>>
>> id_ahex
>>
>> 0AEFF930E6925445A3ADCF60CF3F3F3D
>> 407852ED1BF3BC4880CDE0D95DE1F3FF
>> 756B0AA6B2C43F449B05C9E704ABF2FF
>>
>> Number of cases read:  3    Number of cases listed:  3
>>
>>
>> id_ahex
>>
>> 0AEFF930E6925445A3ADCF60CF3F3F3D
>> 407852ED1BF3BC4880CDE0D95DE1FBFF
>> 756B0AA6B2C43F449B05C9E704ABFAFF
>>
>> Number of cases read:  3    Number of cases listed:  3
>>
>>
>> We can see that the first ID has not changed but two others are. The
>> ending of send ID has changed from F3FF to FBFF and similary for the
>> third one F2FF to FAFF.
>>
>>
>> I am using PASW Statistics 17.0.3
>>
>>
>> Happy New Year!
>> Martins
>>
>>
>>
>>
>> --
>> Mārtiņš Liberts
>> +371 26 411 543 (mobile)
>> +371 67 366 618 (office)
>> http://djhurio.wordpress.com/
>> http://twitter.com/djhurio
>>
>> Get informed about The Open Document Format if there are files
>> attached with extensions *.odt, *.ods, *.odp, *.od?
>> http://en.wikipedia.org/wiki/OpenDocument
>>
>> =====================
>> To manage your subscription to SPSSX-L, send a message to
>> [hidden email] (not to SPSSX-L), with no body text except the
>> command. To leave the list, send the command
>> SIGNOFF SPSSX-L
>> For a list of commands to manage subscriptions, send the command
>> INFO REFCARD
>>
>>
>
>
> --
> Mārtiņš Liberts
> +371 26 411 543 (mobile)
> +371 67 366 618 (office)
> http://djhurio.wordpress.com/
> http://twitter.com/djhurio
>
> Get informed about The Open Document Format if there are files
> attached with extensions *.odt, *.ods, *.odp, *.od?
> http://en.wikipedia.org/wiki/OpenDocument
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD
>

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Art Kendall
Social Research Consultants
Reply | Threaded
Open this post in threaded view
|

Re: AHEX32 as ID in Aggregate

Jon K Peck
In reply to this post by djhurio
The first puzzle is what these hex codes represent in the database.
Take the first sequence.
ED52 is not a valid utf-8 code
ED52 as a utf-16 code would be a character in the Unicode private use area (PUA)
ED in code page 1257 would be k with cedilla below
ED in code page 1252 would be i with acute accent.
There are plenty of other character encodings that might be involved.

The second puzzle is what the ODBC converter is doing.  If these are really PUA characters, then in SPSS code page mode, they should all have become question marks (code 3F).

Can you view the database field as characters?

One possibility is that these codes were deliberately chosen as Unicode PUA values in order never to match real characters.

Regards,

Jon Peck
Senior Software Engineer, IBM
[hidden email]
312-651-3435




From:        Mārtiņš Liberts <[hidden email]>
To:        [hidden email]
Date:        01/04/2011 04:30 AM
Subject:        Re: [SPSSX-L] AHEX32 as ID in Aggregate
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




It is getting more and more interesting.

Jon.
I looked on IDs directly on server. For example. The ID on server is
"ED527840-F31B-48BC-80CD-E0D95DE1F3FF". It is translated to
"407852EFBFBD1BEFBFBD48EFBFBDEFBF" in SPSS (under Unicode).

Another example "D92E2998-1AC7-41FC-B9CF-8F725FE6E1C9" (MS SQL) -->
"EFBFBD292EEFBFBDEFBFBD1AEFBFBD41" (SPSS).

It is OK for my while the translation gives unique ID and I do not
have to import data back to server (I could try to do that).


Art.
I can not run your code on a server because it is MS SQL server.


Martins




2011/1/4 Art Kendall <[hidden email]>:
> Maybe some of this will help move toward a solution.
>
> Note that the AHEX format is for standard characters. Many of the character
> pairs seems to be outside the printable range.
>
> In help search for AHEX . click on the "IMPORT/EXPORT Character Sets" link.
> Note what position 16 (0A) and 255 (FF) mean in the different character
> sets.
>
> perhaps you can the id to a new field with a different format before you
> retrieve the data from the data base.
>
> try the syntax snippets below on the originating and target platforms.
> The first shows what most of the standard character are on your system
>
> Please explain the context in which this arises.
> 16 alpha characters is a long id string in many contexts.
>
> The AHEX format is used to read the hexadecimal representation of standard
> characters. Each set of two hexadecimal characters represents one standard
> character. For codes used on different operating systems, see IMPORT/EXPORT
> Character Sets.
>
> • The w specification refers to columns of the hexadecimal representation
> and must be an even number. Leading, trailing, and embedded blanks are not
> allowed, and only valid hexadecimal characters can be used in input values.
>
> • For some operating systems (e.g., IBM CMS), letters in hexadecimal values
> must be upper case.
>
> • The default output format for variables read with the AHEX input format is
> the A format. The default width is half the specified input width. For
> example, an input format of AHEX14 generates an output format of A7.
>
> • Used as an output format, the AHEX format displays the printable
> characters in the hexadecimal characters specific to your system. The
> following commands run on a UNIX system--where A=41 (decimal 65), a=61
> (decimal 97), and so on--produce the output shown below:
>
> DATA LIST FIXED
>    /A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z 1-26 (A).
> FORMATS ALL (AHEX2).
> BEGIN DATA
> ABCDEFGHIJKLMNOPQRSTUVWXYZ
> abcdefghijklmnopqrstuvwxyz
> END DATA.
> LIST.
>
> DATA LIST FIXED
>    /A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z 1-26 (A).
> FORMATS ALL (AHEX2).
> BEGIN DATA
> ABCDEFGHIJKLMNOPQRSTUVWXYZ
> abcdefghijklmnopqrstuvwxyz
> 01234567890.,?+-=!@#$%^&*(
> END DATA.
> LIST.
>
> new file.
> data list fixed
>  /id_ahex1 to id_ahex16 (16ahex2).
> begin data
> 0AEFF930E6925445A3ADCF60CF3F3F3D
> 407852ED1BF3BC4880CDE0D95DE1F3FF
> 756B0AA6B2C43F449B05C9E704ABF2FF
> end data.
> list.
>
> Art Kendall
> Social Research Consultants
>
>
>
> On 12/30/2010 10:29 AM, Mārtiņš Liberts wrote:
>
> Hi,
>
> I have two data files from MS SQL server where variable ID is stored
> in AHEX32 format. I have to aggregate one file by ID and match with
> another one. The problem is that for some cases IDs are changing after
> aggregate. Why some IDs are changing?
>
> Here is an example:
>
> ****** Problem with AHEX32.
>
> data list free
>  /id_a (ahex32).
> begin data
> 0AEFF930E6925445A3ADCF60CF3F3F3D
> 407852ED1BF3BC4880CDE0D95DE1F3FF
> 756B0AA6B2C43F449B05C9E704ABF2FF
> end data.
>
> string id_ahex (a16).
> comp id_ahex=id_a.
>
> exe.
>
> formats id_ahex (ahex32).
>
> list var=id_ahex.
>
> agg out *
>  /break all
>  /n=n.
>
> list var=id_ahex.
>
>
> ************* Output what I got:
>
>
> id_ahex
>
> 0AEFF930E6925445A3ADCF60CF3F3F3D
> 407852ED1BF3BC4880CDE0D95DE1F3FF
> 756B0AA6B2C43F449B05C9E704ABF2FF
>
> Number of cases read:  3    Number of cases listed:  3
>
>
> id_ahex
>
> 0AEFF930E6925445A3ADCF60CF3F3F3D
> 407852ED1BF3BC4880CDE0D95DE1FBFF
> 756B0AA6B2C43F449B05C9E704ABFAFF
>
> Number of cases read:  3    Number of cases listed:  3
>
>
> We can see that the first ID has not changed but two others are. The
> ending of send ID has changed from F3FF to FBFF and similary for the
> third one F2FF to FAFF.
>
>
> I am using PASW Statistics 17.0.3
>
>
> Happy New Year!
> Martins
>
>
>
>
> --
> Mārtiņš Liberts
> +371 26 411 543 (mobile)
> +371 67 366 618 (office)
>
http://djhurio.wordpress.com/
>
http://twitter.com/djhurio
>
> Get informed about The Open Document Format if there are files
> attached with extensions *.odt, *.ods, *.odp, *.od?
>
http://en.wikipedia.org/wiki/OpenDocument
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD
>
>



--
Mārtiņš Liberts
+371 26 411 543 (mobile)
+371 67 366 618 (office)
http://djhurio.wordpress.com/
http://twitter.com/djhurio

Get informed about The Open Document Format if there are files
attached with extensions *.odt, *.ods, *.odp, *.od?
http://en.wikipedia.org/wiki/OpenDocument

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Reply | Threaded
Open this post in threaded view
|

Re: AHEX32 as ID in Aggregate

djhurio
Art. Data are stored in MS SQL server. I am importing the data in SPSS
- so it is SPSS related question.

Jon. Data type of ID on SQL server is "uniqueidentifier". It is GUID:
http://en.wikipedia.org/wiki/Globally_unique_identifier
Does this help?


Martins


2011/1/4 Jon K Peck <[hidden email]>:

> The first puzzle is what these hex codes represent in the database.
> Take the first sequence.
> ED52 is not a valid utf-8 code
> ED52 as a utf-16 code would be a character in the Unicode private use area
> (PUA)
> ED in code page 1257 would be k with cedilla below
> ED in code page 1252 would be i with acute accent.
> There are plenty of other character encodings that might be involved.
>
> The second puzzle is what the ODBC converter is doing. � If these are really
> PUA characters, then in SPSS code page mode, they should all have become
> question marks (code 3F).
>
> Can you view the database field as characters?
>
> One possibility is that these codes were deliberately chosen as Unicode PUA
> values in order never to match real characters.
>
> Regards,
>
> Jon Peck
> Senior Software Engineer, IBM
> [hidden email]
> 312-651-3435
>
>
>
> From: �  �  �  � Mārtiņš Liberts <[hidden email]>
> To: �  �  �  � [hidden email]
> Date: �  �  �  � 01/04/2011 04:30 AM
> Subject: �  �  �  � Re: [SPSSX-L] AHEX32 as ID in Aggregate
> Sent by: �  �  �  � "SPSSX(r) Discussion" <[hidden email]>
> ________________________________
>
>
> It is getting more and more interesting.
>
> Jon.
> I looked on IDs directly on server. For example. The ID on server is
> "ED527840-F31B-48BC-80CD-E0D95DE1F3FF". It is translated to
> "407852EFBFBD1BEFBFBD48EFBFBDEFBF" in SPSS (under Unicode).
>
> Another example "D92E2998-1AC7-41FC-B9CF-8F725FE6E1C9" (MS SQL) -->
> "EFBFBD292EEFBFBDEFBFBD1AEFBFBD41" (SPSS).
>
> It is OK for my while the translation gives unique ID and I do not
> have to import data back to server (I could try to do that).
>
>
> Art.
> I can not run your code on a server because it is MS SQL server.
>
>
> Martins
>
>
>
>
> 2011/1/4 Art Kendall <[hidden email]>:
>> Maybe some of this will help move toward a solution.
>>
>> Note that the AHEX format is for standard characters. Many of the
>> character
>> pairs seems to be outside the printable range.
>>
>> In help search for AHEX . click on the "IMPORT/EXPORT Character Sets"
>> link.
>> Note what position 16 (0A) and 255 (FF) mean in the different character
>> sets.
>>
>> perhaps you can the id to a new field with a different format before you
>> retrieve the data from the data base.
>>
>> try the syntax snippets below on the originating and target platforms.
>> The first shows what most of the standard character are on your system
>>
>> Please explain the context in which this arises.
>> 16 alpha characters is a long id string in many contexts.
>>
>> The AHEX format is used to read the hexadecimal representation of standard
>> characters. Each set of two hexadecimal characters represents one standard
>> character. For codes used on different operating systems, see
>> IMPORT/EXPORT
>> Character Sets.
>>
>> • The w specification refers to columns of the hexadecimal representation
>> and must be an even number. Leading, trailing, and embedded blanks are not
>> allowed, and only valid hexadecimal characters can be used in input
>> values.
>>
>> • For some operating systems (e.g., IBM CMS), letters in hexadecimal
>> values
>> must be upper case.
>>
>> • The default output format for variables read with the AHEX input format
>> is
>> the A format. The default width is half the specified input width. For
>> example, an input format of AHEX14 generates an output format of A7.
>>
>> • Used as an output format, the AHEX format displays the printable
>> characters in the hexadecimal characters specific to your system. The
>> following commands run on a UNIX system--where A=41 (decimal 65), a=61
>> (decimal 97), and so on--produce the output shown below:
>>
>> DATA LIST FIXED
>> �  � /A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z 1-26 (A).
>> FORMATS ALL (AHEX2).
>> BEGIN DATA
>> ABCDEFGHIJKLMNOPQRSTUVWXYZ
>> abcdefghijklmnopqrstuvwxyz
>> END DATA.
>> LIST.
>>
>> DATA LIST FIXED
>> �  � /A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z 1-26 (A).
>> FORMATS ALL (AHEX2).
>> BEGIN DATA
>> ABCDEFGHIJKLMNOPQRSTUVWXYZ
>> abcdefghijklmnopqrstuvwxyz
>> 01234567890.,?+-=!@#$%^&*(
>> END DATA.
>> LIST.
>>
>> new file.
>> data list fixed
>> � /id_ahex1 to id_ahex16 (16ahex2).
>> begin data
>> 0AEFF930E6925445A3ADCF60CF3F3F3D
>> 407852ED1BF3BC4880CDE0D95DE1F3FF
>> 756B0AA6B2C43F449B05C9E704ABF2FF
>> end data.
>> list.
>>
>> Art Kendall
>> Social Research Consultants
>>
>>
>>
>> On 12/30/2010 10:29 AM, Mārtiņš Liberts wrote:
>>
>> Hi,
>>
>> I have two data files from MS SQL server where variable ID is stored
>> in AHEX32 format. I have to aggregate one file by ID and match with
>> another one. The problem is that for some cases IDs are changing after
>> aggregate. Why some IDs are changing?
>>
>> Here is an example:
>>
>> ****** Problem with AHEX32.
>>
>> data list free
>> � /id_a (ahex32).
>> begin data
>> 0AEFF930E6925445A3ADCF60CF3F3F3D
>> 407852ED1BF3BC4880CDE0D95DE1F3FF
>> 756B0AA6B2C43F449B05C9E704ABF2FF
>> end data.
>>
>> string id_ahex (a16).
>> comp id_ahex=id_a.
>>
>> exe.
>>
>> formats id_ahex (ahex32).
>>
>> list var=id_ahex.
>>
>> agg out *
>> � /break all
>> � /n=n.
>>
>> list var=id_ahex.
>>
>>
>> ************* Output what I got:
>>
>>
>> id_ahex
>>
>> 0AEFF930E6925445A3ADCF60CF3F3F3D
>> 407852ED1BF3BC4880CDE0D95DE1F3FF
>> 756B0AA6B2C43F449B05C9E704ABF2FF
>>
>> Number of cases read: � 3 �  � Number of cases listed: � 3
>>
>>
>> id_ahex
>>
>> 0AEFF930E6925445A3ADCF60CF3F3F3D
>> 407852ED1BF3BC4880CDE0D95DE1FBFF
>> 756B0AA6B2C43F449B05C9E704ABFAFF
>>
>> Number of cases read: � 3 �  � Number of cases listed: � 3
>>
>>
>> We can see that the first ID has not changed but two others are. The
>> ending of send ID has changed from F3FF to FBFF and similary for the
>> third one F2FF to FAFF.
>>
>>
>> I am using PASW Statistics 17.0.3
>>
>>
>> Happy New Year!
>> Martins
>>
>>
>>
>>
>> --
>> Mārtiņš Liberts
>> +371 26 411 543 (mobile)
>> +371 67 366 618 (office)
>> http://djhurio.wordpress.com/
>> http://twitter.com/djhurio
>>
>> Get informed about The Open Document Format if there are files
>> attached with extensions *.odt, *.ods, *.odp, *.od?
>> http://en.wikipedia.org/wiki/OpenDocument
>>
>> =====================
>> To manage your subscription to SPSSX-L, send a message to
>> [hidden email] (not to SPSSX-L), with no body text except the
>> command. To leave the list, send the command
>> SIGNOFF SPSSX-L
>> For a list of commands to manage subscriptions, send the command
>> INFO REFCARD
>>
>>
>
>
>
> --
> Mārtiņš Liberts
> +371 26 411 543 (mobile)
> +371 67 366 618 (office)
> http://djhurio.wordpress.com/
> http://twitter.com/djhurio
>
> Get informed about The Open Document Format if there are files
> attached with extensions *.odt, *.ods, *.odp, *.od?
> http://en.wikipedia.org/wiki/OpenDocument
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD
>
>



--
Mārtiņš Liberts
+371 26 411 543 (mobile)
+371 67 366 618 (office)
http://djhurio.wordpress.com/
http://twitter.com/djhurio

Get informed about The Open Document Format if there are files
attached with extensions *.odt, *.ods, *.odp, *.od?
http://en.wikipedia.org/wiki/OpenDocument

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: AHEX32 as ID in Aggregate

Jon K Peck
Ah, that's the answer: GUIDs are not characters, so converting them as such does not make sense.  GUIDs are just hex identifiers.

I don't know what the ODBC driver would do in this situation - the behavior is probably database specific.  You could, however, transform the GUID field into something else in the SQL expression that does the extraction.  For example,

{ fn CONVERT( thatguid, SQL_CHAR ) }

See
http://msdn.microsoft.com/en-us/library/ms715381%28v=VS.85%29.aspx for more information.  I don't know which conversions would be valid for a field of this type.

If the current behavior is 1-1, this may not matter, but I don't know whether that is the case.

HTH,
Jon Peck

Jon Peck
Senior Software Engineer, IBM
[hidden email]
312-651-3435




From:        Mārtiņš Liberts <[hidden email]>
To:        [hidden email]
Date:        01/04/2011 07:46 AM
Subject:        Re: [SPSSX-L] AHEX32 as ID in Aggregate
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




Art. Data are stored in MS SQL server. I am importing the data in SPSS
- so it is SPSS related question.

Jon. Data type of ID on SQL server is "uniqueidentifier". It is GUID:
http://en.wikipedia.org/wiki/Globally_unique_identifier
Does this help?


Martins


2011/1/4 Jon K Peck <[hidden email]>:
> The first puzzle is what these hex codes represent in the database.
> Take the first sequence.
> ED52 is not a valid utf-8 code
> ED52 as a utf-16 code would be a character in the Unicode private use area
> (PUA)
> ED in code page 1257 would be k with cedilla below
> ED in code page 1252 would be i with acute accent.
> There are plenty of other character encodings that might be involved.
>
> The second puzzle is what the ODBC converter is doing.  If these are really
> PUA characters, then in SPSS code page mode, they should all have become
> question marks (code 3F).
>
> Can you view the database field as characters?
>
> One possibility is that these codes were deliberately chosen as Unicode PUA
> values in order never to match real characters.
>
> Regards,
>
> Jon Peck
> Senior Software Engineer, IBM
> [hidden email]
> 312-651-3435
>
>
>
> From:        Mārtiņš Liberts <[hidden email]>
> To:        [hidden email]
> Date:        01/04/2011 04:30 AM
> Subject:        Re: [SPSSX-L] AHEX32 as ID in Aggregate
> Sent by:        "SPSSX(r) Discussion" <[hidden email]>
> ________________________________
>
>
> It is getting more and more interesting.
>
> Jon.
> I looked on IDs directly on server. For example. The ID on server is
> "ED527840-F31B-48BC-80CD-E0D95DE1F3FF". It is translated to
> "407852EFBFBD1BEFBFBD48EFBFBDEFBF" in SPSS (under Unicode).
>
> Another example "D92E2998-1AC7-41FC-B9CF-8F725FE6E1C9" (MS SQL) -->
> "EFBFBD292EEFBFBDEFBFBD1AEFBFBD41" (SPSS).
>
> It is OK for my while the translation gives unique ID and I do not
> have to import data back to server (I could try to do that).
>
>
> Art.
> I can not run your code on a server because it is MS SQL server.
>
>
> Martins
>
>
>
>
> 2011/1/4 Art Kendall <[hidden email]>:
>> Maybe some of this will help move toward a solution.
>>
>> Note that the AHEX format is for standard characters. Many of the
>> character
>> pairs seems to be outside the printable range.
>>
>> In help search for AHEX . click on the "IMPORT/EXPORT Character Sets"
>> link.
>> Note what position 16 (0A) and 255 (FF) mean in the different character
>> sets.
>>
>> perhaps you can the id to a new field with a different format before you
>> retrieve the data from the data base.
>>
>> try the syntax snippets below on the originating and target platforms.
>> The first shows what most of the standard character are on your system
>>
>> Please explain the context in which this arises.
>> 16 alpha characters is a long id string in many contexts.
>>
>> The AHEX format is used to read the hexadecimal representation of standard
>> characters. Each set of two hexadecimal characters represents one standard
>> character. For codes used on different operating systems, see
>> IMPORT/EXPORT
>> Character Sets.
>>
>> • The w specification refers to columns of the hexadecimal representation
>> and must be an even number. Leading, trailing, and embedded blanks are not
>> allowed, and only valid hexadecimal characters can be used in input
>> values.
>>
>> • For some operating systems (e.g., IBM CMS), letters in hexadecimal
>> values
>> must be upper case.
>>
>> • The default output format for variables read with the AHEX input format
>> is
>> the A format. The default width is half the specified input width. For
>> example, an input format of AHEX14 generates an output format of A7.
>>
>> • Used as an output format, the AHEX format displays the printable
>> characters in the hexadecimal characters specific to your system. The
>> following commands run on a UNIX system--where A=41 (decimal 65), a=61
>> (decimal 97), and so on--produce the output shown below:
>>
>> DATA LIST FIXED
>>    /A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z 1-26 (A).
>> FORMATS ALL (AHEX2).
>> BEGIN DATA
>> ABCDEFGHIJKLMNOPQRSTUVWXYZ
>> abcdefghijklmnopqrstuvwxyz
>> END DATA.
>> LIST.
>>
>> DATA LIST FIXED
>>    /A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z 1-26 (A).
>> FORMATS ALL (AHEX2).
>> BEGIN DATA
>> ABCDEFGHIJKLMNOPQRSTUVWXYZ
>> abcdefghijklmnopqrstuvwxyz
>> 01234567890.,?+-=!@#$%^&*(
>> END DATA.
>> LIST.
>>
>> new file.
>> data list fixed
>>  /id_ahex1 to id_ahex16 (16ahex2).
>> begin data
>> 0AEFF930E6925445A3ADCF60CF3F3F3D
>> 407852ED1BF3BC4880CDE0D95DE1F3FF
>> 756B0AA6B2C43F449B05C9E704ABF2FF
>> end data.
>> list.
>>
>> Art Kendall
>> Social Research Consultants
>>
>>
>>
>> On 12/30/2010 10:29 AM, Mārtiņš Liberts wrote:
>>
>> Hi,
>>
>> I have two data files from MS SQL server where variable ID is stored
>> in AHEX32 format. I have to aggregate one file by ID and match with
>> another one. The problem is that for some cases IDs are changing after
>> aggregate. Why some IDs are changing?
>>
>> Here is an example:
>>
>> ****** Problem with AHEX32.
>>
>> data list free
>>  /id_a (ahex32).
>> begin data
>> 0AEFF930E6925445A3ADCF60CF3F3F3D
>> 407852ED1BF3BC4880CDE0D95DE1F3FF
>> 756B0AA6B2C43F449B05C9E704ABF2FF
>> end data.
>>
>> string id_ahex (a16).
>> comp id_ahex=id_a.
>>
>> exe.
>>
>> formats id_ahex (ahex32).
>>
>> list var=id_ahex.
>>
>> agg out *
>>  /break all
>>  /n=n.
>>
>> list var=id_ahex.
>>
>>
>> ************* Output what I got:
>>
>>
>> id_ahex
>>
>> 0AEFF930E6925445A3ADCF60CF3F3F3D
>> 407852ED1BF3BC4880CDE0D95DE1F3FF
>> 756B0AA6B2C43F449B05C9E704ABF2FF
>>
>> Number of cases read:  3    Number of cases listed:  3
>>
>>
>> id_ahex
>>
>> 0AEFF930E6925445A3ADCF60CF3F3F3D
>> 407852ED1BF3BC4880CDE0D95DE1FBFF
>> 756B0AA6B2C43F449B05C9E704ABFAFF
>>
>> Number of cases read:  3    Number of cases listed:  3
>>
>>
>> We can see that the first ID has not changed but two others are. The
>> ending of send ID has changed from F3FF to FBFF and similary for the
>> third one F2FF to FAFF.
>>
>>
>> I am using PASW Statistics 17.0.3
>>
>>
>> Happy New Year!
>> Martins
>>
>>
>>
>>
>> --
>> Mārtiņš Liberts
>> +371 26 411 543 (mobile)
>> +371 67 366 618 (office)
>>
http://djhurio.wordpress.com/
>>
http://twitter.com/djhurio
>>
>> Get informed about The Open Document Format if there are files
>> attached with extensions *.odt, *.ods, *.odp, *.od?
>>
http://en.wikipedia.org/wiki/OpenDocument
>>
>> =====================
>> To manage your subscription to SPSSX-L, send a message to
>> [hidden email] (not to SPSSX-L), with no body text except the
>> command. To leave the list, send the command
>> SIGNOFF SPSSX-L
>> For a list of commands to manage subscriptions, send the command
>> INFO REFCARD
>>
>>
>
>
>
> --
> Mārtiņš Liberts
> +371 26 411 543 (mobile)
> +371 67 366 618 (office)
>
http://djhurio.wordpress.com/
>
http://twitter.com/djhurio
>
> Get informed about The Open Document Format if there are files
> attached with extensions *.odt, *.ods, *.odp, *.od?
>
http://en.wikipedia.org/wiki/OpenDocument
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD
>
>



--
Mārtiņš Liberts
+371 26 411 543 (mobile)
+371 67 366 618 (office)
http://djhurio.wordpress.com/
http://twitter.com/djhurio

Get informed about The Open Document Format if there are files
attached with extensions *.odt, *.ods, *.odp, *.od?
http://en.wikipedia.org/wiki/OpenDocument

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Reply | Threaded
Open this post in threaded view
|

Re: AHEX32 as ID in Aggregate

Art Kendall
In reply to this post by djhurio
Then perhaps running the two sets of spss syntax might give some insight
into what is being done on the platform you are using.

Art

On 1/4/2011 9:43 AM, Mārtiņš Liberts wrote:

> Art. Data are stored in MS SQL server. I am importing the data in SPSS
> - so it is SPSS related question.
>
> Jon. Data type of ID on SQL server is "uniqueidentifier". It is GUID:
> http://en.wikipedia.org/wiki/Globally_unique_identifier
> Does this help?
>
>
> Martins
>
>
> 2011/1/4 Jon K Peck<[hidden email]>:
>> The first puzzle is what these hex codes represent in the database.
>> Take the first sequence.
>> ED52 is not a valid utf-8 code
>> ED52 as a utf-16 code would be a character in the Unicode private use area
>> (PUA)
>> ED in code page 1257 would be k with cedilla below
>> ED in code page 1252 would be i with acute accent.
>> There are plenty of other character encodings that might be involved.
>>
>> The second puzzle is what the ODBC converter is doing. � If these are really
>> PUA characters, then in SPSS code page mode, they should all have become
>> question marks (code 3F).
>>
>> Can you view the database field as characters?
>>
>> One possibility is that these codes were deliberately chosen as Unicode PUA
>> values in order never to match real characters.
>>
>> Regards,
>>
>> Jon Peck
>> Senior Software Engineer, IBM
>> [hidden email]
>> 312-651-3435
>>
>>
>>
>> From: �  �  �  � Mārtiņš Liberts<[hidden email]>
>> To: �  �  �  � [hidden email]
>> Date: �  �  �  � 01/04/2011 04:30 AM
>> Subject: �  �  �  � Re: [SPSSX-L] AHEX32 as ID in Aggregate
>> Sent by: �  �  �  � "SPSSX(r) Discussion"<[hidden email]>
>> ________________________________
>>
>>
>> It is getting more and more interesting.
>>
>> Jon.
>> I looked on IDs directly on server. For example. The ID on server is
>> "ED527840-F31B-48BC-80CD-E0D95DE1F3FF". It is translated to
>> "407852EFBFBD1BEFBFBD48EFBFBDEFBF" in SPSS (under Unicode).
>>
>> Another example "D92E2998-1AC7-41FC-B9CF-8F725FE6E1C9" (MS SQL) -->
>> "EFBFBD292EEFBFBDEFBFBD1AEFBFBD41" (SPSS).
>>
>> It is OK for my while the translation gives unique ID and I do not
>> have to import data back to server (I could try to do that).
>>
>>
>> Art.
>> I can not run your code on a server because it is MS SQL server.
>>
>>
>> Martins
>>
>>
>>
>>
>> 2011/1/4 Art Kendall<[hidden email]>:
>>> Maybe some of this will help move toward a solution.
>>>
>>> Note that the AHEX format is for standard characters. Many of the
>>> character
>>> pairs seems to be outside the printable range.
>>>
>>> In help search for AHEX . click on the "IMPORT/EXPORT Character Sets"
>>> link.
>>> Note what position 16 (0A) and 255 (FF) mean in the different character
>>> sets.
>>>
>>> perhaps you can the id to a new field with a different format before you
>>> retrieve the data from the data base.
>>>
>>> try the syntax snippets below on the originating and target platforms.
>>> The first shows what most of the standard character are on your system
>>>
>>> Please explain the context in which this arises.
>>> 16 alpha characters is a long id string in many contexts.
>>>
>>> The AHEX format is used to read the hexadecimal representation of standard
>>> characters. Each set of two hexadecimal characters represents one standard
>>> character. For codes used on different operating systems, see
>>> IMPORT/EXPORT
>>> Character Sets.
>>>
>>> • The w specification refers to columns of the hexadecimal representation
>>> and must be an even number. Leading, trailing, and embedded blanks are not
>>> allowed, and only valid hexadecimal characters can be used in input
>>> values.
>>>
>>> • For some operating systems (e.g., IBM CMS), letters in hexadecimal
>>> values
>>> must be upper case.
>>>
>>> • The default output format for variables read with the AHEX input format
>>> is
>>> the A format. The default width is half the specified input width. For
>>> example, an input format of AHEX14 generates an output format of A7.
>>>
>>> • Used as an output format, the AHEX format displays the printable
>>> characters in the hexadecimal characters specific to your system. The
>>> following commands run on a UNIX system--where A=41 (decimal 65), a=61
>>> (decimal 97), and so on--produce the output shown below:
>>>
>>> DATA LIST FIXED
>>> �  � /A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z 1-26 (A).
>>> FORMATS ALL (AHEX2).
>>> BEGIN DATA
>>> ABCDEFGHIJKLMNOPQRSTUVWXYZ
>>> abcdefghijklmnopqrstuvwxyz
>>> END DATA.
>>> LIST.
>>>
>>> DATA LIST FIXED
>>> �  � /A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z 1-26 (A).
>>> FORMATS ALL (AHEX2).
>>> BEGIN DATA
>>> ABCDEFGHIJKLMNOPQRSTUVWXYZ
>>> abcdefghijklmnopqrstuvwxyz
>>> 01234567890.,?+-=!@#$%^&*(
>>> END DATA.
>>> LIST.
>>>
>>> new file.
>>> data list fixed
>>> � /id_ahex1 to id_ahex16 (16ahex2).
>>> begin data
>>> 0AEFF930E6925445A3ADCF60CF3F3F3D
>>> 407852ED1BF3BC4880CDE0D95DE1F3FF
>>> 756B0AA6B2C43F449B05C9E704ABF2FF
>>> end data.
>>> list.
>>>
>>> Art Kendall
>>> Social Research Consultants
>>>
>>>
>>>
>>> On 12/30/2010 10:29 AM, Mārtiņš Liberts wrote:
>>>
>>> Hi,
>>>
>>> I have two data files from MS SQL server where variable ID is stored
>>> in AHEX32 format. I have to aggregate one file by ID and match with
>>> another one. The problem is that for some cases IDs are changing after
>>> aggregate. Why some IDs are changing?
>>>
>>> Here is an example:
>>>
>>> ****** Problem with AHEX32.
>>>
>>> data list free
>>> � /id_a (ahex32).
>>> begin data
>>> 0AEFF930E6925445A3ADCF60CF3F3F3D
>>> 407852ED1BF3BC4880CDE0D95DE1F3FF
>>> 756B0AA6B2C43F449B05C9E704ABF2FF
>>> end data.
>>>
>>> string id_ahex (a16).
>>> comp id_ahex=id_a.
>>>
>>> exe.
>>>
>>> formats id_ahex (ahex32).
>>>
>>> list var=id_ahex.
>>>
>>> agg out *
>>> � /break all
>>> � /n=n.
>>>
>>> list var=id_ahex.
>>>
>>>
>>> ************* Output what I got:
>>>
>>>
>>> id_ahex
>>>
>>> 0AEFF930E6925445A3ADCF60CF3F3F3D
>>> 407852ED1BF3BC4880CDE0D95DE1F3FF
>>> 756B0AA6B2C43F449B05C9E704ABF2FF
>>>
>>> Number of cases read: � 3 �  � Number of cases listed: � 3
>>>
>>>
>>> id_ahex
>>>
>>> 0AEFF930E6925445A3ADCF60CF3F3F3D
>>> 407852ED1BF3BC4880CDE0D95DE1FBFF
>>> 756B0AA6B2C43F449B05C9E704ABFAFF
>>>
>>> Number of cases read: � 3 �  � Number of cases listed: � 3
>>>
>>>
>>> We can see that the first ID has not changed but two others are. The
>>> ending of send ID has changed from F3FF to FBFF and similary for the
>>> third one F2FF to FAFF.
>>>
>>>
>>> I am using PASW Statistics 17.0.3
>>>
>>>
>>> Happy New Year!
>>> Martins
>>>
>>>
>>>
>>>
>>> --
>>> Mārtiņš Liberts
>>> +371 26 411 543 (mobile)
>>> +371 67 366 618 (office)
>>> http://djhurio.wordpress.com/
>>> http://twitter.com/djhurio
>>>
>>> Get informed about The Open Document Format if there are files
>>> attached with extensions *.odt, *.ods, *.odp, *.od?
>>> http://en.wikipedia.org/wiki/OpenDocument
>>>
>>> =====================
>>> To manage your subscription to SPSSX-L, send a message to
>>> [hidden email] (not to SPSSX-L), with no body text except the
>>> command. To leave the list, send the command
>>> SIGNOFF SPSSX-L
>>> For a list of commands to manage subscriptions, send the command
>>> INFO REFCARD
>>>
>>>
>>
>>
>> --
>> Mārtiņš Liberts
>> +371 26 411 543 (mobile)
>> +371 67 366 618 (office)
>> http://djhurio.wordpress.com/
>> http://twitter.com/djhurio
>>
>> Get informed about The Open Document Format if there are files
>> attached with extensions *.odt, *.ods, *.odp, *.od?
>> http://en.wikipedia.org/wiki/OpenDocument
>>
>> =====================
>> To manage your subscription to SPSSX-L, send a message to
>> [hidden email] (not to SPSSX-L), with no body text except the
>> command. To leave the list, send the command
>> SIGNOFF SPSSX-L
>> For a list of commands to manage subscriptions, send the command
>> INFO REFCARD
>>
>>
>
>
> --
> Mārtiņš Liberts
> +371 26 411 543 (mobile)
> +371 67 366 618 (office)
> http://djhurio.wordpress.com/
> http://twitter.com/djhurio
>
> Get informed about The Open Document Format if there are files
> attached with extensions *.odt, *.ods, *.odp, *.od?
> http://en.wikipedia.org/wiki/OpenDocument
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD
>

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Art Kendall
Social Research Consultants
Reply | Threaded
Open this post in threaded view
|

Re: AHEX32 as ID in Aggregate

djhurio
In reply to this post by Jon K Peck
Jon,

Your solution works very well. I do not see any problems with it:
{fn CONVERT(id,SQL_CHAR)} as id_char
Only it is weird that id_char in a SPSS data file is a string variable
with a length 24000!!! How it is possible?

I have added RTRIM to it:
RTRIM({fn CONVERT(id,SQL_CHAR)}) as id_char
Now it gives id_char in format A255.


Thanks a lot,
Martins


2011/1/4 Jon K Peck <[hidden email]>:

> Ah, that's the answer: GUIDs are not characters, so converting them as such
> does not make sense. � GUIDs are just hex identifiers.
>
> I don't know what the ODBC driver would do in this situation - the behavior
> is probably database specific. � You could, however, transform the GUID field
> into something else in the SQL expression that does the extraction. � For
> example,
>
> { fn CONVERT( thatguid, SQL_CHAR ) }
>
> See http://msdn.microsoft.com/en-us/library/ms715381%28v=VS.85%29.aspx for
> more information. � I don't know which conversions would be valid for a field
> of this type.
>
> If the current behavior is 1-1, this may not matter, but I don't know
> whether that is the case.
>
> HTH,
> Jon Peck
>
> Jon Peck
> Senior Software Engineer, IBM
> [hidden email]
> 312-651-3435
>
>
>
> From: �  �  �  � Mārtiņš Liberts <[hidden email]>
> To: �  �  �  � [hidden email]
> Date: �  �  �  � 01/04/2011 07:46 AM
> Subject: �  �  �  � Re: [SPSSX-L] AHEX32 as ID in Aggregate
> Sent by: �  �  �  � "SPSSX(r) Discussion" <[hidden email]>
> ________________________________
>
>
> Art. Data are stored in MS SQL server. I am importing the data in SPSS
> - so it is SPSS related question.
>
> Jon. Data type of ID on SQL server is "uniqueidentifier". It is GUID:
> http://en.wikipedia.org/wiki/Globally_unique_identifier
> Does this help?
>
>
> Martins
>
>
> 2011/1/4 Jon K Peck <[hidden email]>:
>> The first puzzle is what these hex codes represent in the database.
>> Take the first sequence.
>> ED52 is not a valid utf-8 code
>> ED52 as a utf-16 code would be a character in the Unicode private use area
>> (PUA)
>> ED in code page 1257 would be k with cedilla below
>> ED in code page 1252 would be i with acute accent.
>> There are plenty of other character encodings that might be involved.
>>
>> The second puzzle is what the ODBC converter is doing. � If these are
>> really
>> PUA characters, then in SPSS code page mode, they should all have become
>> question marks (code 3F).
>>
>> Can you view the database field as characters?
>>
>> One possibility is that these codes were deliberately chosen as Unicode
>> PUA
>> values in order never to match real characters.
>>
>> Regards,
>>
>> Jon Peck
>> Senior Software Engineer, IBM
>> [hidden email]
>> 312-651-3435
>>
>>
>>
>> From: �  �  �  � Mārtiņš Liberts <[hidden email]>
>> To: �  �  �  � [hidden email]
>> Date: �  �  �  � 01/04/2011 04:30 AM
>> Subject: �  �  �  � Re: [SPSSX-L] AHEX32 as ID in Aggregate
>> Sent by: �  �  �  � "SPSSX(r) Discussion" <[hidden email]>
>> ________________________________
>>
>>
>> It is getting more and more interesting.
>>
>> Jon.
>> I looked on IDs directly on server. For example. The ID on server is
>> "ED527840-F31B-48BC-80CD-E0D95DE1F3FF". It is translated to
>> "407852EFBFBD1BEFBFBD48EFBFBDEFBF" in SPSS (under Unicode).
>>
>> Another example "D92E2998-1AC7-41FC-B9CF-8F725FE6E1C9" (MS SQL) -->
>> "EFBFBD292EEFBFBDEFBFBD1AEFBFBD41" (SPSS).
>>
>> It is OK for my while the translation gives unique ID and I do not
>> have to import data back to server (I could try to do that).
>>
>>
>> Art.
>> I can not run your code on a server because it is MS SQL server.
>>
>>
>> Martins
>>
>>
>>
>>
>> 2011/1/4 Art Kendall <[hidden email]>:
>>> Maybe some of this will help move toward a solution.
>>>
>>> Note that the AHEX format is for standard characters. Many of the
>>> character
>>> pairs seems to be outside the printable range.
>>>
>>> In help search for AHEX . click on the "IMPORT/EXPORT Character Sets"
>>> link.
>>> Note what position 16 (0A) and 255 (FF) mean in the different character
>>> sets.
>>>
>>> perhaps you can the id to a new field with a different format before you
>>> retrieve the data from the data base.
>>>
>>> try the syntax snippets below on the originating and target platforms.
>>> The first shows what most of the standard character are on your system
>>>
>>> Please explain the context in which this arises.
>>> 16 alpha characters is a long id string in many contexts.
>>>
>>> The AHEX format is used to read the hexadecimal representation of
>>> standard
>>> characters. Each set of two hexadecimal characters represents one
>>> standard
>>> character. For codes used on different operating systems, see
>>> IMPORT/EXPORT
>>> Character Sets.
>>>
>>> • The w specification refers to columns of the hexadecimal representation
>>> and must be an even number. Leading, trailing, and embedded blanks are
>>> not
>>> allowed, and only valid hexadecimal characters can be used in input
>>> values.
>>>
>>> • For some operating systems (e.g., IBM CMS), letters in hexadecimal
>>> values
>>> must be upper case.
>>>
>>> • The default output format for variables read with the AHEX input format
>>> is
>>> the A format. The default width is half the specified input width. For
>>> example, an input format of AHEX14 generates an output format of A7.
>>>
>>> • Used as an output format, the AHEX format displays the printable
>>> characters in the hexadecimal characters specific to your system. The
>>> following commands run on a UNIX system--where A=41 (decimal 65), a=61
>>> (decimal 97), and so on--produce the output shown below:
>>>
>>> DATA LIST FIXED
>>> �  � /A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z 1-26 (A).
>>> FORMATS ALL (AHEX2).
>>> BEGIN DATA
>>> ABCDEFGHIJKLMNOPQRSTUVWXYZ
>>> abcdefghijklmnopqrstuvwxyz
>>> END DATA.
>>> LIST.
>>>
>>> DATA LIST FIXED
>>> �  � /A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z 1-26 (A).
>>> FORMATS ALL (AHEX2).
>>> BEGIN DATA
>>> ABCDEFGHIJKLMNOPQRSTUVWXYZ
>>> abcdefghijklmnopqrstuvwxyz
>>> 01234567890.,?+-=!@#$%^&*(
>>> END DATA.
>>> LIST.
>>>
>>> new file.
>>> data list fixed
>>> � /id_ahex1 to id_ahex16 (16ahex2).
>>> begin data
>>> 0AEFF930E6925445A3ADCF60CF3F3F3D
>>> 407852ED1BF3BC4880CDE0D95DE1F3FF
>>> 756B0AA6B2C43F449B05C9E704ABF2FF
>>> end data.
>>> list.
>>>
>>> Art Kendall
>>> Social Research Consultants
>>>
>>>
>>>
>>> On 12/30/2010 10:29 AM, Mārtiņš Liberts wrote:
>>>
>>> Hi,
>>>
>>> I have two data files from MS SQL server where variable ID is stored
>>> in AHEX32 format. I have to aggregate one file by ID and match with
>>> another one. The problem is that for some cases IDs are changing after
>>> aggregate. Why some IDs are changing?
>>>
>>> Here is an example:
>>>
>>> ****** Problem with AHEX32.
>>>
>>> data list free
>>> � /id_a (ahex32).
>>> begin data
>>> 0AEFF930E6925445A3ADCF60CF3F3F3D
>>> 407852ED1BF3BC4880CDE0D95DE1F3FF
>>> 756B0AA6B2C43F449B05C9E704ABF2FF
>>> end data.
>>>
>>> string id_ahex (a16).
>>> comp id_ahex=id_a.
>>>
>>> exe.
>>>
>>> formats id_ahex (ahex32).
>>>
>>> list var=id_ahex.
>>>
>>> agg out *
>>> � /break all
>>> � /n=n.
>>>
>>> list var=id_ahex.
>>>
>>>
>>> ************* Output what I got:
>>>
>>>
>>> id_ahex
>>>
>>> 0AEFF930E6925445A3ADCF60CF3F3F3D
>>> 407852ED1BF3BC4880CDE0D95DE1F3FF
>>> 756B0AA6B2C43F449B05C9E704ABF2FF
>>>
>>> Number of cases read: � 3 �  � Number of cases listed: � 3
>>>
>>>
>>> id_ahex
>>>
>>> 0AEFF930E6925445A3ADCF60CF3F3F3D
>>> 407852ED1BF3BC4880CDE0D95DE1FBFF
>>> 756B0AA6B2C43F449B05C9E704ABFAFF
>>>
>>> Number of cases read: � 3 �  � Number of cases listed: � 3
>>>
>>>
>>> We can see that the first ID has not changed but two others are. The
>>> ending of send ID has changed from F3FF to FBFF and similary for the
>>> third one F2FF to FAFF.
>>>
>>>
>>> I am using PASW Statistics 17.0.3
>>>
>>>
>>> Happy New Year!
>>> Martins
>>>
>>>
>>>
>>>
>>> --
>>> Mārtiņš Liberts
>>> +371 26 411 543 (mobile)
>>> +371 67 366 618 (office)
>>> http://djhurio.wordpress.com/
>>> http://twitter.com/djhurio
>>>
>>> Get informed about The Open Document Format if there are files
>>> attached with extensions *.odt, *.ods, *.odp, *.od?
>>> http://en.wikipedia.org/wiki/OpenDocument
>>>
>>> =====================
>>> To manage your subscription to SPSSX-L, send a message to
>>> [hidden email] (not to SPSSX-L), with no body text except the
>>> command. To leave the list, send the command
>>> SIGNOFF SPSSX-L
>>> For a list of commands to manage subscriptions, send the command
>>> INFO REFCARD
>>>
>>>
>>
>>
>>
>> --
>> Mārtiņš Liberts
>> +371 26 411 543 (mobile)
>> +371 67 366 618 (office)
>> http://djhurio.wordpress.com/
>> http://twitter.com/djhurio
>>
>> Get informed about The Open Document Format if there are files
>> attached with extensions *.odt, *.ods, *.odp, *.od?
>> http://en.wikipedia.org/wiki/OpenDocument
>>
>> =====================
>> To manage your subscription to SPSSX-L, send a message to
>> [hidden email] (not to SPSSX-L), with no body text except the
>> command. To leave the list, send the command
>> SIGNOFF SPSSX-L
>> For a list of commands to manage subscriptions, send the command
>> INFO REFCARD
>>
>>
>
>
>
> --
> Mārtiņš Liberts
> +371 26 411 543 (mobile)
> +371 67 366 618 (office)
> http://djhurio.wordpress.com/
> http://twitter.com/djhurio
>
> Get informed about The Open Document Format if there are files
> attached with extensions *.odt, *.ods, *.odp, *.od?
> http://en.wikipedia.org/wiki/OpenDocument
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD
>
>



--
Mārtiņš Liberts
+371 26 411 543 (mobile)
+371 67 366 618 (office)
http://djhurio.wordpress.com/
http://twitter.com/djhurio

Get informed about The Open Document Format if there are files
attached with extensions *.odt, *.ods, *.odp, *.od?
http://en.wikipedia.org/wiki/OpenDocument

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: AHEX32 as ID in Aggregate

Jon K Peck
Interesting.  That must be an artifact of the CONVERT function.  Maybe it produces a variable length string with no defined maximum length.

Glad that CONVERT is working, though.

Jon Peck
Senior Software Engineer, IBM
[hidden email]
312-651-3435




From:        Mārtiņš Liberts <[hidden email]>
To:        [hidden email]
Date:        01/04/2011 10:04 AM
Subject:        Re: [SPSSX-L] AHEX32 as ID in Aggregate
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




Jon,

Your solution works very well. I do not see any problems with it:
{fn CONVERT(id,SQL_CHAR)} as id_char
Only it is weird that id_char in a SPSS data file is a string variable
with a length 24000!!! How it is possible?

I have added RTRIM to it:
RTRIM({fn CONVERT(id,SQL_CHAR)}) as id_char
Now it gives id_char in format A255.


Thanks a lot,
Martins


2011/1/4 Jon K Peck <[hidden email]>:
> Ah, that's the answer: GUIDs are not characters, so converting them as such
> does not make sense.  GUIDs are just hex identifiers.
>
> I don't know what the ODBC driver would do in this situation - the behavior
> is probably database specific.  You could, however, transform the GUID field
> into something else in the SQL expression that does the extraction.  For
> example,
>
> { fn CONVERT( thatguid, SQL_CHAR ) }
>
> See
http://msdn.microsoft.com/en-us/library/ms715381%28v=VS.85%29.aspx for
> more information.  I don't know which conversions would be valid for a field
> of this type.
>
> If the current behavior is 1-1, this may not matter, but I don't know
> whether that is the case.
>
> HTH,
> Jon Peck
>
> Jon Peck
> Senior Software Engineer, IBM
> [hidden email]
> 312-651-3435
>
>
>
> From:        Mārtiņš Liberts <[hidden email]>
> To:        [hidden email]
> Date:        01/04/2011 07:46 AM
> Subject:        Re: [SPSSX-L] AHEX32 as ID in Aggregate
> Sent by:        "SPSSX(r) Discussion" <[hidden email]>
> ________________________________
>
>
> Art. Data are stored in MS SQL server. I am importing the data in SPSS
> - so it is SPSS related question.
>
> Jon. Data type of ID on SQL server is "uniqueidentifier". It is GUID:
>
http://en.wikipedia.org/wiki/Globally_unique_identifier
> Does this help?
>
>
> Martins
>
>
> 2011/1/4 Jon K Peck <[hidden email]>:
>> The first puzzle is what these hex codes represent in the database.
>> Take the first sequence.
>> ED52 is not a valid utf-8 code
>> ED52 as a utf-16 code would be a character in the Unicode private use area
>> (PUA)
>> ED in code page 1257 would be k with cedilla below
>> ED in code page 1252 would be i with acute accent.
>> There are plenty of other character encodings that might be involved.
>>
>> The second puzzle is what the ODBC converter is doing.  If these are
>> really
>> PUA characters, then in SPSS code page mode, they should all have become
>> question marks (code 3F).
>>
>> Can you view the database field as characters?
>>
>> One possibility is that these codes were deliberately chosen as Unicode
>> PUA
>> values in order never to match real characters.
>>
>> Regards,
>>
>> Jon Peck
>> Senior Software Engineer, IBM
>> [hidden email]
>> 312-651-3435
>>
>>
>>
>> From:        Mārtiņš Liberts <[hidden email]>
>> To:        [hidden email]
>> Date:        01/04/2011 04:30 AM
>> Subject:        Re: [SPSSX-L] AHEX32 as ID in Aggregate
>> Sent by:        "SPSSX(r) Discussion" <[hidden email]>
>> ________________________________
>>
>>
>> It is getting more and more interesting.
>>
>> Jon.
>> I looked on IDs directly on server. For example. The ID on server is
>> "ED527840-F31B-48BC-80CD-E0D95DE1F3FF". It is translated to
>> "407852EFBFBD1BEFBFBD48EFBFBDEFBF" in SPSS (under Unicode).
>>
>> Another example "D92E2998-1AC7-41FC-B9CF-8F725FE6E1C9" (MS SQL) -->
>> "EFBFBD292EEFBFBDEFBFBD1AEFBFBD41" (SPSS).
>>
>> It is OK for my while the translation gives unique ID and I do not
>> have to import data back to server (I could try to do that).
>>
>>
>> Art.
>> I can not run your code on a server because it is MS SQL server.
>>
>>
>> Martins
>>
>>
>>
>>
>> 2011/1/4 Art Kendall <[hidden email]>:
>>> Maybe some of this will help move toward a solution.
>>>
>>> Note that the AHEX format is for standard characters. Many of the
>>> character
>>> pairs seems to be outside the printable range.
>>>
>>> In help search for AHEX . click on the "IMPORT/EXPORT Character Sets"
>>> link.
>>> Note what position 16 (0A) and 255 (FF) mean in the different character
>>> sets.
>>>
>>> perhaps you can the id to a new field with a different format before you
>>> retrieve the data from the data base.
>>>
>>> try the syntax snippets below on the originating and target platforms.
>>> The first shows what most of the standard character are on your system
>>>
>>> Please explain the context in which this arises.
>>> 16 alpha characters is a long id string in many contexts.
>>>
>>> The AHEX format is used to read the hexadecimal representation of
>>> standard
>>> characters. Each set of two hexadecimal characters represents one
>>> standard
>>> character. For codes used on different operating systems, see
>>> IMPORT/EXPORT
>>> Character Sets.
>>>
>>> • The w specification refers to columns of the hexadecimal representation
>>> and must be an even number. Leading, trailing, and embedded blanks are
>>> not
>>> allowed, and only valid hexadecimal characters can be used in input
>>> values.
>>>
>>> • For some operating systems (e.g., IBM CMS), letters in hexadecimal
>>> values
>>> must be upper case.
>>>
>>> • The default output format for variables read with the AHEX input format
>>> is
>>> the A format. The default width is half the specified input width. For
>>> example, an input format of AHEX14 generates an output format of A7.
>>>
>>> • Used as an output format, the AHEX format displays the printable
>>> characters in the hexadecimal characters specific to your system. The
>>> following commands run on a UNIX system--where A=41 (decimal 65), a=61
>>> (decimal 97), and so on--produce the output shown below:
>>>
>>> DATA LIST FIXED
>>>    /A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z 1-26 (A).
>>> FORMATS ALL (AHEX2).
>>> BEGIN DATA
>>> ABCDEFGHIJKLMNOPQRSTUVWXYZ
>>> abcdefghijklmnopqrstuvwxyz
>>> END DATA.
>>> LIST.
>>>
>>> DATA LIST FIXED
>>>    /A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z 1-26 (A).
>>> FORMATS ALL (AHEX2).
>>> BEGIN DATA
>>> ABCDEFGHIJKLMNOPQRSTUVWXYZ
>>> abcdefghijklmnopqrstuvwxyz
>>> 01234567890.,?+-=!@#$%^&*(
>>> END DATA.
>>> LIST.
>>>
>>> new file.
>>> data list fixed
>>>  /id_ahex1 to id_ahex16 (16ahex2).
>>> begin data
>>> 0AEFF930E6925445A3ADCF60CF3F3F3D
>>> 407852ED1BF3BC4880CDE0D95DE1F3FF
>>> 756B0AA6B2C43F449B05C9E704ABF2FF
>>> end data.
>>> list.
>>>
>>> Art Kendall
>>> Social Research Consultants
>>>
>>>
>>>
>>> On 12/30/2010 10:29 AM, Mārtiņš Liberts wrote:
>>>
>>> Hi,
>>>
>>> I have two data files from MS SQL server where variable ID is stored
>>> in AHEX32 format. I have to aggregate one file by ID and match with
>>> another one. The problem is that for some cases IDs are changing after
>>> aggregate. Why some IDs are changing?
>>>
>>> Here is an example:
>>>
>>> ****** Problem with AHEX32.
>>>
>>> data list free
>>>  /id_a (ahex32).
>>> begin data
>>> 0AEFF930E6925445A3ADCF60CF3F3F3D
>>> 407852ED1BF3BC4880CDE0D95DE1F3FF
>>> 756B0AA6B2C43F449B05C9E704ABF2FF
>>> end data.
>>>
>>> string id_ahex (a16).
>>> comp id_ahex=id_a.
>>>
>>> exe.
>>>
>>> formats id_ahex (ahex32).
>>>
>>> list var=id_ahex.
>>>
>>> agg out *
>>>  /break all
>>>  /n=n.
>>>
>>> list var=id_ahex.
>>>
>>>
>>> ************* Output what I got:
>>>
>>>
>>> id_ahex
>>>
>>> 0AEFF930E6925445A3ADCF60CF3F3F3D
>>> 407852ED1BF3BC4880CDE0D95DE1F3FF
>>> 756B0AA6B2C43F449B05C9E704ABF2FF
>>>
>>> Number of cases read:  3    Number of cases listed:  3
>>>
>>>
>>> id_ahex
>>>
>>> 0AEFF930E6925445A3ADCF60CF3F3F3D
>>> 407852ED1BF3BC4880CDE0D95DE1FBFF
>>> 756B0AA6B2C43F449B05C9E704ABFAFF
>>>
>>> Number of cases read:  3    Number of cases listed:  3
>>>
>>>
>>> We can see that the first ID has not changed but two others are. The
>>> ending of send ID has changed from F3FF to FBFF and similary for the
>>> third one F2FF to FAFF.
>>>
>>>
>>> I am using PASW Statistics 17.0.3
>>>
>>>
>>> Happy New Year!
>>> Martins
>>>
>>>
>>>
>>>
>>> --
>>> Mārtiņš Liberts
>>> +371 26 411 543 (mobile)
>>> +371 67 366 618 (office)
>>>
http://djhurio.wordpress.com/
>>>
http://twitter.com/djhurio
>>>
>>> Get informed about The Open Document Format if there are files
>>> attached with extensions *.odt, *.ods, *.odp, *.od?
>>>
http://en.wikipedia.org/wiki/OpenDocument
>>>
>>> =====================
>>> To manage your subscription to SPSSX-L, send a message to
>>> [hidden email] (not to SPSSX-L), with no body text except the
>>> command. To leave the list, send the command
>>> SIGNOFF SPSSX-L
>>> For a list of commands to manage subscriptions, send the command
>>> INFO REFCARD
>>>
>>>
>>
>>
>>
>> --
>> Mārtiņš Liberts
>> +371 26 411 543 (mobile)
>> +371 67 366 618 (office)
>>
http://djhurio.wordpress.com/
>>
http://twitter.com/djhurio
>>
>> Get informed about The Open Document Format if there are files
>> attached with extensions *.odt, *.ods, *.odp, *.od?
>>
http://en.wikipedia.org/wiki/OpenDocument
>>
>> =====================
>> To manage your subscription to SPSSX-L, send a message to
>> [hidden email] (not to SPSSX-L), with no body text except the
>> command. To leave the list, send the command
>> SIGNOFF SPSSX-L
>> For a list of commands to manage subscriptions, send the command
>> INFO REFCARD
>>
>>
>
>
>
> --
> Mārtiņš Liberts
> +371 26 411 543 (mobile)
> +371 67 366 618 (office)
>
http://djhurio.wordpress.com/
>
http://twitter.com/djhurio
>
> Get informed about The Open Document Format if there are files
> attached with extensions *.odt, *.ods, *.odp, *.od?
>
http://en.wikipedia.org/wiki/OpenDocument
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD
>
>



--
Mārtiņš Liberts
+371 26 411 543 (mobile)
+371 67 366 618 (office)
http://djhurio.wordpress.com/
http://twitter.com/djhurio

Get informed about The Open Document Format if there are files
attached with extensions *.odt, *.ods, *.odp, *.od?
http://en.wikipedia.org/wiki/OpenDocument

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD