Hi,
I'm trying to write strings into a syntax file. It works but there is an issue with encoding, as non-ASCII characters are badly decoded:

BEGIN PROGRAM PYTHON3.
import SpssClient
syntax = '* abc ... äöüß.'
SpssClient.StartClient()
NewSyntaxDoc = SpssClient.NewSyntaxDoc()
NewSyntaxDoc.SetAsDesignatedSyntaxDoc()
NewSyntaxDoc.SetSyntax(syntax)
END PROGRAM.

Any ideas how to fix it? I'm using SPSS 25.

Thanks,
Mario
Munich, Germany

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Since you are using Python 3, I expect that you are in Unicode mode. When I ran the code, all characters appeared correctly in the new syntax window. I did this in V27. I can dig up a V25 version to try, but I wouldn't expect any difference.

What exactly did you see in the new syntax window? Maybe a screenshot would help.
Hi, Jon,
I get this result:

* abc ... Ã¤Ã¶Ã¼ÃŸ.
I see that this problem occurs in V25, but it is okay in V27. (I didn't try V26.) Your string comes out as a sequence of UTF-8 bytes, each interpreted as a separate single-byte character. For example, the first two bytes are C3 83, which is the UTF-8 encoding of capital A with tilde. As bytes, starting with the first non-ASCII character, the string is

C3 83 C2 A4 C3 83 C2 B6 C3 83 C2 BC C3 83 C5 B8

that is, A with tilde, currency sign, ..., paragraph mark, etc., so the text is being treated as cp1252 instead of UTF-8.

The way this is executed is that the begin/end program block, which is in Unicode (UTF-16), is sent to the Statistics backend, where it would be converted to UTF-8, which is what the backend uses, and then passed to the Python process for execution. In Python 3, I would expect the whole program to be converted back to UTF-16 (Unicode). I tried several methods to make the literal survive these translations properly, including encoding and decoding and using Python 3 byte arrays, but none of that worked. (Some of them seriously annoyed the Python process.)

I don't have time to try this today, but I think it would work to write the text you want to inject into the new window to a file as UTF-8, using the UTF-8 Python codec with the codecs.encode method, then read it back using codecs.decode and inject that object into the new window. That would prevent the text from ever being exposed to the backend process or going through the conversion used to pass it to the Python process. It would be a good idea to start the file you write with the comment line you see at the top of a syntax window (the equivalent of As Declared when you open a syntax file in the GUI). If the SetSyntax method is broken, though, this might not work.
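For readers who want to see the byte sequence above reproduced, here is a minimal sketch in plain Python 3, run outside SPSS. It assumes, as the diagnosis suggests, that the UTF-8 bytes are decoded as cp1252 somewhere along the way; the variable names are only illustrative, and nothing here touches SpssClient.

```python
# Reproduce the garbling: UTF-8 bytes wrongly decoded as cp1252.
original = "äöüß"

utf8_bytes = original.encode("utf-8")      # C3 A4 C3 B6 C3 BC C3 9F
garbled = utf8_bytes.decode("cp1252")      # each byte becomes one character

# Encoding the garbled text as UTF-8 again gives the doubled byte sequence.
garbled_utf8 = garbled.encode("utf-8")
hex_dump = " ".join("%02X" % b for b in garbled_utf8)
print(hex_dump)

# Reversing the wrong step recovers the text, provided every character
# in the garbled string exists in cp1252.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)
```

The hex dump matches the bytes listed above. Whether the reverse step can be applied inside the V25 program block before calling SetSyntax is untested here.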
Thanks for your effort, Jon!
I'll give it a try as soon as I can. Bye for now.
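The file round trip Jon suggests could be sketched roughly as follows. This is an untested assumption, not a confirmed fix for V25: it uses the built-in open() with encoding="utf-8" in place of the codecs functions, the helper name roundtrip_via_utf8_file is made up for illustration, and the "* Encoding: UTF-8." first line is assumed to be the comment line meant above. In practice the file would be written outside the BEGIN PROGRAM block; both steps are shown together only so that the sketch runs standalone.

```python
import os
import tempfile


def roundtrip_via_utf8_file(text):
    """Write text to a temporary UTF-8 file and read it back (hypothetical helper)."""
    fd, path = tempfile.mkstemp(suffix=".sps")
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
    finally:
        os.remove(path)


# Assumed header line, followed by the comment from the original post.
syntax = "* Encoding: UTF-8.\n* abc ... äöüß."
text_from_file = roundtrip_via_utf8_file(syntax)

# Inside SPSS, the string read from the file would then be passed on
# (not verified on V25):
# NewSyntaxDoc.SetSyntax(text_from_file)
```

If SetSyntax itself mishandles the string in V25, reading from a file will not help, as noted in the reply above.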