Encoding issue when writing into a new syntax file

spss.giesel@yahoo.de
Hi,

I'm trying to write strings into a syntax file.
It works, but there is an encoding issue, as non-ASCII characters come out garbled:

BEGIN PROGRAM PYTHON3.
import SpssClient
syntax = '* abc ... äöüß.'
SpssClient.StartClient()
NewSyntaxDoc = SpssClient.NewSyntaxDoc()
NewSyntaxDoc.SetAsDesignatedSyntaxDoc() 
NewSyntaxDoc.SetSyntax(syntax)
END PROGRAM.

Any ideas how to fix it? I'm using SPSS 25.

Thanks,

Mario
Munich, Germany

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Encoding issue when writing into a new syntax file

Jon Peck
Since you are using Python 3, I expect that you are in Unicode mode.  When I ran the code, all characters appeared correctly in the new syntax window.  I did this in V27.  I can dig up a V25 version to try, but I wouldn't expect any difference.

What exactly did you see in the new syntax window?  Maybe a screenshot would help.

Re: Encoding issue when writing into a new syntax file

spss.giesel@yahoo.de
Hi, Jon,

I get this result:

* abc ... äöüß.

Re: Encoding issue when writing into a new syntax file

Jon Peck
I see that this problem occurs in V25, but it is okay in V27.  (I didn't try V26.)


Your string comes out as a sequence of UTF-8 bytes being interpreted as a string of single-byte characters, each of which is then encoded as UTF-8 again.  For example, the first two bytes are C3 83, which is the UTF-8 encoding of capital A with tilde.
As bytes, starting with the first non-ASCII character, the string is
C3 83 C2 A4  C3 83 C2 B6  C3 83 C2 BC  C3 83 C5 B8
C3 83 = A with tilde
C2 A4 = currency sign
...
C2 B6 = paragraph mark
etc.
so it is being treated as cp 1252 instead of UTF-8.
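
The double encoding described here can be reproduced in plain Python 3 outside of SPSS. This is only a sketch of the suspected mechanism, not a trace of what V25 actually does internally; the variable names are made up for illustration, and the literal uses \u escapes so the example does not depend on how the file itself is encoded.

```python
# The four non-ASCII characters from the post: a-umlaut, o-umlaut, u-umlaut, sharp s.
original = "\u00e4\u00f6\u00fc\u00df"

# Step 1: the text is encoded as UTF-8 ...
utf8_bytes = original.encode("utf-8")

# Step 2: ... but the bytes are read as cp1252, one character per byte.
garbled = utf8_bytes.decode("cp1252")

# Step 3: the garbled text is encoded as UTF-8 once more.
double_encoded = garbled.encode("utf-8")
hex_dump = " ".join("%02X" % b for b in double_encoded)
print(hex_dump)  # C3 83 C2 A4 C3 83 C2 B6 C3 83 C2 BC C3 83 C5 B8

# Reversing steps 3 and 2 recovers the original text.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)  # True
```

The printed bytes match the sequence listed above, which supports the cp 1252 reading.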

The way this is executed is that the BEGIN PROGRAM/END PROGRAM block, which is in Unicode (UTF-16), is sent to the Statistics backend, where it would be converted to UTF-8, which is what the backend uses, and then passed to the Python process for execution.  In Python 3, I would expect the whole program to be converted back to UTF-16 (Unicode).  I tried several methods to make the literal survive these translations properly, including encoding and decoding and using Python 3 byte arrays, but none of that worked.  (Some of them seriously annoyed the Python process.)

I don't have time to try this today, but I think it would work to write the text you want to inject into the new window to a file as UTF-8, using Python's UTF-8 codec with the codecs.encode function, then read it back using codecs.decode and inject that object into the new window.  That would prevent the text from ever being exposed to the backend process or going through the conversion used to pass the program to the Python process.  It would be a good idea to start the file you write with the comment line you see at the top of a syntax window (the equivalent of As Declared when you open a syntax file in the GUI).

If the SetSyntax method is broken, though, this might not work.
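
A minimal sketch of the file round trip suggested above, using codecs.encode and codecs.decode from the Python standard library. The helper names and the file name are hypothetical, and the whole thing is untested against SPSS 25. In practice the file would be created outside the BEGIN PROGRAM block (in an editor, for example), so that only the read step runs inside Statistics; the write step is included here to keep the example self-contained. The leading comment line mentioned above is left out because its exact text is not given in this thread.

```python
import codecs
import os
import tempfile


def write_utf8(path, text):
    """Write text to path as UTF-8 bytes (hypothetical helper)."""
    with open(path, "wb") as f:
        f.write(codecs.encode(text, "utf-8"))


def read_utf8(path):
    """Read UTF-8 bytes from path and return them as a Python str (hypothetical helper)."""
    with open(path, "rb") as f:
        return codecs.decode(f.read(), "utf-8")


# Hypothetical file name; any writable location would do.
path = os.path.join(tempfile.gettempdir(), "injected_syntax.sps")

# \u escapes keep this example independent of the file's own encoding.
text = "* abc ... \u00e4\u00f6\u00fc\u00df."

write_utf8(path, text)
restored = read_utf8(path)
print(restored == text)  # True

# Inside BEGIN PROGRAM PYTHON3 the string would then be handed to the
# new syntax window, e.g. NewSyntaxDoc.SetSyntax(restored).
# Whether SetSyntax handles it correctly in V25 is not verified here.
```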

Re: Encoding issue when writing into a new syntax file

spss.giesel@yahoo.de
Thanks for your effort, Jon!
I'll give it a try as soon as I can.

Bye for now.