Parsing text fields, a potential encoding issue?

Andy W

Long story short, I have a file that (I believe) has some type of encoding problem that is preventing me from parsing text fields. I have attached the sav file I am working with to the Nabble post, although I know that sometimes works haphazardly, so I also uploaded the data file, syntax, and original xls spreadsheet the data was generated from at this Dropbox link: http://dl.dropbox.com/u/3385251/Nabble_Post.zip. I can provide more explicit code showing how I went from the spreadsheet to the SPSS file (or how the tables were generated) if needed.

Attached is an example that uses both the attached file and some tests that read the same data directly with DATA LIST commands. What is weird is that in the first DATA LIST I pasted the first three records and fields exactly as they appear in the data file, and the IF statement failed to recognize "BLOCK". In the second DATA LIST, where I simply typed in the data, it worked as it should. Note that I also tried all of this with and without setting the system encoding to Unicode. The syntax is pasted below, although I have no idea whether the same problem will persist once it has been copied and pasted from the browser. I have also copied and pasted the text into Notepad++, and there doesn't appear to be anything awry with the encoding there either.

dataset close ALL.
output close ALL.
new file.

*tried also setting to UNICODE mode, had no impact.
*SET Unicode = NO.
*SET Unicode = YES.
*FILE HANDLE data /name = "YOUR PATH HERE".
get file = "data\tables_AMIN.sav".
dataset name full_file.

compute flag = 0.
if V1 = "BLOCK " flag = 1.
exe.
*This is the data copy and pasted from the data file, the if statement fails.
data list free ("|") / V1 (A20) V2 (A20) V3 (A20).
begin data
BLOCK | LOT | ADDRESS
2215 | 116 | 5210 BROADWAY
2215 | 116 | 5220 BROADWAY
end data.
dataset name input.
dataset activate input.
compute flag = 0.
if V1 = "BLOCK " flag = 1.
exe.

list ALL.

*This is the data I typed in, the if statement works like it is supposed to.
data list free ("|") / V1 (A20) V2 (A20) V3 (A20).
begin data
BLOCK | LOT | ADDRESS
2215 | 116 | 5210 BROADWAY
2215 | 116 | 5210 BROADWAY
end data.
dataset name input2.
dataset activate input2.

compute flag = 0.
if V1 = "BLOCK " flag = 1.
exe.

list ALL.

I would appreciate it if someone would just open up the file and run the code to confirm that I am not crazy! My current version of SPSS is 19.0.0.2, and I have tested this on two different Windows machines (one XP and one 7). Any advice is appreciated.

Andy W

tables_AMIN.sav (http://spssx-discussion.1045642.n5.nabble.com/file/n5713724/tables_AMIN.sav)
Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Re: Parsing text fields, a potential encoding issue?

Jon K Peck
Reading in your sav file: although the first case for V1 looks like BLOCK, if you change the format to AHEX240 you can see that it is actually BLOCK followed by x0A. That is, BLOCK<LF>.
It's not an encoding problem.
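
For readers following along outside SPSS, the same diagnosis can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the value below is a made-up stand-in for the first case of V1, not something read from the sav file, and the real field is wider and space padded. The point is that the value prints like BLOCK while a hex view exposes the trailing 0A byte.

```python
# Hypothetical stand-in for the first case of V1: "BLOCK" followed by a line feed.
v1 = "BLOCK\n"

# The equality test fails because the sixth character is LF (code 10), not a space (code 32).
matches = (v1 == "BLOCK ")

# A hex view of the bytes, similar in spirit to switching the display format to AHEX.
hex_view = v1.encode("ascii").hex().upper()

print(matches)   # False
print(hex_view)  # 424C4F434B0A
```

The final `0A` pair is the line feed; a trailing space would show up as `20` instead.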

HTH,
Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621




From:        Andy W <[hidden email]>
To:        [hidden email]
Date:        06/20/2012 01:55 PM
Subject:        [SPSSX-L] Parsing text fields, a potential encoding issue?
Sent by:        "SPSSX(r) Discussion" <[hidden email]>





Re: Parsing text fields, a potential encoding issue?

David Marso
Administrator
In reply to this post by Andy W
Andy,
I opened your data file (.sav) and determined upon QAD analysis that the terminating character in your string fields is *NOT* a space.  It turns out to be a LF character (ASCII 10 rather than ASCII 32 -space-).
HTH: David
--
Quick fix:
DO REPEAT V=V1 TO V8.
COMPUTE #=LENGTH(RTRIM(V)).
IF NUMBER(SUBSTR(V,#,1),PIB)=10 SUBSTR(V,#,1)=" ".
END REPEAT.
if V1 = "BLOCK " flag = 1.
exe.
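
The logic of the quick fix can also be sketched in Python for anyone who wants to test it away from SPSS. This is a rough analogue, not the SPSS code itself: the helper name `blank_trailing_lf` and the 20-character sample field are hypothetical, and it assumes fixed-width, space-padded string fields.

```python
def blank_trailing_lf(value: str) -> str:
    """Replace the last non-space character with a space if it is a line feed."""
    trimmed = value.rstrip(" ")  # like RTRIM: drops trailing spaces only
    if trimmed and ord(trimmed[-1]) == 10:
        # Swap the LF for a space and restore the original field width.
        return trimmed[:-1] + " " * (len(value) - len(trimmed) + 1)
    return value


padded = "BLOCK\n".ljust(20)  # hypothetical A20 field: BLOCK, LF, then spaces
fixed = blank_trailing_lf(padded)

print(fixed == "BLOCK".ljust(20))  # True
print(len(fixed))                  # 20
```

Values that do not end in a line feed are returned unchanged, so the helper is safe to apply to every string field.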


Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Parsing text fields, a potential encoding issue?

Bruce Weaver
Administrator
Another option:

compute flag = index(v1,"BLOCK") GT 0.

This method and David's flag the same 103 records in Andy's file.


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Parsing text fields, a potential encoding issue?

David Marso
Administrator
Except notice that *EVERY* string in the file has been affected by the appended <LF>.
Best to fix it ASAP.
Aside from that:
One could also use:
COMPUTE flag=SUBSTR(V1,1,5) EQ "BLOCK".
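
The two shortcuts suggested so far (a substring search and a prefix comparison) are not strictly equivalent, even though they agree on this particular file. A small Python sketch of the difference, using made-up values rather than anything from Andy's data:

```python
# Hypothetical field values; only the first resembles the header row in the thread.
values = ["BLOCK\n", "CITY BLOCK\n", "2215\n"]

# Analogue of index(v1,"BLOCK") GT 0: true when BLOCK appears anywhere in the value.
contains_flags = ["BLOCK" in v for v in values]

# Analogue of SUBSTR(V1,1,5) EQ "BLOCK": true only when the value starts with BLOCK.
prefix_flags = [v[:5] == "BLOCK" for v in values]

print(contains_flags)  # [True, True, False]
print(prefix_flags)    # [True, False, False]
```

Both sidestep the trailing line feed, but neither removes it, so the underlying data still needs cleaning.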

Re: Parsing text fields, a potential encoding issue?

Jon K Peck
Using newer functionality...

match files file=* /keep table all.
do repeat #v = v1 to v10.
compute #v=replace(#v, string(10, pib1),"").
end repeat.
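
As a rough Python analogue of the replace-based cleanup, assuming a record held as a dictionary (the field names mirror the thread, the values are made up):

```python
LF = chr(10)  # the character code identified in this thread (ASCII 10)

# Hypothetical record; the third value has an embedded LF as well as a trailing one.
record = {"v1": "2215\n", "v2": "116\n", "v3": "5210\nBROADWAY\n"}

# Remove every line feed from every field, wherever it occurs.
cleaned = {name: value.replace(LF, "") for name, value in record.items()}

print(cleaned)  # {'v1': '2215', 'v2': '116', 'v3': '5210BROADWAY'}
```

Unlike a fix that only touches the last character, this removes line feeds anywhere in the value. Replacing with an empty string joins the text on either side of an embedded line feed, so substituting a space may be preferable if embedded line feeds are possible.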






From:        David Marso <[hidden email]>
To:        [hidden email]
Date:        06/20/2012 03:37 PM
Subject:        Re: [SPSSX-L] Parsing text fields, a potential encoding issue?
Sent by:        "SPSSX(r) Discussion" <[hidden email]>





Re: Parsing text fields, a potential encoding issue?

Andy W

Thank you Jon, David and Bruce. So in the future I should just run ALTER TYPE mystring (A = AHEX240)? (Is that how you guys figured it out?) And what is QAD analysis?

Apparently if you copy and paste directly from the data editor, the line feed is not carried over into the text field (either into the native SPSS syntax editor or into other text editors). Some quick experimentation suggests that plain-text copying and pasting on my Windows machine does not carry the line feed (is the line feed character a *NIX thing?).

Andy W

PS: I answer a question and I get 20 out-of-office replies to my email; I ask a question and don't get any (which I suppose I shouldn't be complaining about). WTF Nabble?


Re: Parsing text fields, a potential encoding issue?

David Marso
Administrator
Andy W wrote
> Thank you Jon, David and Bruce. So in the future I should just Alter type mystring (A = AHEX240). (is that how you guys figured it out?) What is QAD analysis?

I just figured something funky was happening at the end of the string other than a space, and deduced from
COMPUTE fubar=NUMBER(SUBSTR(v1,6,1),PIB).
that the culprit was character 10 rather than 32.
Further inquiry revealed that *ALL* string data elements were so affected, hence my solution of converting the last character of *ALL* strings to " ".

QAD: Quick and Dirty... Gotta love those TLAs.

Andy W wrote
> PS: I answer a question and I get 20 out of office replies to my email, I ask a question and don't get any (which I suppose I shouldn't be complaining about). WTF Nabble?

I just mark them as SPAM and the problem eventually goes away!

Re: Parsing text fields, a potential encoding issue?

Jon K Peck
In reply to this post by Andy W
AHEX is just a format, not a type, so you change an An field to an AHEX2n field using the FORMATS command.

Windows uses CRLF for a line break, while *nix systems use just LF. Statistics tries to cope with this variation on all platforms, but the result might depend on exactly what you did. Also, different editors display these sequences differently, so only a hex dump can prove what is really there.
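
To make the CRLF versus LF distinction concrete, here is a small Python sketch (an illustration only, using made-up byte strings rather than anything from the file in question). It prints the hex dump of each ending and shows that both look identical once the line endings are stripped, which is why an editor view alone cannot settle the matter.

```python
unix_line = b"BLOCK\n"       # LF only
windows_line = b"BLOCK\r\n"  # CR followed by LF

for label, raw in (("LF", unix_line), ("CRLF", windows_line)):
    # bytes.hex with a separator argument requires Python 3.8 or later.
    print(label, raw.hex(" ").upper())

# After splitting off the line endings, both yield the same visible text.
same_text = (
    unix_line.decode("ascii").splitlines()
    == windows_line.decode("ascii").splitlines()
)
print(same_text)  # True
```

In the dump, `0A` is the line feed and `0D` is the carriage return that Windows adds in front of it.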







From:        Andy W <[hidden email]>
To:        [hidden email]
Date:        06/20/2012 07:15 PM
Subject:        Re: [SPSSX-L] Parsing text fields, a potential encoding issue?
Sent by:        "SPSSX(r) Discussion" <[hidden email]>



