Non printing character problems--I think

Maguin, Eugene
I have some text variables that I am standardizing. One part of the
standardizing process is to replace these values with spaces:

DO REPEAT X=
   LDRNAMES2 LDRNAMES3 LDRNAMES4 LDRNAMES5 LDRNAMES6.
+  COMPUTE #LEN=CHAR.LENGTH(X).
+  DO IF (#LEN GT 0).
+     COMPUTE X=REPLACE(X,'Old item - use 2.1','                  ').
+     COMPUTE X=REPLACE(X,'Old item - use 3.1','                  ').
+     COMPUTE X=REPLACE(X,'Old item - use 4.1','                  ').
+     COMPUTE X=REPLACE(X,'Old item - use 5.1','                  ').
+     COMPUTE X=REPLACE(X,'Old item - use 6.1','                  ').
+     COMPUTE X=REPLACE(X,'Old item, use 2.1','                 ').
+     COMPUTE X=REPLACE(X,'Old item, use 3.1','                 ').
+     COMPUTE X=REPLACE(X,'Old item, use 4.1','                 ').
+     COMPUTE X=REPLACE(X,'Old item, use 5.1','                 ').
+     COMPUTE X=REPLACE(X,'Old item, use 6.1','                 ').
+  END IF.
END REPEAT.
EXECUTE.

I then run this code to check for other 'problem' characters.

*  CHECK FOR INVALID SEPARATOR CHARACTERS IN NAME FIELDS.
DO REPEAT X=
   EVALUATOR LDRNAMES1 LDRNAMES2 LDRNAMES3 LDRNAMES4 LDRNAMES5 LDRNAMES6
   LDRNAMES7 LDRNAMES8 LDRNAMES9 LDRNAMES10 LDRNAMES11 LDRNAMES2A LDRNAMES3A
   LDRNAMES4A LDRNAMES5A LDRNAMES6A CLEADER1 CLEADER2 PLEADER1 PLEADER2/
   Y=ERR1 TO ERR21.
+  COMPUTE #LEN=CHAR.LENGTH(X).
+  DO IF (#LEN GT 0).
+     COMPUTE Y=#LEN.
+     LOOP #I=1 TO #LEN.
+        COMPUTE ACHAR=UPCASE(SUBSTR(X,#I,1)).
+        IF ((ACHAR GE 'A' AND ACHAR LE 'Z') OR
         ACHAR EQ ',' OR ACHAR EQ ' ') Y=Y-1.
+     END LOOP.
+  END IF.
END REPEAT.

For each variable, I check to see that each character is 'A' thru 'Z' or ','
or ' ' (space).

What I find is that the cases whose values were replaced by the first piece
of code are now identified as having problem characters. However, when I
print the values and when I examine them in the data editor, they appear to
be blanks--as they are supposed to be. Perhaps the values contain
non-printing characters. Several people have posted in the past about
non-printing characters. Using the following code I checked things.

*  CHECK FOR INVALID SEPARATOR CHARACTERS IN NAME FIELDS.
*  SEEMS TO BE NON PRINTING CHARACTERS. IDENTIFY AND PRINT OUT.
*  #K INDEXES BCHAR SEPARATELY SO THE LOOP INDEX #I IS NOT MODIFIED.
VECTOR BCHAR(4,A1).
COMPUTE #K=0.
DO REPEAT X=
   EVALUATOR LDRNAMES1 LDRNAMES2 LDRNAMES3 LDRNAMES4 LDRNAMES5 LDRNAMES6
   LDRNAMES7 LDRNAMES8 LDRNAMES9 LDRNAMES10 LDRNAMES11 LDRNAMES2A LDRNAMES3A
   LDRNAMES4A LDRNAMES5A LDRNAMES6A CLEADER1 CLEADER2 PLEADER1 PLEADER2/
   Y=ERR1 TO ERR21.
+  COMPUTE #LEN=CHAR.LENGTH(X).
+  DO IF (#LEN GT 0).
+     COMPUTE Y=#LEN.
+     LOOP #I=1 TO #LEN.
+        COMPUTE ACHAR=UPCASE(SUBSTR(X,#I,1)).
+        DO IF ((ACHAR GE 'A' AND ACHAR LE 'Z') OR
         ACHAR EQ ',' OR ACHAR EQ ' ').
+           COMPUTE Y=Y-1.
+        ELSE.
+           COMPUTE #K=#K+1.
+           IF (#K LE 4) BCHAR(#K)=ACHAR.
+        END IF.
+     END LOOP.
+  END IF.
END REPEAT.

FREQUENCIES ERR1 TO ERR21.

*  PROBLEMS REMAIN FOR ERR3-ERR6. WHAT ARE THE PROBLEM CHARACTERS.
FORMAT BCHAR1 TO BCHAR4(AHEX).

By changing the format of BCHAR1 to BCHAR4 to AHEX, I get the hex value of
the character. The value is consistently '20', which is the hex equivalent
of decimal 32. The import-export character appendix in the Command Syntax
Reference (CSR) shows 32 to be a space--which is what it was changed to.

My question is what is happening? Something is not working as intended,
which probably means I don't understand something, but I don't know what it
is.

Thanks, Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Non printing character problems--I think

Richard Ristow
At 02:16 PM 7/5/2011, Gene Maguin wrote:

>I run this code to check for 'problem' characters.

i.e.,

>I check to see that each character is 'A' thru 'Z' [upper or lower
>case], or ',' or ' ' (space).
>
>*  CHECK FOR INVALID SEPARATOR CHARACTERS IN NAME FIELDS.
>DO REPEAT X=
>    EVALUATOR
>    LDRNAMES1  LDRNAMES2  LDRNAMES3  LDRNAMES4  LDRNAMES5
>    LDRNAMES6  LDRNAMES7  LDRNAMES8  LDRNAMES9  LDRNAMES10
>    LDRNAMES11
>    LDRNAMES2A LDRNAMES3A LDRNAMES4A LDRNAMES5A LDRNAMES6A
>    CLEADER1   CLEADER2   PLEADER1   PLEADER2/
>    Y=ERR1 TO ERR21.
>+  COMPUTE #LEN=CHAR.LENGTH(X).
>+  DO IF (#LEN GT 0).
>+     COMPUTE Y=#LEN.
>+     LOOP #I=1 TO #LEN.
>+        COMPUTE ACHAR=UPCASE(SUBSTR(X,#I,1)).
>+        IF ((ACHAR GE 'A' AND ACHAR LE 'Z') OR
>          ACHAR EQ ',' OR ACHAR EQ ' ') Y=Y-1.
>+     END LOOP.
>+  END IF.
>END REPEAT.
>
>What I find is that those cases whose values for the variables have
>been replaced [by blanks] now are identified as having problem
>characters. However, when I print the values and when I examine the
>values in the data editor, the values appear to be blanks--as they
>are supposed to be.

This is slightly odd logic, so let me be sure I understand it:

Variable ERRn (stand-in variable Y) is to hold the number of
erroneous characters in the nth variable (stand-in variable X) of the
variables you are testing. You calculate this by setting the number
of errors initially to the length of the string, and then
decrementing it for each character that is *not* erroneous. (An
alternative, of course, would be to initialize Y to 0, and then
increment it for each error found; I think that would be more
straightforward to read.)
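
A minimal sketch of that count-up alternative, assuming the variables already exist as in the code above. It is shown for two of the name fields only, paired with the error variables they map to in the original list, and #C is a hypothetical scratch variable introduced for illustration:

```spss
*  Sketch only: count-up variant of the check for two name fields.
*  Y starts at 0 for every case, so no stale value can survive.
STRING #C (A1).
DO REPEAT X=LDRNAMES2 LDRNAMES3 /Y=ERR3 ERR4.
+  COMPUTE Y=0.
+  LOOP #I=1 TO CHAR.LENGTH(X).
+     COMPUTE #C=UPCASE(CHAR.SUBSTR(X,#I,1)).
+     IF (NOT ((#C GE 'A' AND #C LE 'Z') OR #C EQ ',' OR #C EQ ' ')) Y=Y+1.
+  END LOOP.
END REPEAT.
EXECUTE.
```

Because the loop runs from 1 to the string length, an entirely blank field never enters it and its count stays at 0.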

And you get non-zero error counts for variables that are entirely
blank; you don't say what error counts you get. Are they always 1, or
something else? I assume they're always 1; but if they're higher,
that's illuminating. If they aren't always the same value, you may
indeed have a data problem.

I don't have a modern version of SPSS with the CHAR.ff functions, but
if CHAR.LENGTH(X) is 0 when X is entirely blank, it looks like the
entire calculation of error count should be bypassed, and the error
variable remain system-missing. So by my logic tracing, the anomaly
isn't that you have non-zero error counts, but that you have any
valid numeric values for those counts. The error variables aren't
RETAINed, by any chance?

Have you checked the CHAR.LENGTH values for those 'blank' variables?
That's the next thing I'd look at -- comparing it, of course, with
the reported error count.

Pending further investigation and testing, this feels to me more like
a logic problem than a strange-character problem, especially as you
aren't finding strange characters.

Good luck with it.

-With curiosity,
  Richard


Re: Non printing character problems--I think

Maguin, Eugene
Hi Richard,

You saw my retraction message. I was 'looking and fixing', repetitively. So,
I was reusing variables. My real error was not initializing err1 to err21 to
0 when they were reused. So: Oops!
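
The likely mechanism, assuming ERR1 to ERR21 already held counts from an earlier pass: when CHAR.LENGTH(X) is 0 the DO IF block is skipped, so Y keeps whatever the earlier pass stored. A minimal sketch of resetting the counters before re-running the check:

```spss
*  Sketch only: reset the reused counters before re-running the check.
DO REPEAT Y=ERR1 TO ERR21.
+  COMPUTE Y=0.
END REPEAT.
```

With the counters at 0, a field that has been blanked out reports 0 problem characters instead of a leftover value.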

With respect to your questions.

>>Variable ERRn (stand-in variable Y) is to hold the number of
>>erroneous characters in the nth variable (stand-in variable X) of the
>>variables you are testing. You calculate this by setting the number
>>of errors initially to the length of the string, and then
>>decrementing it for each character that is *not* erroneous. (An
>>alternative, of course, would be to initialize Y to 0, and then
>>increment it for each error found; I think that would be more
>>straightforward to read.)

I pulled this code from another use, where the problem was that I needed to
find a number of non-printing characters as well as printing but unwanted
ones. It seemed easier to list what was acceptable than to list what was
unacceptable. I also think that at the time I did not understand how to use
the AHEX or PIBHEX formats, which I think would have been the more
straightforward option.


>>And you get non-zero error counts for variables that are entirely
>>blank; you don't say what error counts you get. Are they always 1, or
>>something else? I assume they're always 1; but if they're higher,
>>that's illuminating. If they aren't always the same value, you may
>>indeed have a data problem.

I would and do get error counts greater than 1. Greater than 1 just means
multiple problem characters.  Originally, the non printing characters were
carriage returns because the data was coming out of a Lotus Notes database
where carriage returns were acceptable.


>>I don't have a modern version of SPSS with the CHAR.ff functions, but
>>if CHAR.LENGTH(X) is 0 when X is entirely blank, it looks like the
>>entire calculation of error count should be bypassed, and the error
>>variable remain system-missing. So by my logic tracing, the anomaly
>>isn't that you have non-zero error counts, but that you have any
>>valid numeric values for those counts. The error variables aren't
>>RETAINed, by any chance?

Not sure what you mean by 'RETAINed'.


>>Have you checked the CHAR.LENGTH values for those 'blank' variables?
>>That's the next thing I'd look at -- comparing it, of course, with
>>the reported error count.

See the note at the top of the message.

Gene



Re: Non printing character problems--I think

Richard Ristow
At 03:50 PM 7/5/2011, Gene Maguin responded:

>>The error variables aren't RETAINed, by any chance?
>
>Not sure what you mean by 'RETAINed'.

Whoops! I meant "LEAVEd" -- specified on a LEAVE statement, and so
not re-initialized for each case. Unfortunately, SAS had this
capability before SPSS did; and SAS's command for it is "RETAIN"; so
that's the one that sticks in my head.
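
For readers unfamiliar with LEAVE, a minimal sketch of what it does, using hypothetical variables RUNTOT and AMOUNT that are not part of the data discussed here:

```spss
*  Sketch only: AMOUNT is an assumed existing numeric variable.
*  RUNTOT is new. LEAVE starts it at 0 and carries it from case to case.
COMPUTE RUNTOT=RUNTOT+AMOUNT.
LEAVE RUNTOT.
EXECUTE.
```

Without the LEAVE, RUNTOT would be reset to system-missing for each case and the sum would never accumulate.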

>>Have you checked the CHAR.LENGTH values for those 'blank' variables?
>>That's the next thing I'd look at -- comparing it, of course, with
>>the reported error count.
>
>See the note at the top of the message.

You wrote,

>My real error was not initializing err1 to err21 to 0 when they were
>reused. So: Oops!

Yeah, that'll do it -- leave variables with values that seem
inexplicable with the latest logic. Glad you worked it out.

-Best to you,
  Richard

