Running a syntax repeatedly for each record

Running a syntax repeatedly for each record

Luca Meyer
Dear all,

I am trying to run a syntax like the following for each of the 390 cases in
my dataset:

TEMP.
SELECT IF CASENR=1.
STRING V2 V3 (A200).
RECODE V1 ("text to be recoded"="text recoded") INTO V2.
LIST CASENR.
FREQ V2 V3.

How can I do that without having to rewrite the code 390 times? I am running
SPSS 15 and I have installed Python.

In case you wonder, I am using this syntax to spot the record number(s)
carrying one or more "non-printing characters" in my dataset. The
LIST CASENR should indicate which record(s) contain the
character that causes problems in subsequent SPSS analyses.

Thank you in advance,

Luca

Mr. Luca MEYER
Market research, data analysis & more
www.lucameyer.com - Tel: +39.339.495.00.21

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

R: Running a syntax repeatedly for each record

Luca Meyer
Hello Gene,

I probably need to be more specific.

By running a FREQ on a variable with at least one non-printing character I
will generate a warning in the output, and by visually scrolling down the
whole output I should be able to spot which record(s) contain the anomaly.
In fact, when the selected record does not contain non-printing characters,
the corresponding FREQ won't contain warnings. Once I have spotted the
record(s) with the non-printing character(s), I will search for them in the
txt file that I previously imported into SPSS and try to remove them
before reimporting the file.

I need to find a syntax that helps me automatically substitute CASENR=1
with CASENR=2, CASENR=3, ..., CASENR=390 in my syntax. Of course, before that
I have computed CASENR=$CASENUM.

I hope it is now clearer what I am trying to do. Non-printing characters
have been causing me headaches from time to time and I would like to solve
this issue once and for all.

Thanks,
Luca

-----Original message-----
From: Gene Maguin [mailto:[hidden email]]
Sent: Thursday, 1 November 2007 18.42
To: 'Luca Meyer'
Subject: RE: Running a syntax repeatedly for each record

Luca,

I don't understand at all. How will this syntax segment help you find
nonprinting characters? I've searched for nonprinting characters and, based
on those experiences, this syntax won't find them. I'm wondering if there
are a number of things that you haven't explained.

TEMP.
SELECT IF CASENR=1.
STRING V2 V3 (A200).
RECODE V1 ("text to be recoded"="text recoded") INTO V2.
LIST CASENR.
FREQ V2 V3.

Second, why not do this?

STRING V2 V3 (A200).
DO IF CASENR=1.
RECODE V1 ("text to be recoded"="text recoded") INTO V2.
END IF.

* Why run this command? CASENR will always have a value of 1.
LIST CASENR.

* The same applies here for V2.
FREQ V2 V3.


Gene Maguin


Re: R: Running a syntax repeatedly for each record

Hal 9000
Something like this?:

data list free /V1 (a10).
begin data
name1
name2
name3
name4
name5
end data.
dataset name A window = front.

compute CN = $casenum.
exe.

define @L ()
!do !T = 1 !to 5
temp.
select if CN = !T.
freq V1.
!doend
!enddefine.

@L.
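
Since Python is installed alongside SPSS 15 in the original question, the same repetition can also be sketched in Python by generating the syntax text once per case number. This is only an illustration: `build_case_syntax` and its parameters are hypothetical names, and the variable names `CN` and `V1` are taken from the macro example above.

```python
def build_case_syntax(n_cases, case_var="CN", freq_vars="V1"):
    """Return SPSS syntax text that repeats TEMPORARY / SELECT IF /
    FREQUENCIES once for every case number from 1 to n_cases."""
    blocks = []
    for case_number in range(1, n_cases + 1):
        blocks.append(
            "TEMPORARY.\n"
            "SELECT IF %s = %d.\n"
            "FREQUENCIES VARIABLES=%s.\n" % (case_var, case_number, freq_vars)
        )
    return "".join(blocks)


syntax = build_case_syntax(390)

# Inside SPSS with the Python plug-in, the text could then be run with
# spss.Submit(syntax); that call is left out so the sketch runs anywhere.
print(syntax[:60])
```

The generated text contains 390 blocks, one per case, which is the substitution of CASENR=1 by CASENR=2, ..., CASENR=390 asked for in the thread.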

Re: Running a syntax repeatedly for each record

Albert-Jan Roskam
In reply to this post by Luca Meyer
Hi Luca,

Did you consider using the CLEAN function in Excel? It
does just what you are looking for. It's under text
tools.

Maybe the following SPSS solution will also work
(untested):

* You have to put all the letters of the alphabet behind '#x =', plus all
  printable signs. I was too lazy to do that ;-).

do repeat #x = 'a','b','c','x','y', 'z' .
if (index(v1,rtrim(lower(#x))) ne 0) nonprint = 1.
end repeat.
recode nonprint (sysmis = 0) (else = copy).
exe.
value labels nonprint 0 'contains nonprintable symbol'.
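
The whitelist idea (list every printable character, then flag anything else) can be sketched in Python as below. It is a rough illustration, not a tested SPSS solution: `ALLOWED` and `has_nonprintable` are hypothetical names, and the whitelist is assumed to be plain ASCII letters, digits, punctuation and the blank, so accented letters would be flagged too. Note that the check looks at every character of the value rather than at whether some listed character occurs.

```python
import string

# Assumed whitelist: ASCII letters, digits, punctuation and the blank.
ALLOWED = set(string.ascii_letters + string.digits + string.punctuation + " ")


def has_nonprintable(value):
    """Return True if any character of value is outside the whitelist."""
    return any(ch not in ALLOWED for ch in value)


records = ["name1", "na\tme2", "name3\x00"]
flags = [has_nonprintable(v) for v in records]
print(flags)  # [False, True, True]
```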

Cheers!!!
Albert-Jan


Re: R: Running a syntax repeatedly for each record

Jerabek Jindrich
In reply to this post by Hal 9000
Hello,

Sorry, I doubt the syntax can do anything useful.

> > TEMP.
*Makes the transformations that follow temporary.
> > SELECT IF CASENR=1.
*Drops all cases but the first one.
> > STRING V2 V3 (A200).
*Creates new string variables.
> > RECODE V1 ("text to be recoded"="text recoded") INTO V2.
*If the first and only record contains "text to be recoded", puts "text recoded" into V2.
> > LIST CASENR.
*Lists CASENR (of course 1, no other cases remain).
*After this procedure the temporary changes are canceled and the original dataset is back.
> > FREQ V2 V3.
*Causes an error: V2 and V3 existed only in the temporary data.

You can recode V1 into V2 with this:
STRING V2  (A200).
RECODE V1 ("text to be recoded"="text recoded") INTO V2.
It works for all cases; SPSS runs commands on all cases in the data file automatically, so there is no need to loop over cases.

By the way, there is a chance of getting rid of unwanted characters at the beginning and the end of a string with:
STRING V2  (A200).
COMPUTE V2 = LTRIM(RTRIM(V1)).

Please do not hesitate to write more about the problem; in my opinion, non-printing characters read from a txt file are not usual.

Regards
Jindra

Re: R: Running a syntax repeatedly for each record

Peck, Jon
In reply to this post by Luca Meyer
This is a nice little example for Python regular expressions.  The code below opens a sav file and finds all the string variables in it, if any, and creates new variables having the same name, string length, and variable label but with "_clean" appended to the name (the code assumes the new name is still legal).  It removes all the nonprinting characters from the new variables.  These are defined as anything with a code value less than blank, which covers everything you are likely to encounter in practice, including CR, LF, VT, HT (Tab), and FF.  You don't need to tell it any variable names.

I wrote this to work with SPSS 14 and 15.  With SPSS 16 it could be simplified and could write over the values in the existing variables.

All the serious work is in the regular expression part, which is defined as
pattern = re.compile(r"[\000-\037]")
and
cleanvalues.append(re.sub(pattern,"",val))

If you wanted to see where the nonprinting characters were, you could change the replacement character to, say "*" by writing
cleanvalues.append(re.sub(pattern,"*",val))

Those numerical codes are octal.  Sorry about that.
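
To see what the pattern does without touching SPSS at all, it can be tried on an ordinary Python string. This is just a small check of the regular expression quoted above; the sample text is made up for illustration.

```python
import re

# Octal 000-037 is decimal 0-31, i.e. everything below the blank (octal 040).
pattern = re.compile(r"[\000-\037]")

sample = "ab\tc\r\n"                # contains a tab, a CR and an LF
cleaned = pattern.sub("", sample)   # remove the nonprinting characters
marked = pattern.sub("*", sample)   # or mark where they were

print(repr(cleaned), repr(marked))
```

Here `cleaned` is `'abc'` and `marked` is `'ab*c**'`, so the starred version shows where each nonprinting character sat in the value.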

The line
stringvars = spssaux.VariableDict(variableType="string")
gets the string variable definitions.

The line
curs = spssdata.Spssdata(indexes=stringvars.variables, accessType='w')
defines the cursor for accessing the string variables, and the block starting
        for val in case:
passes the data and makes the substitutions.

HTH,
Jon Peck

import spss, spssdata, spssaux, re
from spssdata import vdef

spss.Submit(r"""get file='c:/temp/alittledata.sav'""")
stringvars = spssaux.VariableDict(variableType="string")
nstrvars = len(stringvars)
if not nstrvars:
    print "Dataset has no string variables"
else:
    curs = spssdata.Spssdata(indexes=stringvars.variables, accessType='w')
    for v in stringvars:
        curs.append(vdef(v.VariableName +"_clean", vfmt=("A", v.VariableType), vlabel=v.VariableLabel))
    curs.commitdict()

    pattern = re.compile(r"[\000-\037]") #Cr, LF, TAB, FF, VT, etc < blank
    for case in curs:
        cleanvalues = []
        for val in case:
            cleanvalues.append(re.sub(pattern,"",val))
        curs.casevalues(cleanvalues)
    curs.CClose()
spss.Submit("save outfile='c:/temp/clean.sav'")



Re: Running a syntax repeatedly for each record

Richard Ristow
In reply to this post by Luca Meyer
At 01:29 PM 11/1/2007, Luca Meyer wrote:

>I am trying to run a syntax like the following for each of the 390
>cases in my dataset:

OK, first: that is precisely what syntax does - run through the cases
in a file, applying the syntax to each.

>TEMP.
>SELECT IF CASENR=1.
>STRING V2 V3 (A200).
>RECODE V1 ("text to be recoded"="text recoded") INTO V2.
>LIST CASENR.
>FREQ V2 V3.

>I am using this syntax to spot the record(s) number(s) carrying one or
>more "non-printing character" into my dataset.

This looks to be an awkward way to go at it.

The usual syntax for this is something like

TEMPORARY.
NUMERIC CASENR (F4)       /* This separate variable */.
COMPUTE CASENR = $CASENUM /* may not be necessary   */.

SELECT IF <record carries a "non-printing character">.
LIST.


To test for presence of undesired characters, you normally need a
string containing the characters you don't want, and then use INDEX.
You build the string from the hex values for the characters. This has
been covered in the list, so you can probably find it in archives; I'm
sorry for being a little too tired to chase it down, tonight.
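
As a rough illustration of that idea (in Python rather than SPSS syntax, and not the archived solution referred to above): build the string of unwanted characters from their hex codes, then report the position of the first one in each value, returning 0 when none is present, which is the convention SPSS INDEX follows. The names `UNWANTED` and `first_unwanted` and the choice of the range 0x00-0x1F are assumptions made for this sketch.

```python
# Build the string of unwanted characters from their hex codes
# (assumed here: the control range 0x00-0x1F).
UNWANTED = "".join(chr(code) for code in range(0x00, 0x20))


def first_unwanted(value, unwanted=UNWANTED):
    """Return the 1-based position of the first unwanted character in
    value, or 0 if there is none."""
    positions = [value.find(ch) + 1 for ch in unwanted if ch in value]
    return min(positions) if positions else 0


records = ["clean text", "bad\x0ctext", "tab\there"]

# Keep only the case numbers whose value carries an unwanted character.
hits = [(casenr, first_unwanted(value))
        for casenr, value in enumerate(records, start=1)
        if first_unwanted(value)]
print(hits)  # [(2, 4), (3, 4)]
```

Each pair is (case number, position), so the offending records can be listed directly instead of being spotted by scrolling through output.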

-Best of luck,
  Richard

Re: R: Running a syntax repeatedly for each record

Art Kendall-2
In reply to this post by Peck, Jon
My SPSS is not running so I cannot test this e.g., to see if the else
section is needed.


Python is a great tool.  However it is another layer to learn.
If I recall correctly, SPSS does not have a string function that
returns a number which is the ASCII value of a character, but it does
have one to return the hex value.


* Keep characters with a code of 32 (blank) or higher; blank out the rest.
string newvar (a200).
loop #i = 1 to 200.
do if number(substr(oldvar,#i,1),PIB1.0) ge 32.
compute substr(newvar,#i,1) = substr(oldvar,#i,1).
else.
compute substr(newvar,#i,1) = ' '.
end if.
end loop.
execute.


Art Kendall
Social Research Consultants

R: Running a syntax repeatedly for each record

Luca Meyer
In reply to this post by Albert-Jan Roskam
Hello Albert-Jan,

Thank you for your suggestion. I thought that non-printing characters came
with the original dataset, but the more I dig into this issue the more it is
a mystery to me how non-printing characters come about.

For instance, I was working on a file and before a simple recode operation I
did FREQ ALL and it did not show any non-printing characters. After that
operation I had some non-printing characters. I am trying to replicate that
condition this morning using the same code but I simply cannot get the same
results... could it be that something went wrong during the session and I
simply had to restart the PC?

I mean, now I am all right because the non-printing chars are gone, but I
would like to understand what causes them so that I can prevent them from
messing up my datasets...

Thanks,
Luca

Mr. Luca MEYER
Market research, data analysis & more
www.lucameyer.com - Tel: +39.339.495.00.21

