Conditional output for a data quality report

Conditional output for a data quality report

Elliott G. Smith
I have written an SPSS program that scans a data file for various data
problems and generates a data quality report in the SPSS Output window.
In the report, the case number, ID variables, and the value of the
offending variable are printed.

Here's my question. The program, as you can see in the sample output
below, creates output even if there are no data problems. When I'm
testing many variables, this can result in a really long set of output
and make it hard for the reader to pick out the problems. Can anyone
suggest how to modify my syntax so that the output is printed only when
a data problem is detected? In the sample output below, one of the
records is missing an ID variable, but there are no problems with
undocumented values. I would like to avoid printing the
undocumented values lines.


SAMPLE OUTPUT
**  Missing ID variables  **

    Site   CaseNum   FamilyID   Group   Time
     BFD        25         13    BFD3      .

Undocumented values for INTSTAT

    Site   CaseNum   FamilyID   Group   Time    Value



SPSS SYNTAX
*****  This file creates a data quality report card for each research
       site *****.

sort cases by SITE FAMILYID TIME.

**  Missing ID variables  **.
title "**  Missing ID variables  **".
compute #case = $casenum.
do if $casenum = 1.
    print / 'Site' 5 'CaseNum' 12 'FamilyID' 22 'Group' 33 'Time' 41.
end if.
do if ( site="" or missing(familyID) or missing(time) ).
    print / site (t6,a3) #case (t14,f5) familyid (t22,f8) group (t34,a5)
        time (t41,f4).
end if.
execute.

** UNDOC Macro to identify the records that have undocumented values **.
* The first argument is the variable name.
* The second argument is the set of allowed values enclosed in
  parentheses and separated by commas.

define !Undoc( !pos !tokens(1) / !pos !enclose('(',')') ).
    title !concat("Undocumented values for ",!1).
    compute #case = $casenum.
    do if $casenum = 1.
        print / 'Site' 5 'CaseNum' 12 'FamilyID' 22 'Group' 33 'Time' 41
            'Value' 49.
    end if.
    do if ( not any(value(!1),!2 ) ).
        print / site (t6,a3) #case (t14,f5) familyid (t22,f8) group
            (t34,a5) time (t41,f4) !1 (t46,f8.0).
    end if.
    execute.
!enddefine.

* Call the macro.
!Undoc INTSTAT (1,2,3,4) .


Thanks for any help you can provide!

Elliott Smith


--
Elliott G. Smith, PhD
Associate Director
National Data Archive on Child Abuse and Neglect at Cornell University
[hidden email] | 607.255.8104 | www.ndacan.cornell.edu

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Conditional output for a data quality report

ViAnn Beadle
Have you looked at the Data Validation module (now called Data Preparation
in SPSS 16)? It will do this and much more, presenting the violations in a
nice, compact way.

Here's the spec sheet from the SPSS web site:

http://www.spss.com/PDFs/SDP16SPChr.pdf



Re: Conditional output for a data quality report

Elliott G. Smith
Hi ViAnn,

Thanks for the suggestion. Unfortunately, the sites I'm working with
don't have the resources to purchase additional modules and are usually
a version or two behind the latest. I'm trying to keep the syntax
compatible with Base SPSS.

Thanks,
Elliott




Re: Conditional output for a data quality report

Richard Ristow
In reply to this post by Elliott G. Smith
At 12:40 PM 10/25/2007, Elliott G. Smith wrote:

>I have written an SPSS program that scans a data file for various data
>problems and generates a data quality report in the SPSS Output
>window. In the report, the case number, ID variables, and the value of
>the offending variable are printed.
>
>The program, as you can see in the sample output below, creates output
>even if there are no data problems. Can anyone suggest how to modify
>my syntax so that the output is printed only when a data problem is
>detected?

You print the header lines with TITLE statements. Instead, use PRINT
statements, and make them conditional. (I'm afraid this logic is
much cleaner in SAS, using LINK/RETURN to print the headers.)

Something like this, UNTESTED.

* The title is now a conditional PRINT line instead of a TITLE statement.
* #NeedHdr is raised at the first case and cleared once the header prints.
define !Undoc( !pos !tokens(1) / !pos !enclose('(',')') ).
    compute #case = $casenum.
    if $casenum EQ 1 #NeedHdr = 1.
    do if ( not any(value(!1),!2 ) ).
        DO IF #NeedHdr.
           print
           / !quote(!concat("Undocumented values for ",!1))
           / 'Site'   5 'CaseNum' 12 'FamilyID' 22
             'Group' 33 'Time'    41 'Value'    49.
           COMPUTE   #NeedHdr = 0.
        END IF.
        print / site     (t6,a3)  #case (t14,f5)
                familyid (t22,f8) group (t34,a5)
                time     (t41,f4) !1    (t46,f8.0).
    end if.
    execute.
!enddefine.
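
For anyone who wants to try the conditional-header idea outside the macro, here is a minimal, self-contained sketch. It is untested and makes some assumptions: the toy data values are invented for illustration, the flag name #needhdr is arbitrary, and the report is cut down to three columns. It relies on scratch (#) variables keeping their value from one case to the next.

```spss
* Toy data with hypothetical values: only the second record has an
  undocumented INTSTAT value.
data list list / site (a3) familyid (f8) time (f4) intstat (f8).
begin data
BFD 13 1 1
BFD 14 1 7
ABC 15 2 3
end data.

compute #case = $casenum.
* Raise the flag at the first case so that the header is still pending.
if ($casenum = 1) #needhdr = 1.
do if (not any(intstat, 1, 2, 3, 4)).
    * First offending record: print the title and column header once.
    do if (#needhdr = 1).
        print / 'Undocumented values for INTSTAT'
              / 'Site' 5 'CaseNum' 12 'Value' 22.
        compute #needhdr = 0.
    end if.
    print / site (t6,a3) #case (t14,f5) intstat (t22,f8.0).
end if.
execute.
```

With the toy data as given, the title and header should appear once, followed by a line for the second record. If the 7 is changed to a documented value, nothing should be printed for this check at all.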


Re: Conditional output for a data quality report

Marks, Jim
In reply to this post by Elliott G. Smith
Elliott:
Can you nest your DO IF statements, putting the title and header
information inside the test for missing or out-of-range values?

Not tested (try this with your data):

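**  Missing ID variables  **.
compute #case = $casenum.
* #needhdr is raised at the first case and cleared once the header prints,
  so the header appears only when an offending record turns up.
if $casenum = 1 #needhdr = 1.
do if ( site="" or missing(familyID) or missing(time) ).
    do if #needhdr.
        print RECORDS = 2 /1 "**  Missing ID variables  **"
            /2 'Site' 5 'CaseNum' 12 'FamilyID' 22 'Group' 33 'Time' 41.
        compute #needhdr = 0.
    end if.
    print / site (t6,a3) #case (t14,f5) familyid (t22,f8) group (t34,a5)
        time (t41,f4).
end if.
execute.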

** UNDOC Macro to identify the records that have undocumented values **.
* The first argument is the variable name.
* The second argument is the set of allowed values enclosed in
parentheses and separated by commas.

define !Undoc( !pos !tokens(1) / !pos !enclose('(',')') ).
**    title !concat("Undocumented values for ",!1).
compute #case = $casenum.
if $casenum = 1 #needhdr = 1.

do if ( not any(value(!1),!2 ) ).
    do if #needhdr.
        print RECORDS = 2 /1 !quote(!concat("Undocumented values for ",!1))
          /2 'Site' 5 'CaseNum' 12 'FamilyID' 22 'Group' 33 'Time' 41
             'Value' 49.
        compute #needhdr = 0.
    end if.
    print / site (t6,a3) #case (t14,f5) familyid (t22,f8) group
        (t34,a5) time (t41,f4) !1 (t46,f8.0).
end if.
execute.
!enddefine.

* Call the macro.
!Undoc INTSTAT (1,2,3,4) .



--jim

