syntax to find any missing values pattern


syntax to find any missing values pattern

J McClure
I have PASW 18 grad pack. My data are 50 variables from a survey and are nominal, ordinal and interval with most variables ordinal. I don't have the SPSS program for missing values pattern and would like to know the syntax for finding any nonrandom patterns. (I'm a beginner and so obvious things may escape me).  Thanks, Jan
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Re: syntax to find any missing values pattern

John F Hall
Jan
 
missing values is a standard command in all SPSS packages.  What precisely do you mean by "missing values pattern"?  Did you look at my tutorials yet?
 
General format
 
missing values <varlist1> (<value list1>)
                            / <varlist2> (<value list2>)
                            etc.
 
The syntax for declaring missing values is just like that in tutorial 2.2.1.5 Specimen answer for homework exercise which goes :
 

missing values
     V1408 (8,9)
    /V1409 v1412 (98,99) .

 
John
----- Original Message -----
Sent: Sunday, August 01, 2010 10:39 PM
Subject: syntax to find any missing values pattern

I have PASW 18 grad pack. My data are 50 variables from a survey and are nominal, ordinal and interval with most variables ordinal. I don't have the SPSS program for missing values pattern and would like to know the syntax for finding any nonrandom patterns. (I'm a beginner and so obvious things may escape me).  Thanks, Jan

Re: syntax to find any missing values pattern

lts1
Thanks everyone!
 
 
 
----- Original Message -----
Sent: Sunday, August 01, 2010 6:13 PM
Subject: Re: syntax to find any missing values pattern


Re: syntax to find any missing values pattern

John F Hall
Jan
 
I'm still not sure what you mean by "pattern" or "random"?
 
You could try counting the number of variables with missing values within any one case:
 
count xx = <v1 to vn> (missing) .
 
but that will only yield something approximating a normal curve (the more things you add together, the closer they get to this).
 
If you run a frequency count on all the variables, you will get a count of cases missing for each one.
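
As a rough sketch of that frequency check, assuming the 50 survey variables sit next to each other in the file as v1 to v50 (hypothetical names; substitute your own), something like the following should list the valid and missing N for each variable without printing 50 full frequency tables:

```spss
* Hypothetical variable names v1 to v50: replace with your own list .
* FORMAT = NOTABLE suppresses the frequency tables, leaving the Statistics table of valid and missing N .
frequencies variables = v1 to v50
    /format = notable .
```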
 
Have you declared missing values for all variables?  If so, a shorthand trick would be something like:
 
recode <v1 to vn> (missing = 1)(else = 0) into <y1 to yn>.
mult response groups = <groupname> (<y1 to yn> (1))
    /freq <groupname>.
 
I just tested this on one of my data sets.
 
title 'Sample run for Jan McClure' .
*generate count of missing values across arbitrary set of variables .

count x1 = var126 to var150 var152 to var155 var157 to var171 (missing) .
var lab x1 'Total missing values across vars in list' .
freq x1 .

x1 Total missing values across vars in list

                   Frequency   Percent   Valid Percent   Cumulative Percent
Valid  0 Valid           861      92.4            92.4                 92.4
       1 Missing          65       7.0             7.0                 99.4
       2                   5        .5              .5                 99.9
       14                  1        .1              .1                100.0
       Total             932     100.0           100.0

 
[Not sure where 2 and 14 came from!]
 
* Dichotomise above vars into dummy set
+  x126 to x150 x152 to x155 x157 to x171 .
*Make sure there are no variables missing in the sequence, otherwise x126  . . . x171 won't have
+ the same number of variables as var126 . . . var171 .

recode var126 to var150 var152 to var155 var157 to var171
     (missing = 1)(else = 0) into x126 to x150 x152 to x155 x157 to x171 .
val lab x126 to x171
     0 'Valid' 1 'Missing' .

*Use mult response in dichotomous mode to see how many cases are missing for each var in the list
+ but some high values may be for questions not asked of whole sample .
mult response groups = x2  (x126 to x150 x152 to x155 x157 to x171 (1))
    /freq x2 .

x2 Frequencies

           Responses        Percent
x2a          N   Percent   of Cases
x126         1       .1%        .1%
x127         1       .1%        .1%
x128         1       .1%        .1%
x129         3       .2%        .3%
x130         1       .1%        .1%
x131         1       .1%        .1%
x133         1       .1%        .1%
x136         3       .2%        .3%
x137         1       .1%        .1%
x138         1       .1%        .1%
x139         3       .2%        .3%
x140         1       .1%        .1%
x141         1       .1%        .1%
x142         1       .1%        .1%
x143         1       .1%        .1%
x144         2       .1%        .2%
x145         2       .1%        .2%
x146        34      2.2%       3.9%
x147         1       .1%        .1%
x148         4       .3%        .5%
x149         8       .5%        .9%
x150        17      1.1%       2.0%
x152       203     12.9%      23.4%
x154        24      1.5%       2.8%
x155       841     53.3%      97.0%
x157         6       .4%        .7%
x158         8       .5%        .9%
x159        33      2.1%       3.8%
x160       142      9.0%      16.4%
x161         4       .3%        .5%
x162       106      6.7%      12.2%
x163         7       .4%        .8%
x164        29      1.8%       3.3%
x165        12       .8%       1.4%
x166         4       .3%        .5%
x167        11       .7%       1.3%
x168        20      1.3%       2.3%
x169         9       .6%       1.0%
x170         2       .1%        .2%
x171        27      1.7%       3.1%
Total     1577    100.0%     181.9%

a. Dichotomy group tabulated at value 1.

 
The low counts for many variables derive from respondents giving scores on all or most 0-10 satisfaction scales. 
 
If you want to live dangerously, use the original vars, but make sure the recode is preceded by TEMPORARY (as below), otherwise it will permanently overwrite your data.
 
temp .
recode var126 to var150 var152 to var155 var157 to var171
 (missing = 1)(else = 0)  .
mult resp groups x3 (var126 to var150 var152 to var155 var157 to var171 (1))
  /freq x3 .
 

x3 Frequencies

                                                   Responses        Percent
x3a                                                  N   Percent   of Cases
var126 QA12A NOISE FROM TRAFFIC OR TRAINS            1       .1%        .1%
var127 QA12B NOISE FROM AEROPLANES                   1       .1%        .1%
var128 QA12C NOISE FROM CHILDREN                     1       .1%        .1%
var129 QA12D NOISE FROM NEIGHBOURS                   3       .2%        .3%
var130 QA12E NOISE FROM INDUSTRY                     1       .1%        .1%
var131 QA12F OUTSIDE POLLUTION                       1       .1%        .1%
var133 QA12H INSECTS GETTING INTO THE HOUSE          1       .1%        .1%
var136 QA13A KITCHEN                                 3       .2%        .3%
var137 QA13B NUMBER OF ROOMS                         1       .1%        .1%
var138 QA13C SHAPE AND SIZE OF ROOMS                 1       .1%        .1%
var139 QA13D KEEPING WARM IN WINTER                  3       .2%        .3%
var140 QA13E KEEPING IT CLEAN AND TIDY               1       .1%        .1%
var141 QA13F BATHS OR SHOWERS                        1       .1%        .1%
var142 QA13G FREEDOM FROM NOISE                      1       .1%        .1%
var143 QA13H FREEDOM FROM DAMP                       1       .1%        .1%
var144 QA13I VIEW FROM WINDOWS                       2       .1%        .2%
var145 QA13J PRIVACY FROM NEIGHBOURS                 2       .1%        .2%
var146 QA13K COST OF RATES,REPAIRS                  34      2.2%       3.9%
var147 QA13L STATE OF REPAIR                         1       .1%        .1%
var148 QA13M APPEARANCE FROM OUTSIDE                 4       .3%        .5%
var149 QA14 OVERALL HOUSE SATISFACTION               8       .5%        .9%
var150 QA15A CHANGE most wanted IN HOUSE            17      1.1%       2.0%
var152 QA15B SATISFACTION WITH HOUSE CHANGE        203     12.9%      23.4%
var154 QA16B ON HOUSE WAITING LIST                  24      1.5%       2.8%
var155 QA16C YEARS ON HOUSE WAITING LIST           841     53.3%      97.0%
var157 QA17 ATTACHMENT TO DISTRICT                   6       .4%        .7%
var158 QA18A SHOPS                                   8       .5%        .9%
var159 QA18B BUS AND TRAIN SERVICES                 33      2.1%       3.8%
var160 QA18C CONVENIENCE FOR WORK TRAVEL           142      9.0%      16.4%
var161 QA18D CLEAN AIR                               4       .3%        .5%
var162 QA18E SCHOOLS                               106      6.7%      12.2%
var163 QA18F PARKS                                   7       .4%        .8%
var164 QA18G PLACES OF ENTERTAINMENT                29      1.8%       3.3%
var165 QA18H FREEDOM FROM CRIME                     12       .8%       1.4%
var166 QA18I GENERAL APPEARANCE OF DISTRICT          4       .3%        .5%
var167 QA18J SORT OF PEOPLE LIVING IN DISTRICT      11       .7%       1.3%
var168 QA18K BEING NEAR FAMILY                      20      1.3%       2.3%
var169 QA18L BEING NEAR FRIENDS                      9       .6%       1.0%
var170 QA19 OVERALL SATISFACTION WITH DISTRICT       2       .1%        .2%
var171 QA20 DESIRE TO MOVE FROM DISTRICT            27      1.7%       3.1%
Total                                             1577    100.0%     181.9%

a. Dichotomy group tabulated at value 1.

 
Someone else will be better qualified to advise you on random vs non-random patterns, but you could try running tabulations against demographic variables such as gender, age group, educational level, etc.
 
Don't forget to delete the dummy variables (unless you want to keep them).
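
A minimal sketch of the tabulation idea, assuming the x126 ... x171 dummies from the run above exist and that the file holds demographic variables called sex and agegroup (both names are hypothetical). If the share of missing answers differs clearly between groups, that would hint that the missingness is not completely random:

```spss
* sex and agegroup are hypothetical names: use your own demographic variables .
* Column percentages show whether the share of missing answers differs between groups .
crosstabs
    /tables = x146 x152 x160 by sex agegroup
    /cells = count column
    /statistics = chisq .

* Once finished, drop the dummy variables .
delete variables x126 to x150 x152 to x155 x157 to x171 .
```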
 
John
 
 
----- Original Message -----
Sent: Monday, August 02, 2010 1:12 AM
Subject: Re: syntax to find any missing values pattern

I did look at the tutorial but didn't see how to find patterns. In other words, is the missing data random or is there a pattern? Maybe I missed that part.
Jan


Re: syntax to find any missing values pattern

Ruben Geert van den Berg
Dear John,
 
I think the question may be whether the missing values comply with the MCAR, MAR or MNAR condition. I've no time to look this up, but I think Little and/or Rubin (not RubEn ;-)) invented a test for this: something like replacing all missings by 0 and valids by 1 and checking whether the resulting correlation matrix deviates statistically significantly from an identity matrix, or something. "Does missingness on v1 'say anything' about missingness on v2?" IIRC something like this is present in the MISSING VALUES option that I think the OP referred to.
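
An informal version of this check (not the formal test mentioned above) could be sketched as follows, assuming hypothetical variable names v1 to v50 and coding 1 = missing, 0 = valid; reversing the coding would leave the correlations unchanged:

```spss
* Hypothetical names: v1 to v50 are the survey items, m1 to m50 the new indicators .
recode v1 to v50 (missing = 1)(else = 0) into m1 to m50 .
* Large correlations mean that missingness on one item goes together with missingness on another .
correlations
    /variables = m1 to m50 .
```

Indicators for items with no missing values at all are constants, so no correlation can be computed for them.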
 
Best,

Ruben van den Berg
Consultant Models & Methods
TNS NIPO
Email: [hidden email]
Mobiel: +31 6 24641435
Telefoon: +31 20 522 5738
Internet: www.tns-nipo.com



 

Date: Mon, 2 Aug 2010 07:49:10 +0200
From: [hidden email]
Subject: Re: syntax to find any missing values pattern
To: [hidden email]


Re: syntax to find any missing values pattern

Bruce Weaver
Administrator
Yes, I read it that way too.  I don't have SPSS on this machine, but here's an example of how to use the MULTIPLE IMPUTATION command to explore the pattern of "missingness".  

MULTIPLE IMPUTATION { variable list here }
  /IMPUTE METHOD=NONE
  /MISSINGSUMMARIES OVERALL VARIABLES (MINPCTMISSING=0) PATTERNS.

IIRC, the MVA command can also be used for this purpose.
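
For reference, a sketch of what an MVA call might look like, assuming hypothetical variable names v1 to v50 and that the Missing Values option is licensed (MVA belongs to that add-on, so it may not run on an installation without it):

```spss
* Hypothetical variable names: replace v1 to v50 with your own list .
* TPATTERN tabulates the missing-value patterns and the EM output includes Little's MCAR test .
mva variables = v1 to v50
    /tpattern
    /em .
```

The EM estimates apply to quantitative variables only, so with mostly ordinal and nominal survey items the pattern table is likely to be the more useful part of the output.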

HTH.
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).