SPSSX Discussion

Missing data, with DATA LIST FREE or LIST

Richard Ristow

Missing data, with DATA LIST FREE or LIST

Here's a little data-reading code, from another posting:

DATA LIST LIST
/caseID (N) DateEmpl (ADATE) ES ESmean ESDiff (3F).

BEGIN DATA
1220 08/17/05 1 2 .
1220 03/09/06 3 2 2
1390 11/09/05 1 1.67 .
1390 02/08/06 1 1.67 0
END DATA.

Notice the system-missing values for ESDiff.

They're read as desired, but by a backwards route: SPSS recognizes "."
not as a code for "missing", but as an invalid numeric field.
Since the field is invalid, SPSS makes the result system-missing.

And there are lengthy warnings for every one, until MXWARNS is reached:

DATA LIST LIST SKIP=2
/caseID (N) DateEmpl (ADATE) ES ESmean ESDiff (3F).

BEGIN DATA
caseID date_emplymnt ES ESmean ESDiff

1220 08/17/05 1 2 .

>Warning # 1111
>A numeric field contained no digits. The result has been set to the
>system-missing value.

>Command line: 414 Current case: 1 Current splitfile group: 1
>Field contents: '.'
>Record number: 3 Starting column: 48 Record length: 48

Does anybody have advice on how to read the missing values without all
the warning messages?

One solution, often suggested, is to replace the '.' fields by '-1', or
some other value that can't occur in real data. When the data has been
read, either declare that value user-missing, or recode it to
system-missing.

I don't like that very much. It means an extra data-preparation step
before SPSS, to change '.' to '-1' globally.
(Or rather, to change only ' . ' or ' .<CR>', a period standing alone
as a field, so you won't change legitimate decimal points.)

And I think it makes the file less readable. A '.' looks missing. A
'-1' stands out less, visually; and unless you know the project well,
it's hard to be sure that it isn't a data value.

Oliver, Richard

Re: [BULK] Missing data, with DATA LIST FREE or LIST

My apologies in advance if this is not relevant. I'm not sure how this thread started; so I'm not sure how the periods got into the data source in the first place. But DATA LIST is now far more flexible in reading delimited files that contain missing data than it was in older versions (prior to SPSS 10, I think), provided there is a consistent delimiter between values.

A simple example:

data list list (",") /var1 var2 var3.
begin data
,12,13
21,,23
31,32,,
,,43
end data.

In this context, spaces as delimiters can be a bit problematic since multiple spaces will be interpreted as multiple missing values.
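As an illustration of getting whitespace-delimited data with '.' placeholders into the comma-delimited form shown above, here is a minimal Python sketch of a preprocessing step run outside SPSS. The function name to_comma_delimited is hypothetical, the sketch assumes that no value contains an embedded space or comma, and its output has not been run through SPSS here.

```python
def to_comma_delimited(line: str, missing: str = ".") -> str:
    """Split a line on runs of whitespace and rejoin it with single commas,
    writing an empty field wherever the missing-value placeholder appears."""
    fields = line.split()
    return ",".join("" if field == missing else field for field in fields)


rows = [
    "1220 08/17/05 1 2 .",
    "1220 03/09/06 3 2 2",
    "1390 11/09/05 1 1.67 .",
]
converted = [to_comma_delimited(row) for row in rows]
for row in converted:
    print(row)
```

A missing last value leaves a single trailing comma. The example above ends such a row with two commas ("31,32,,"), so it is worth checking how your SPSS version reads a trailing empty field.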

meljr

Re: Missing data, with DATA LIST FREE or LIST

In reply to this post by Richard Ristow

Try this:

set errors = off.

DATA LIST LIST
/caseID (N) DateEmpl (ADATE) ES ESmean ESDiff (3F).

meljr

Richard Ristow

Re: Missing data, with DATA LIST FREE or LIST

In reply to this post by Oliver, Richard

At 02:06 PM 3/20/2007, Oliver, Richard wrote:

>I'm not sure how this thread started; so I'm not sure how the periods
>got into the data source in the first place,

Ah, I started the thread. The periods came in with test data posted to
the list. (This is read with SKIP=2 on the DATA LIST.)

>BEGIN DATA
> caseID date_emplymnt ES ESmean ESDiff
>
> 1220 08/17/05 1 2 .
> 1220 03/09/06 3 2 2
> 1390 11/09/05 1 1.67 .
> 1390 02/08/06 1 1.67 0
> 1390 05/19/06 3 1.67 2
> 1445 08/15/05 3 2 .
> 1445 12/13/05 1 2 -2
> 1518 11/09/05 1 1.67 .
> 1518 02/08/06 3 1.67 2
> 1518 05/19/06 1 1.67 -2
>END DATA.

It's not too rare to see data like this. It's what LIST output looks
like, so maybe that's how it was generated.

To answer my question with your answer: one can use an editor that
supports regular expressions to get the data into a form like this:

>data list list (",") /var1 var2 var3.
>begin data
>,12,13
>end data.

Or, since a comma is a field delimiter by default with LIST or FREE
input, changing the periods to commas seems to work. (Use context
matching, so as not to change decimal points to commas.) Thank you!

Demo, after the change; SPSS 15 draft output:

DATA LIST LIST SKIP=2
/caseID (N) DateEmpl (ADATE) ES ESmean ESDiff (3F).

BEGIN DATA
caseID date_emplymnt ES ESmean ESDiff

1220 08/17/05 1 2 ,
1220 03/09/06 3 2 2
1390 11/09/05 1 1.67 ,
1390 02/08/06 1 1.67 0
1390 05/19/06 3 1.67 2
1445 08/15/05 3 2 ,
1445 12/13/05 1 2 -2
1518 11/09/05 1 1.67 ,
1518 02/08/06 3 1.67 2
1518 05/19/06 1 1.67 -2
END DATA.
FORMATS caseID (N4)
/DateEmpl (ADATE10)
/ES ESmean ESDiff (F3).

LIST.

List
|-----------------------------|---------------------------|
|Output Created               |21-MAR-2007 15:33:55       |
|-----------------------------|---------------------------|
caseID   DateEmpl  ES ESmean ESDiff

  1220 08/17/2005   1      2      .
  1220 03/09/2006   3      2      2
  1390 11/09/2005   1      2      .
  1390 02/08/2006   1      2      0
  1390 05/19/2006   3      2      2
  1445 08/15/2005   3      2      .
  1445 12/13/2005   1      2     -2
  1518 11/09/2005   1      2      .
  1518 02/08/2006   3      2      2
  1518 05/19/2006   1      2     -2

Number of cases read: 10 Number of cases listed: 10
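
The context matching mentioned above (change a period only when it stands alone as a field) can be sketched in Python as a preprocessing step outside SPSS. This is only an illustration: the names LONE_PERIOD and replace_lone_periods are hypothetical, and it assumes that fields are separated by whitespace.

```python
import re

# A period that is a whole field: no non-space character on either side.
# Decimal points such as the one in "1.67" are therefore left alone.
LONE_PERIOD = re.compile(r"(?<!\S)\.(?!\S)")


def replace_lone_periods(text: str, replacement: str = ",") -> str:
    """Replace every stand-alone '.' field in the text with the replacement."""
    return LONE_PERIOD.sub(replacement, text)


data = """1220 08/17/05 1 2 .
1220 03/09/06 3 2 2
1390 11/09/05 1 1.67 .
1390 02/08/06 1 1.67 0"""

converted = replace_lone_periods(data)
print(converted)
```

Passing replacement="-1" instead gives the placeholder-value approach discussed at the start of the thread; that value would still have to be declared user-missing or recoded to system-missing after the data is read.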