Missing data, with DATA LIST FREE or LIST

Richard Ristow
Here's a little data-reading code from another posting:

DATA LIST LIST
    /caseID (N) DateEmpl (ADATE) ES  ESmean  ESDiff (3F).

BEGIN DATA
     1220        08/17/05        1     2        .
     1220        03/09/06        3     2        2
     1390        11/09/05        1     1.67     .
     1390        02/08/06        1     1.67     0
END DATA.

Notice the system-missing values for ESDiff.

They're read as desired, but by a backwards route: SPSS doesn't
recognize "." as a code for "missing"; it treats it as an invalid
numeric field. Since the field is invalid, SPSS makes the result
system-missing.

And there are lengthy warnings for every one, until MXWARNS is reached:


DATA LIST LIST SKIP=2
    /caseID (N) DateEmpl (ADATE) ES  ESmean  ESDiff (3F).

BEGIN DATA
     caseID     date_emplymnt    ES  ESmean   ESDiff

     1220        08/17/05        1     2        .

>Warning # 1111
>A numeric field contained no digits.  The result has been set to the
>system-missing value.

>Command line: 414  Current case: 1  Current splitfile group: 1
>Field contents: '.'
>Record number: 3  Starting column: 48  Record length: 48

Does anybody have advice on how to read the missing values without
all the warning messages?

One solution, often suggested, is to replace the '.' fields with '-1',
or some other value that can't occur in real data. Once the data has
been read, either declare that value user-missing or recode it to
system-missing.

I don't like that very much. It means an extra data-preparation step
before SPSS, to change '.' to '-1' globally.
(Or rather, ' . ' or ' .<CR>' to '-1', so you won't change legitimate
decimal points.)

And I think it makes the file less readable. A '.' looks missing. A
'-1' stands out less, visually; and unless you know the project well,
it's hard to be sure that it isn't a data value.
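For reference, the context-matching replacement described above can be done in a few lines of Python instead of an editor. This is a minimal sketch, not something from the thread: it assumes whitespace-separated data like the sample, and the names `LONE_PERIOD` and `periods_to_sentinel` are hypothetical. A '.' is treated as a missing-value field only when it has whitespace (or the start or end of the line) on both sides, so the decimal point in 1.67 is left alone.

```python
import re

# A "." that forms a whole field: no non-space character directly before it
# and none directly after it. The "." in "1.67" sits between digits, so it
# does not match.
LONE_PERIOD = re.compile(r"(?<!\S)\.(?!\S)")


def periods_to_sentinel(line: str, sentinel: str = "-1") -> str:
    """Replace each standalone '.' field in a data line with `sentinel`."""
    return LONE_PERIOD.sub(sentinel, line)


if __name__ == "__main__":
    sample = [
        "     1220        08/17/05        1     2        .",
        "     1220        03/09/06        3     2        2",
        "     1390        11/09/05        1     1.67     .",
    ]
    for line in sample:
        print(periods_to_sentinel(line))
```

After reading the converted file, the sentinel can be declared user-missing (MISSING VALUES ESDiff (-1).) or recoded (RECODE ESDiff (-1=SYSMIS).). As the post says, the sentinel has to be a value that can't occur in real data.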

Re: [BULK] Missing data, with DATA LIST FREE or LIST

Oliver, Richard
My apologies in advance if this is not relevant. I'm not sure how this thread started, so I'm not sure how the periods got into the data source in the first place. But DATA LIST is now far more flexible at reading delimited files that contain missing data than it was in older versions (prior to SPSS 10, I think), provided there is a consistent delimiter between values:

A simple example:

data list list (",") /var1 var2 var3.
begin data
,12,13
21,,23
31,32,,
,,43
end data.

In this context, spaces as delimiters can be a bit problematic, since multiple consecutive spaces will be interpreted as multiple missing values.
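A rough Python model of the rule described above may help. It illustrates the idea only and is not SPSS's actual parser; `parse_record` is a hypothetical name. With an explicit delimiter, each delimiter closes one field, so two delimiters in a row enclose an empty field, which is read as missing (`None` here). The last call shows the pitfall with spaces: two blanks between values produce an extra empty field.

```python
from typing import List, Optional


def parse_record(record: str, delimiter: str = ",") -> List[Optional[float]]:
    """Split one record on `delimiter`; an empty field becomes None (missing)."""
    # str.split with an explicit separator keeps the empty strings between
    # consecutive separators, which is what makes ",,"-style gaps visible.
    return [float(field) if field.strip() else None
            for field in record.split(delimiter)]


if __name__ == "__main__":
    print(parse_record(",12,13"))     # [None, 12.0, 13.0]
    print(parse_record("21,,23"))     # [21.0, None, 23.0]
    print(parse_record("1  2", " "))  # [1.0, None, 2.0]: two spaces, one gap
```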


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Richard Ristow
Sent: Tuesday, March 20, 2007 12:15 PM
To: [hidden email]
Subject: [BULK] Missing data, with DATA LIST FREE or LIST
Importance: Low


Re: Missing data, with DATA LIST FREE or LIST

meljr
In reply to this post by Richard Ristow
Try this:

set errors = off.

DATA LIST LIST
    /caseID (N) DateEmpl (ADATE) ES  ESmean  ESDiff (3F).

meljr


Re: Missing data, with DATA LIST FREE or LIST

Richard Ristow
In reply to this post by Oliver, Richard
At 02:06 PM 3/20/2007, Oliver, Richard wrote:

>I'm not sure how this thread started, so I'm not sure how the periods
>got into the data source in the first place,

Ah, I started the thread. And the periods got in there, in test data
posted to the list. (This is read with SKIP=2 on the DATA LIST):

>BEGIN DATA
>     caseID     date_emplymnt    ES  ESmean   ESDiff
>
>     1220        08/17/05        1     2        .
>     1220        03/09/06        3     2        2
>     1390        11/09/05        1     1.67     .
>     1390        02/08/06        1     1.67     0
>     1390        05/19/06        3     1.67     2
>     1445        08/15/05        3     2        .
>     1445        12/13/05        1     2       -2
>     1518        11/09/05        1     1.67     .
>     1518        02/08/06        3     1.67     2
>     1518        05/19/06        1     1.67    -2
>END DATA.

It's not too rare to see data like this. It's what LIST output looks
like, so maybe that's how it was generated.

So, to answer my own question with your answer: one option is to use an
editor that supports regular expressions to get the data into a form like this:

>data list list (",") /var1 var2 var3.
>begin data
>,12,13
>end data.
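Here is one way to get from the space-aligned listing to that comma-delimited form, as a small Python sketch instead of an editor regex. The function name `listing_to_csv` is hypothetical, and the sketch assumes no field contains embedded blanks, which holds for the sample data.

```python
def listing_to_csv(line: str) -> str:
    """Turn one space-aligned data line into a comma-delimited record,
    writing a standalone '.' as an empty (missing) field."""
    # split() with no argument treats any run of blanks as one separator,
    # so the spaces used for column alignment do not create empty fields.
    fields = line.split()
    return ",".join("" if field == "." else field for field in fields)


if __name__ == "__main__":
    print(listing_to_csv("     1220        08/17/05        1     2        ."))
    print(listing_to_csv("     1390        02/08/06        1     1.67     0"))
```

A missing value in the last column ends the record with a trailing comma, as in the first output line ("1220,08/17/05,1,2,"). It's worth confirming that your SPSS version reads that as a missing last field before relying on it.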

Or, since a comma is a field delimiter by default with LIST or FREE
input, changing the missing-value periods to commas seems to work. (Use
context matching, so as not to change decimal points to commas.) Thank
you! Here's a demo after the change; SPSS 15 draft output:

DATA LIST LIST SKIP=2
    /caseID (N) DateEmpl (ADATE) ES  ESmean  ESDiff (3F).

BEGIN DATA
     caseID     date_emplymnt    ES  ESmean   ESDiff

     1220        08/17/05        1     2        ,
     1220        03/09/06        3     2        2
     1390        11/09/05        1     1.67     ,
     1390        02/08/06        1     1.67     0
     1390        05/19/06        3     1.67     2
     1445        08/15/05        3     2        ,
     1445        12/13/05        1     2       -2
     1518        11/09/05        1     1.67     ,
     1518        02/08/06        3     1.67     2
     1518        05/19/06        1     1.67    -2
END DATA.
FORMATS caseID             (N4)
        /DateEmpl           (ADATE10)
        /ES  ESmean  ESDiff (F3).

LIST.

List
|-----------------------------|---------------------------|
|Output Created               |21-MAR-2007 15:33:55       |
|-----------------------------|---------------------------|
caseID   DateEmpl  ES ESmean ESDiff

  1220  08/17/2005   1     2      .
  1220  03/09/2006   3     2      2
  1390  11/09/2005   1     2      .
  1390  02/08/2006   1     2      0
  1390  05/19/2006   3     2      2
  1445  08/15/2005   3     2      .
  1445  12/13/2005   1     2     -2
  1518  11/09/2005   1     2      .
  1518  02/08/2006   3     2      2
  1518  05/19/2006   1     2     -2

Number of cases read:  10    Number of cases listed:  10
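The period-to-comma change used in the demo can also be scripted. This is a minimal Python sketch that assumes whitespace-separated data like the sample. `LONE_PERIOD`, `periods_to_commas` and `convert_file` are hypothetical names, and the file paths are whatever you pass in. Only a '.' with blanks or line ends on both sides is changed, so decimal points (including a leading one, as in ".5") and the column layout are preserved.

```python
import re
from pathlib import Path

# A "." standing alone as a field: whitespace (or start/end of text) on both
# sides. Decimal points such as the one in "1.67" are left untouched.
LONE_PERIOD = re.compile(r"(?<!\S)\.(?!\S)")


def periods_to_commas(text: str) -> str:
    """Replace every standalone '.' field with ','."""
    return LONE_PERIOD.sub(",", text)


def convert_file(src: str, dst: str) -> None:
    """Write a copy of `src` to `dst` with lone periods changed to commas."""
    Path(dst).write_text(periods_to_commas(Path(src).read_text()))


if __name__ == "__main__":
    sample = (
        "     1220        08/17/05        1     2        .\n"
        "     1390        11/09/05        1     1.67     .\n"
        "     1445        12/13/05        1     2       -2\n"
    )
    print(periods_to_commas(sample), end="")
```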