SPSSX Discussion

How to deal with missing values for date and time variable?

Classic

List

Threaded

1 message

Eero Olli

How to deal with missing values for date and time variable?

Dear list members,

How do you deal with a date and time variables where a few cases have
missing information? I have not managed to figure out how to define
dates or times as missing values.

I have two string variables DateCompleted_string and
TimeCompleted_string for each schema. Some people never complete the
schema. I need to get those people to complete the schemas they have
started. Therefore I need to create lists of these schemas with dates
printed, and to be able to select schemas based on dates.

My strategy for solving this so far has been to create two sets of
variables, the original string variables and two numeric variables
created on the basis of these to be used for selecting cases:
DateCompleted and TimeCompleted.
I also create two dummies Completed_dummy and NOTcompleted_dummy, which
I use to filter cases.

DATA LIST FREE /id (F8.0) DateCompleted_string(A10)
TimeCompeted_string(A5) DateCompleted(EDATE10) TimeCompleted(TIME5).
BEGIN DATA
1 01/01/2001 00:00 01.01.2001 00:00
2 29/06/2009 12:34 29.06.2009 12:34
3 28/06/2009 13:35 28.06.2009 13:35
END DATA.

IF DateCompleted = DATE.DMY(01, 01, 2001) NOTCompleted_dummy = 1.
IF DateCompleted > DATE.DMY(01, 01, 2001) Completed_dummy = 1.

LIST.

* these do not work.
* MISSING VALUES DateCompleted (0, DATE.DMY(01, 01, 2001)).
* MISSING VALUES TimeCompleted (0).

VARIABLE LABELS DateCompleted 'DateCompleted with no missing values'.
VARIABLE LABELS TimeCompleted 'TimeCompleted with no missing values'.

* Remove date from the non-completed cases.
COMPUTE DateCompleted2 = DateCompleted.
IF DateCompleted2 = DATE.DMY(01, 01, 2001) DateCompleted2 = 0.
FORMATS DateCompleted2 (EDATE10).
VARIABLE LABELS DateCompleted2 'DateCompleted with zero as missing'.
EXECUTE.

DESCRIPTIVES
VARIABLES=DateCompleted DateCompleted2 TimeCompleted
/STATISTICS=MEAN STDDEV MIN MAX .

FILTER BY Completed_dummy.
SUMMARIZE
/TABLES=ID DateCompleted DateCompleted2 TimeCompleted
/FORMAT=VALIDLIST NOCASENUM TOTAL
/TITLE='completed'
/MISSING=VARIABLE
/CELLS=NONE .

FILTER BY NOTCompleted_dummy.
SUMMARIZE
/TABLES=ID DateCompleted DateCompleted2 TimeCompleted
/FORMAT=VALIDLIST NOCASENUM TOTAL
/TITLE='not-completed'
/MISSING=VARIABLE
/CELLS=NONE .

However, there is something unelegant about this approach, even if it
almost works. I would like to assign missing values or something like
it to the dates. When a schema is not-completed, ideally it should not
have a value for DateCompleted or TimeCompleted.

Now the non-completed schemas are assigned the date 01/01/2001. Whereas
I'd like to show them as missing. Therefore, I run following code:
IF DateCompleted = DATE.DMY(01, 01, 2001) DatoCompleted = 0.
This will assign a uacceptable value for the case which is shown as a
blank on listings. However, this will influence the DESCRIPTIVES
results, as it is no longer able to determine the lowest value, even if
SUMMARIZE works the way I would like it to.

I have not succeeded in doing anything similar for my time variable,
where the missing cases have the value 0 which is acceptable value and
displays as 00:00.

Is there a way to assign a unacceptable values for a time variable, or
to assign some particular times as missing or non-valid? Or perhaps
there is a good technique that also works for dates?

I am running SPSS v15.

Sincerely,

Eero Olli

________________________________________
Eero Olli
Advisor
the Equality and Anti-discrimination Ombud
[hidden email] +47 2405 5951
POB 8048 Dep, N-0031 Oslo, Norway

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD