Dear list,
I am having trouble converting a string variable (A45) to a SPSS date format. It is longer as usual, because it has a leading weekday-name AND I have different .sav files from different countries, and each weekday-name and month-name is localized! So I have dutch strings woensdag 29 november 2006 20:48:59 woensdag 29 november 2006 20:54:54 woensdag 29 november 2006 20:56:02 woensdag 29 november 2006 20:58:48 zondag 3 december 2006 11:51:14 zondag 3 december 2006 12:19:40 zondag 3 december 2006 12:31:26 and also spanish strings lunes, 27 de noviembre de 2006 14:26:20 Martes, 28 de Noviembre de 2006 02:30:58 a.m. lunes, 27 de noviembre de 2006 14:38:55 martes, 28 de noviembre de 2006 14:41:43 martes, 28 de noviembre de 2006 15:13:18 Martes, 28 de Noviembre de 2006 03:22:16 a.m. lunes, 27 de noviembre de 2006 15:33:39 martes, 28 de noviembre de 2006 15:31:47 lunes, 27 de noviembre de 2006 15:49:17 and also french, italian, .... and, some have the 24-hour format, some work with a.m./p.m. (as you can see above). 1. I would like to convert them all to the same time format and because I have 3 of those 45-Character String-Variables for each resondent (time1, time2, time3), 2. I would like to calculate differences between time1, time2, and time3. Usually all three times belong to one day, and it would be OK, if I loose the dayname and just calculate the differences in hours/minutes. I searched in the last 20000 emails from this group and also Mr.Levesque's homepage, but I did not find any time-conversion-examples that dealed with those long time-stamps and the differrent country-languages + partial am/pm problems. I confess, I come undone with this problem! Any help would be much appreciated. Werner Mueller (University of Duesseldorf, Germany) |
If you have SPSS 15 with programmability, you can handle all this with the strtodate function in the extendedTransforms module, which you use with the trans module. This function can handle localized calendar terms as well as having very flexible patterns for describing the pattern of the date/time.
These modules are downloadable from SPSS Developer Central at www.spss.com/devcentral. HTH, Jon Peck -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Werner Mueller Sent: Tuesday, December 05, 2006 9:45 AM To: [hidden email] Subject: [SPSSX-L] Converting international long time-strings to valid SPSS 11 date-formats Dear list, I am having trouble converting a string variable (A45) to a SPSS date format. It is longer as usual, because it has a leading weekday-name AND I have different .sav files from different countries, and each weekday-name and month-name is localized! So I have dutch strings woensdag 29 november 2006 20:48:59 woensdag 29 november 2006 20:54:54 woensdag 29 november 2006 20:56:02 woensdag 29 november 2006 20:58:48 zondag 3 december 2006 11:51:14 zondag 3 december 2006 12:19:40 zondag 3 december 2006 12:31:26 and also spanish strings lunes, 27 de noviembre de 2006 14:26:20 Martes, 28 de Noviembre de 2006 02:30:58 a.m. lunes, 27 de noviembre de 2006 14:38:55 martes, 28 de noviembre de 2006 14:41:43 martes, 28 de noviembre de 2006 15:13:18 Martes, 28 de Noviembre de 2006 03:22:16 a.m. lunes, 27 de noviembre de 2006 15:33:39 martes, 28 de noviembre de 2006 15:31:47 lunes, 27 de noviembre de 2006 15:49:17 and also french, italian, .... and, some have the 24-hour format, some work with a.m./p.m. (as you can see above). 1. I would like to convert them all to the same time format and because I have 3 of those 45-Character String-Variables for each resondent (time1, time2, time3), 2. I would like to calculate differences between time1, time2, and time3. Usually all three times belong to one day, and it would be OK, if I loose the dayname and just calculate the differences in hours/minutes. I searched in the last 20000 emails from this group and also Mr.Levesque's homepage, but I did not find any time-conversion-examples that dealed with those long time-stamps and the differrent country-languages + partial am/pm problems. I confess, I come undone with this problem! Any help would be much appreciated. Werner Mueller (University of Duesseldorf, Germany) |
Dear Jon,
thank you for your hint. I am sorry, but it does not help, because I use/own only the version 11.5. I have asked my colleagues, and they say, that version 14 (not 15) will be next for us. So I am still looking for a solution. Werner ----- Original Message ----- From: "Peck, Jon" <[hidden email]> To: <[hidden email]> Sent: Tuesday, December 05, 2006 7:09 PM Subject: Re: Converting international long time-strings to valid SPSS 11 date-formats | If you have SPSS 15 with programmability, you can handle all this with the strtodate function in the extendedTransforms module, which you use with the trans module. This function can handle localized calendar terms as well as having very flexible patterns for describing the pattern of the date/time. | | These modules are downloadable from SPSS Developer Central at www.spss.com/devcentral. | | HTH, | Jon Peck | | -----Original Message----- | From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Werner Mueller | Sent: Tuesday, December 05, 2006 9:45 AM | To: [hidden email] | Subject: [SPSSX-L] Converting international long time-strings to valid SPSS 11 date-formats | | Dear list, | | I am having trouble converting a string variable (A45) to a SPSS date format. | It is longer as usual, because it has a leading weekday-name | AND I have different .sav files from different countries, | and each weekday-name and month-name is localized! | | So I have dutch strings | woensdag 29 november 2006 20:48:59 | woensdag 29 november 2006 20:54:54 | woensdag 29 november 2006 20:56:02 | woensdag 29 november 2006 20:58:48 | zondag 3 december 2006 11:51:14 | zondag 3 december 2006 12:19:40 | zondag 3 december 2006 12:31:26 | | and also spanish strings | lunes, 27 de noviembre de 2006 14:26:20 | Martes, 28 de Noviembre de 2006 02:30:58 a.m. | lunes, 27 de noviembre de 2006 14:38:55 | martes, 28 de noviembre de 2006 14:41:43 | martes, 28 de noviembre de 2006 15:13:18 | Martes, 28 de Noviembre de 2006 03:22:16 a.m. | lunes, 27 de noviembre de 2006 15:33:39 | martes, 28 de noviembre de 2006 15:31:47 | lunes, 27 de noviembre de 2006 15:49:17 | | and also french, italian, .... | and, some have the 24-hour format, some work with a.m./p.m. (as you can see above). | | 1. I would like to convert them all to the same time format and | because I have 3 of those 45-Character String-Variables for each resondent (time1, time2, time3), | 2. I would like to calculate differences between time1, time2, and time3. | Usually all three times belong to one day, and it would be OK, | if I loose the dayname and just calculate the differences in hours/minutes. | | I searched in the last 20000 emails from this group and also Mr.Levesque's homepage, | but I did not find any time-conversion-examples that dealed with those long time-stamps | and the differrent country-languages + partial am/pm problems. | | I confess, I come undone with this problem! | | Any help would be much appreciated. | | Werner Mueller | (University of Duesseldorf, Germany) | |
In reply to this post by Werner Mueller
This is a fun programming exercise.
(AFTER: Me and my arrogance! Parsing programs are fun, but even simple ones are a *lot* of work.) At 10:44 AM 12/5/2006, Werner Mueller wrote: >I am having trouble converting a string variable (A45) to a SPSS date >format. It is longer as usual, because it has a leading weekday-name >AND I have different .sav files from different countries, and each >weekday-name and month-name is localized! Here are some assumptions: . Your records don't give the language explicitly (it will be easier and more accurate, if they do). . The weekday name, counting a trailing comma or other punctuation, is the first 'word' (string of characters containing no blanks) in the string. (There may be leading blanks preceding the weekday name.) For conversion, the weekday name may, and will, be ignored. (Alternatively, you could check the weekday against the final date, for consistency.) . The form of the date, following the weekday name, is always day (numeric), month name (character), year (four-digit numeric), time (hh:mm:ss), am/pm (character, optional), EXCEPT that prepositions may be interspersed (see Spanish). [Test data omitted; see original posting] >1. I would like to convert them all to the same time format The highly recommended format is an SPSS date-time variable. Those are numeric, and can have various character presentations. >I have 3 of those 45-Character String-Variables for each >respondent (time1, time2, time3), For that, wrap the conversion code in a DO REPEAT: NUMERIC time1_CV time2_CV time3_CV (DATETIME20). DO REPEAT Time_str = time1 time2 time3 /Time_Cvt = time1_CV time2_CV time3_CV. * Insert conversion syntax, with input character . * variable named "Time_str" and output date . * variable named "Time_CVT". . END REPEAT. >2. I would like to calculate differences between time1, time2, and >time3. The arithmetic difference between two SPSS date-time variables is their time difference in seconds. To get the difference in other units, divide by the appropriate constant, or (recommended) use the CTIME functions. OK, here's the parsing code, and output. It's tested (took a while, too); this is SPSS draft output. . Input is not case-sensitive. . You must add ALL valid month names and abbreviations (in a RECODE); that are valid; ALL abbreviations that are valid forms of 'am' and 'pm' (in another RECCODE); and ALL prepositions and other words to ignore, like "de" (assigned to the variable "#IGNORE"). All these must be entered in UPPER CASE. . As any good parser should, it includes extensive error checking. Conceivably, some valid combinations will be flagged as errors. Any words that should be in the above lists, but aren't, will be flagged as errors. . It's not fast. It's not easy to make an SPSS program compute-bound, but this one manages it. . This listing includes tracing the parsing path. To run without tracing, change string "/**/" to "/*--" throughout; however, I haven't tested it that way. * ........................................................... . * "I am having trouble converting a string variable (A45) to . * a SPSS date format. It has a leading weekday-name AND I have. * different files from different countries, and each weekday- . * name and month-name is localized!" . LIST. Notes |-----------------------------|---------------------------| |Output Created |06-DEC-2006 15:54:58 | |-----------------------------|---------------------------| TIME_STR woensdag 29 november 2006 20:48:59 woensdag 29 november 2006 20:54:54 woensdag 29 november 2006 20:56:02 woensdag 29 november 2006 20:58:48 zondag 3 december 2006 11:51:14 zondag 3 december 2006 12:19:40 zondag 3 december 2006 12:31:26 lunes, 27 de noviembre de 2006 14:26:20 Martes, 28 de Noviembre de 2006 02:30:58 a.m. lunes, 27 de noviembre de 2006 14:38:55 martes, 28 de noviembre de 2006 14:41:43 martes, 28 de noviembre de 2006 15:13:18 Martes, 28 de Noviembre de 2006 03:22:16 a.m. lunes, 27 de noviembre de 2006 15:33:39 martes, 28 de noviembre de 2006 15:31:47 lunes, 27 de noviembre de 2006 15:49:17 Number of cases read: 16 Number of cases listed: 16 * .......... Declare the output variables .......... . STRING ERROR (A5). NUMERIC Time_Cvt (DATETIME20). STRING LANGUAGE (A10) /* Not implemented here */. * .......... The computations .......... . * ---- Set up . * .. Scratch variables . STRING #BUFFER (A45) /#WORD (A20) /#IGNORE (A45). NUMERIC #POINTER (F03). NUMERIC #DAY #MONTH #YEAR (F04) /#TOD (TIME8). STRING #AM_PM (A02). NUMERIC #LoclErr/*Error this word*/(F02). STRING #TimeWrd/*Char. time val.*/(A20). * .. Prepositions and other words to ignore . COMPUTE #IGNORE = 'of de'. * (Insure the list is in uppercase, and has . * one leading blank.) . COMPUTE #IGNORE = CONCAT(' ',LTRIM(UPCASE(#IGNORE))). * .. Initialize . RECODE #DAY #MONTH #YEAR #TOD #POINTER (ELSE=SYSMIS). COMPUTE #WORD = ''. COMPUTE ERROR = '' /* Not actually needed . * Next "part" of date-time that is needed: . * 1 Weekday . * 2 Day . * 3 Month . * 4 Year . * 5 Time . * 6 AM/PM . NUMERIC #NxtPart (F02). * .. Load the input for processing, & trace. . COMPUTE #BUFFER = LTRIM(UPCASE(TIME_STR)). . /**/ PRINT // ' Input: ' TIME_STR. * ---- Parse the string . LOOP #NxtPart = 1 TO 6. * Find the next word in the input, ignoring words . * on the "#IGNORE" list . . COMPUTE #WORD = ''. . LOOP #TRY = 1 TO 10 IF ERROR EQ ''. . COMPUTE #POINTER = INDEX (#BUFFER,' '). . COMPUTE #WORD = SUBSTR(#BUFFER,1,#POINTER). . COMPUTE #BUFFER = LTRIM(SUBSTR(#BUFFER,#POINTER)). . DO IF #WORD NE '' AND INDEX(#IGNORE, CONCAT(' ',#WORD)) GT 0. . /**/ PRINT / ' Ignore: ' #WORD. . COMPUTE #WORD = ''. . END IF. . END LOOP IF #POINTER EQ 1 /* Failed search */ OR #WORD NE '' /* Successful search */. * Parse the next required part of the date-time . . DO IF #NxtPart EQ 1 /* Weekday name */. * /* ------------ */. . /**/ PRINT / ' Weekday:' #WORD. . ELSE IF #NxtPart EQ 2 /* Day of month */. . COMPUTE #LoclErr =0 /* ------------ */. . DO IF #WORD EQ''. . COMPUTE #LoclErr = 1. . END IF. . COMPUTE #DAY = NUMBER(#WORD,F2). . IF SYSMIS (#DAY) #LoclErr = 1. . IF SYSMIS (#DAY) #LoclErr = 1. . DO IF #LoclErr EQ 0. . /**/ PRINT / ' Day: ' #WORD. . ELSE. . PRINT / ' Day: ' #WORD ' Error'. . COMPUTE ERROR = 'Error'. . END IF. . ELSE IF #NxtPart EQ 3 /* Month name */. . COMPUTE #LoclErr =0 /* ------------ */. . DO IF #WORD EQ ''. . COMPUTE #LoclErr = 1. . END IF. * In the following RECODE, give the month names, . * and accepted month abbreviations, for all the . * languages used. They must be given in upper case . . RECODE #WORD ('JAN','JANUARY' = 1) ('FEB','FEBRUARY' = 2) ('MAR','MARCH' = 3) ('APR','APRIL' = 4) ('MAY' = 5) ('JUN','JUNE' = 6) ('JUL','JULY' = 7) ('AUG','AUGUST' = 8) ('SEP','SEPTEMBER' = 9) ('OCT','OCTOBER' = 10) ('NOV','NOVEMBER' 'NOVIEMBRE' = 11) ('DEC','DECEMBER' = 12) (ELSE = SYSMIS) INTO #MONTH. IF MISSING(#MONTH) #LoclErr = 1. . DO IF #LoclErr EQ 0. . /**/ PRINT / ' Month: ' #WORD. . ELSE. . PRINT / ' Month: ' #WORD ' Error'. . COMPUTE ERROR = 'Error'. . END IF. . ELSE IF #NxtPart EQ 4 /* Year */. . COMPUTE #LoclErr =0 /* ------------ */. . DO IF #WORD EQ ''. . COMPUTE #LoclErr = 1. . END IF. . COMPUTE #YEAR = NUMBER(#WORD,F4). . IF SYSMIS (#YEAR) #LoclErr = 1. . IF NOT RANGE(#YEAR,1900,2100) #LoclErr = 1. . DO IF #LoclErr EQ 0. . /**/ PRINT / ' Year: ' #WORD. . ELSE. . PRINT / ' Year: ' #WORD ' Error'. . COMPUTE ERROR = 'Error'. . END IF. . ELSE IF #NxtPart EQ 5 /* Time of day */. . COMPUTE #LoclErr =0 /* ------------ */. . DO IF #WORD EQ ''. . COMPUTE #LoclErr = 1. . END IF. . COMPUTE #TimeWrd = #WORD. . COMPUTE #TOD = NUMBER(#WORD,TIME8). . IF SYSMIS (#YEAR) #LoclErr = 1. . IF NOT RANGE(#YEAR,1900,2100) #LoclErr = 1. . DO IF #LoclErr EQ 0. . /**/ PRINT / ' Time: ' #WORD. . ELSE. . PRINT / ' Time: ' #WORD ' Error'. . COMPUTE ERROR = 'Error'. . END IF. . ELSE IF #NxtPart EQ 6 /* AM/PM */. . COMPUTE #LoclErr =0 /* ------------ */. * In the following RECODE, as for the month names, . * give all accepted abbreviations, in upper case. . . RECODE #WORD ('' = ' ') /* Not an error ('AM','A.M.' = 'AM') ('PM','P.M.' = 'PM') (ELSE = 'XX') INTO #AM_PM. . IF #AM_PM EQ 'XX' #LocalErr = 1. . DO IF #LoclErr EQ 0. . DO IF #AM_PM NE ''. . /**/ PRINT / ' AM/PM: ' #WORD. . END IF. . ELSE. . PRINT / ' AM/PM: ' #WORD ' Error'. . COMPUTE ERROR = 'Error'. . END IF. . END IF. END LOOP. * ---- Assemble the date-time value . * .. Apply AM/PM correction: . COMPUTE #LoclErr = 0. DO IF MISSING(#TOD) /* Skip computation */. ELSE IF #AM_PM EQ 'AM'. . IF #TOD GE TIME.HMS(13,0,0) #LoclErr = 1. . IF #TOD GE TIME.HMS(12,0,0) #TOD = #TOD - TIME.HMS(12,0,0). ELSE IF #AM_PM EQ 'PM'. . IF #TOD GE TIME.HMS(13,0,0) #LoclErr = 1. . IF #TOD LT TIME.HMS(12) #TOD = #TOD + TIME.HMS(12,0,0). END IF. COMPUTE #WORD = CONCAT(RTRIM(#TimeWrd),' ',#AM_PM). DO IF #LoclErr EQ 0. . DO IF #AM_PM NE ''. . /**/ PRINT / ' AM/PM: ' #WORD. . END IF. ELSE. . PRINT / ' Time: ' #WORD ' Error'. . COMPUTE ERROR = 'Error'. END IF. * .. Combine all the parts: . IF ERROR EQ '' Time_Cvt = DATE.DMY(#DAY,#MONTH,#YEAR) + #TOD. . /**/ PRINT / ' Output: ' Time_Cvt. . /**/ EXECUTE. Input: woensdag 29 november 2006 20:48:59 Weekday:WOENSDAG Day: 29 Month: NOVEMBER Year: 2006 Time: 20:48:59 Output: 29-NOV-2006 20:48:59 Input: woensdag 29 november 2006 20:54:54 Weekday:WOENSDAG Day: 29 Month: NOVEMBER Year: 2006 Time: 20:54:54 Output: 29-NOV-2006 20:54:54 Input: woensdag 29 november 2006 20:56:02 Weekday:WOENSDAG Day: 29 Month: NOVEMBER Year: 2006 Time: 20:56:02 Output: 29-NOV-2006 20:56:02 Input: woensdag 29 november 2006 20:58:48 Weekday:WOENSDAG Day: 29 Month: NOVEMBER Year: 2006 Time: 20:58:48 Output: 29-NOV-2006 20:58:48 Input: zondag 3 december 2006 11:51:14 Weekday:ZONDAG Day: 3 Month: DECEMBER Year: 2006 Time: 11:51:14 Output: 03-DEC-2006 11:51:14 Input: zondag 3 december 2006 12:19:40 Weekday:ZONDAG Day: 3 Month: DECEMBER Year: 2006 Time: 12:19:40 Output: 03-DEC-2006 12:19:40 Input: zondag 3 december 2006 12:31:26 Weekday:ZONDAG Day: 3 Month: DECEMBER Year: 2006 Time: 12:31:26 Output: 03-DEC-2006 12:31:26 Input: lunes, 27 de noviembre de 2006 14:26:20 Weekday:LUNES, Day: 27 Ignore: DE Month: NOVIEMBRE Ignore: DE Year: 2006 Time: 14:26:20 Output: 27-NOV-2006 14:26:20 Input: Martes, 28 de Noviembre de 2006 02:30:58 a.m. Weekday:MARTES, Day: 28 Ignore: DE Month: NOVIEMBRE Ignore: DE Year: 2006 Time: 02:30:58 AM/PM: A.M. AM/PM: 02:30:58 AM Output: 28-NOV-2006 02:30:58 Input: lunes, 27 de noviembre de 2006 14:38:55 Weekday:LUNES, Day: 27 Ignore: DE Month: NOVIEMBRE Ignore: DE Year: 2006 Time: 14:38:55 Output: 27-NOV-2006 14:38:55 Input: martes, 28 de noviembre de 2006 14:41:43 Weekday:MARTES, Day: 28 Ignore: DE Month: NOVIEMBRE Ignore: DE Year: 2006 Time: 14:41:43 Output: 28-NOV-2006 14:41:43 Input: martes, 28 de noviembre de 2006 15:13:18 Weekday:MARTES, Day: 28 Ignore: DE Month: NOVIEMBRE Ignore: DE Year: 2006 Time: 15:13:18 Output: 28-NOV-2006 15:13:18 Input: Martes, 28 de Noviembre de 2006 03:22:16 a.m. Weekday:MARTES, Day: 28 Ignore: DE Month: NOVIEMBRE Ignore: DE Year: 2006 Time: 03:22:16 AM/PM: A.M. AM/PM: 03:22:16 AM Output: 28-NOV-2006 03:22:16 Input: lunes, 27 de noviembre de 2006 15:33:39 Weekday:LUNES, Day: 27 Ignore: DE Month: NOVIEMBRE Ignore: DE Year: 2006 Time: 15:33:39 Output: 27-NOV-2006 15:33:39 Input: martes, 28 de noviembre de 2006 15:31:47 Weekday:MARTES, Day: 28 Ignore: DE Month: NOVIEMBRE Ignore: DE Year: 2006 Time: 15:31:47 Output: 28-NOV-2006 15:31:47 Input: lunes, 27 de noviembre de 2006 15:49:17 Weekday:LUNES, Day: 27 Ignore: DE Month: NOVIEMBRE Ignore: DE Year: 2006 Time: 15:49:17 Output: 27-NOV-2006 15:49:17 * .......... List the results .......... . TEMPORARY. STRING SPACE (A10). STRING WEEKDAY (A10). COMPUTE #WKDYNUM = XDATE.WKDAY(Time_Cvt). RECODE #WKDYNUM (1 = 'Sunday' ) (2 = 'Monday' ) (3 = 'Tuesday' ) (4 = 'Wednesday') (5 = 'Thursday' ) (6 = 'Friday' ) (7 = 'Saturday' ) INTO WEEKDAY. LIST /VARIABLES = Time_Str Error SPACE Weekday Time_Cvt. Notes |-----------------------------|---------------------------| |Output Created |06-DEC-2006 15:55:04 | |-----------------------------|---------------------------| The variables are listed in the following order: LINE 1: TIME_STR ERROR SPACE LINE 2: WEEKDAY Time_Cvt TIME_STR: woensdag 29 november 2006 20:48:59 WEEKDAY: Wednesday 29-NOV-2006 20:48:59 TIME_STR: woensdag 29 november 2006 20:54:54 WEEKDAY: Wednesday 29-NOV-2006 20:54:54 TIME_STR: woensdag 29 november 2006 20:56:02 WEEKDAY: Wednesday 29-NOV-2006 20:56:02 TIME_STR: woensdag 29 november 2006 20:58:48 WEEKDAY: Wednesday 29-NOV-2006 20:58:48 TIME_STR: zondag 3 december 2006 11:51:14 WEEKDAY: Sunday 03-DEC-2006 11:51:14 TIME_STR: zondag 3 december 2006 12:19:40 WEEKDAY: Sunday 03-DEC-2006 12:19:40 TIME_STR: zondag 3 december 2006 12:31:26 WEEKDAY: Sunday 03-DEC-2006 12:31:26 TIME_STR: lunes, 27 de noviembre de 2006 14:26:20 WEEKDAY: Monday 27-NOV-2006 14:26:20 TIME_STR: Martes, 28 de Noviembre de 2006 02:30:58 a.m. WEEKDAY: Tuesday 28-NOV-2006 02:30:58 TIME_STR: lunes, 27 de noviembre de 2006 14:38:55 WEEKDAY: Monday 27-NOV-2006 14:38:55 TIME_STR: martes, 28 de noviembre de 2006 14:41:43 WEEKDAY: Tuesday 28-NOV-2006 14:41:43 TIME_STR: martes, 28 de noviembre de 2006 15:13:18 WEEKDAY: Tuesday 28-NOV-2006 15:13:18 TIME_STR: Martes, 28 de Noviembre de 2006 03:22:16 a.m. WEEKDAY: Tuesday 28-NOV-2006 03:22:16 TIME_STR: lunes, 27 de noviembre de 2006 15:33:39 WEEKDAY: Monday 27-NOV-2006 15:33:39 TIME_STR: martes, 28 de noviembre de 2006 15:31:47 WEEKDAY: Tuesday 28-NOV-2006 15:31:47 TIME_STR: lunes, 27 de noviembre de 2006 15:49:17 WEEKDAY: Monday 27-NOV-2006 15:49:17 Number of cases read: 16 Number of cases listed: 16 =================== APPENDIX: TEST DATA =================== DATA LIST FIXED / TIME_STR 01-45 (A). Data List will read 1 records from the command file Variable Rec Start End Format TIME_STR 1 1 45 A45 BEGIN DATA woensdag 29 november 2006 20:48:59 woensdag 29 november 2006 20:54:54 woensdag 29 november 2006 20:56:02 woensdag 29 november 2006 20:58:48 zondag 3 december 2006 11:51:14 zondag 3 december 2006 12:19:40 zondag 3 december 2006 12:31:26 lunes, 27 de noviembre de 2006 14:26:20 Martes, 28 de Noviembre de 2006 02:30:58 a.m. lunes, 27 de noviembre de 2006 14:38:55 martes, 28 de noviembre de 2006 14:41:43 martes, 28 de noviembre de 2006 15:13:18 Martes, 28 de Noviembre de 2006 03:22:16 a.m. lunes, 27 de noviembre de 2006 15:33:39 martes, 28 de noviembre de 2006 15:31:47 lunes, 27 de noviembre de 2006 15:49:17 END DATA. |
Richard has given a given a good example of what a wizard can do with the power of SPSS if you have the skills and really work at it.
Here is a solution using programmability that requires a lot less work. This example assumes Dutch. You would change the locale setting appropriately for each sav file and ultimately merge them together. begin program. import spss, spssdata, trans, extendedTransforms, locale locale.setlocale(locale.LC_ALL, "dutch") tp = trans.Tfunction() tp.append(extendedTransforms.strtodatetime, "datevar", "DATETIME20", ["dates", trans.const("%A %d %B %Y %H:%M:%S ")]) tp.execute() end program. I know that Python is not an option for Werner Mueller right now, but I couldn't resist posting this. To explain: this loops over all the cases applying the strdodatetime function in the extendedTransforms module to create a new variable, datevar, with an SPSS datetime format. It decodes the string variable, dates, according to the pattern above. The pattern means day of week, day number, month name, year, hour, minute, second. By setting the locale to Dutch, the pattern recognizes the day name and month name in the corresponding language. woensdag 29 november 2006 20:48:59 29-NOV-2006 20:48:59 woensdag 29 november 2006 20:54:54 29-NOV-2006 20:54:54 woensdag 29 november 2006 20:56:02 29-NOV-2006 20:56:02 woensdag 29 november 2006 20:58:48 29-NOV-2006 20:58:48 zondag 3 december 2006 11:51:14 03-DEC-2006 11:51:14 zondag 3 december 2006 12:19:40 03-DEC-2006 12:19:40 zondag 3 december 2006 12:31:26 03-DEC-2006 12:31:26 Regards, Jon Peck -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Richard Ristow Sent: Wednesday, December 06, 2006 3:15 PM To: [hidden email] Subject: Re: [SPSSX-L] Converting international long time-strings This is a fun programming exercise. (AFTER: Me and my arrogance! Parsing programs are fun, but even simple ones are a *lot* of work.) At 10:44 AM 12/5/2006, Werner Mueller wrote: >I am having trouble converting a string variable (A45) to a SPSS date >format. It is longer as usual, because it has a leading weekday-name >AND I have different .sav files from different countries, and each >weekday-name and month-name is localized! Here are some assumptions: . Your records don't give the language explicitly (it will be easier and more accurate, if they do). . The weekday name, counting a trailing comma or other punctuation, is the first 'word' (string of characters containing no blanks) in the string. (There may be leading blanks preceding the weekday name.) For conversion, the weekday name may, and will, be ignored. (Alternatively, you could check the weekday against the final date, for consistency.) . The form of the date, following the weekday name, is always day (numeric), month name (character), year (four-digit numeric), time (hh:mm:ss), am/pm (character, optional), EXCEPT that prepositions may be interspersed (see Spanish). [Test data omitted; see original posting] >1. I would like to convert them all to the same time format The highly recommended format is an SPSS date-time variable. Those are numeric, and can have various character presentations. >I have 3 of those 45-Character String-Variables for each >respondent (time1, time2, time3), For that, wrap the conversion code in a DO REPEAT: NUMERIC time1_CV time2_CV time3_CV (DATETIME20). DO REPEAT Time_str = time1 time2 time3 /Time_Cvt = time1_CV time2_CV time3_CV. * Insert conversion syntax, with input character . * variable named "Time_str" and output date . * variable named "Time_CVT". . END REPEAT. >2. I would like to calculate differences between time1, time2, and >time3. The arithmetic difference between two SPSS date-time variables is their time difference in seconds. To get the difference in other units, divide by the appropriate constant, or (recommended) use the CTIME functions. OK, here's the parsing code, and output. It's tested (took a while, too); this is SPSS draft output. . Input is not case-sensitive. . You must add ALL valid month names and abbreviations (in a RECODE); that are valid; ALL abbreviations that are valid forms of 'am' and 'pm' (in another RECCODE); and ALL prepositions and other words to ignore, like "de" (assigned to the variable "#IGNORE"). All these must be entered in UPPER CASE. . As any good parser should, it includes extensive error checking. Conceivably, some valid combinations will be flagged as errors. Any words that should be in the above lists, but aren't, will be flagged as errors. . It's not fast. It's not easy to make an SPSS program compute-bound, but this one manages it. . This listing includes tracing the parsing path. To run without tracing, change string "/**/" to "/*--" throughout; however, I haven't tested it that way. * ........................................................... . * "I am having trouble converting a string variable (A45) to . * a SPSS date format. It has a leading weekday-name AND I have. * different files from different countries, and each weekday- . * name and month-name is localized!" . LIST. Notes |-----------------------------|---------------------------| |Output Created |06-DEC-2006 15:54:58 | |-----------------------------|---------------------------| TIME_STR woensdag 29 november 2006 20:48:59 woensdag 29 november 2006 20:54:54 woensdag 29 november 2006 20:56:02 woensdag 29 november 2006 20:58:48 zondag 3 december 2006 11:51:14 zondag 3 december 2006 12:19:40 zondag 3 december 2006 12:31:26 lunes, 27 de noviembre de 2006 14:26:20 Martes, 28 de Noviembre de 2006 02:30:58 a.m. lunes, 27 de noviembre de 2006 14:38:55 martes, 28 de noviembre de 2006 14:41:43 martes, 28 de noviembre de 2006 15:13:18 Martes, 28 de Noviembre de 2006 03:22:16 a.m. lunes, 27 de noviembre de 2006 15:33:39 martes, 28 de noviembre de 2006 15:31:47 lunes, 27 de noviembre de 2006 15:49:17 Number of cases read: 16 Number of cases listed: 16 * .......... Declare the output variables .......... . STRING ERROR (A5). NUMERIC Time_Cvt (DATETIME20). STRING LANGUAGE (A10) /* Not implemented here */. * .......... The computations .......... . * ---- Set up . * .. Scratch variables . STRING #BUFFER (A45) /#WORD (A20) /#IGNORE (A45). NUMERIC #POINTER (F03). NUMERIC #DAY #MONTH #YEAR (F04) /#TOD (TIME8). STRING #AM_PM (A02). NUMERIC #LoclErr/*Error this word*/(F02). STRING #TimeWrd/*Char. time val.*/(A20). * .. Prepositions and other words to ignore . COMPUTE #IGNORE = 'of de'. * (Insure the list is in uppercase, and has . * one leading blank.) . COMPUTE #IGNORE = CONCAT(' ',LTRIM(UPCASE(#IGNORE))). * .. Initialize . RECODE #DAY #MONTH #YEAR #TOD #POINTER (ELSE=SYSMIS). COMPUTE #WORD = ''. COMPUTE ERROR = '' /* Not actually needed . * Next "part" of date-time that is needed: . * 1 Weekday . * 2 Day . * 3 Month . * 4 Year . * 5 Time . * 6 AM/PM . NUMERIC #NxtPart (F02). * .. Load the input for processing, & trace. . COMPUTE #BUFFER = LTRIM(UPCASE(TIME_STR)). . /**/ PRINT // ' Input: ' TIME_STR. * ---- Parse the string . LOOP #NxtPart = 1 TO 6. * Find the next word in the input, ignoring words . * on the "#IGNORE" list . . COMPUTE #WORD = ''. . LOOP #TRY = 1 TO 10 IF ERROR EQ ''. . COMPUTE #POINTER = INDEX (#BUFFER,' '). . COMPUTE #WORD = SUBSTR(#BUFFER,1,#POINTER). . COMPUTE #BUFFER = LTRIM(SUBSTR(#BUFFER,#POINTER)). . DO IF #WORD NE '' AND INDEX(#IGNORE, CONCAT(' ',#WORD)) GT 0. . /**/ PRINT / ' Ignore: ' #WORD. . COMPUTE #WORD = ''. . END IF. . END LOOP IF #POINTER EQ 1 /* Failed search */ OR #WORD NE '' /* Successful search */. * Parse the next required part of the date-time . . DO IF #NxtPart EQ 1 /* Weekday name */. * /* ------------ */. . /**/ PRINT / ' Weekday:' #WORD. . ELSE IF #NxtPart EQ 2 /* Day of month */. . COMPUTE #LoclErr =0 /* ------------ */. . DO IF #WORD EQ''. . COMPUTE #LoclErr = 1. . END IF. . COMPUTE #DAY = NUMBER(#WORD,F2). . IF SYSMIS (#DAY) #LoclErr = 1. . IF SYSMIS (#DAY) #LoclErr = 1. . DO IF #LoclErr EQ 0. . /**/ PRINT / ' Day: ' #WORD. . ELSE. . PRINT / ' Day: ' #WORD ' Error'. . COMPUTE ERROR = 'Error'. . END IF. . ELSE IF #NxtPart EQ 3 /* Month name */. . COMPUTE #LoclErr =0 /* ------------ */. . DO IF #WORD EQ ''. . COMPUTE #LoclErr = 1. . END IF. * In the following RECODE, give the month names, . * and accepted month abbreviations, for all the . * languages used. They must be given in upper case . . RECODE #WORD ('JAN','JANUARY' = 1) ('FEB','FEBRUARY' = 2) ('MAR','MARCH' = 3) ('APR','APRIL' = 4) ('MAY' = 5) ('JUN','JUNE' = 6) ('JUL','JULY' = 7) ('AUG','AUGUST' = 8) ('SEP','SEPTEMBER' = 9) ('OCT','OCTOBER' = 10) ('NOV','NOVEMBER' 'NOVIEMBRE' = 11) ('DEC','DECEMBER' = 12) (ELSE = SYSMIS) INTO #MONTH. IF MISSING(#MONTH) #LoclErr = 1. . DO IF #LoclErr EQ 0. . /**/ PRINT / ' Month: ' #WORD. . ELSE. . PRINT / ' Month: ' #WORD ' Error'. . COMPUTE ERROR = 'Error'. . END IF. . ELSE IF #NxtPart EQ 4 /* Year */. . COMPUTE #LoclErr =0 /* ------------ */. . DO IF #WORD EQ ''. . COMPUTE #LoclErr = 1. . END IF. . COMPUTE #YEAR = NUMBER(#WORD,F4). . IF SYSMIS (#YEAR) #LoclErr = 1. . IF NOT RANGE(#YEAR,1900,2100) #LoclErr = 1. . DO IF #LoclErr EQ 0. . /**/ PRINT / ' Year: ' #WORD. . ELSE. . PRINT / ' Year: ' #WORD ' Error'. . COMPUTE ERROR = 'Error'. . END IF. . ELSE IF #NxtPart EQ 5 /* Time of day */. . COMPUTE #LoclErr =0 /* ------------ */. . DO IF #WORD EQ ''. . COMPUTE #LoclErr = 1. . END IF. . COMPUTE #TimeWrd = #WORD. . COMPUTE #TOD = NUMBER(#WORD,TIME8). . IF SYSMIS (#YEAR) #LoclErr = 1. . IF NOT RANGE(#YEAR,1900,2100) #LoclErr = 1. . DO IF #LoclErr EQ 0. . /**/ PRINT / ' Time: ' #WORD. . ELSE. . PRINT / ' Time: ' #WORD ' Error'. . COMPUTE ERROR = 'Error'. . END IF. . ELSE IF #NxtPart EQ 6 /* AM/PM */. . COMPUTE #LoclErr =0 /* ------------ */. * In the following RECODE, as for the month names, . * give all accepted abbreviations, in upper case. . . RECODE #WORD ('' = ' ') /* Not an error ('AM','A.M.' = 'AM') ('PM','P.M.' = 'PM') (ELSE = 'XX') INTO #AM_PM. . IF #AM_PM EQ 'XX' #LocalErr = 1. . DO IF #LoclErr EQ 0. . DO IF #AM_PM NE ''. . /**/ PRINT / ' AM/PM: ' #WORD. . END IF. . ELSE. . PRINT / ' AM/PM: ' #WORD ' Error'. . COMPUTE ERROR = 'Error'. . END IF. . END IF. END LOOP. * ---- Assemble the date-time value . * .. Apply AM/PM correction: . COMPUTE #LoclErr = 0. DO IF MISSING(#TOD) /* Skip computation */. ELSE IF #AM_PM EQ 'AM'. . IF #TOD GE TIME.HMS(13,0,0) #LoclErr = 1. . IF #TOD GE TIME.HMS(12,0,0) #TOD = #TOD - TIME.HMS(12,0,0). ELSE IF #AM_PM EQ 'PM'. . IF #TOD GE TIME.HMS(13,0,0) #LoclErr = 1. . IF #TOD LT TIME.HMS(12) #TOD = #TOD + TIME.HMS(12,0,0). END IF. COMPUTE #WORD = CONCAT(RTRIM(#TimeWrd),' ',#AM_PM). DO IF #LoclErr EQ 0. . DO IF #AM_PM NE ''. . /**/ PRINT / ' AM/PM: ' #WORD. . END IF. ELSE. . PRINT / ' Time: ' #WORD ' Error'. . COMPUTE ERROR = 'Error'. END IF. * .. Combine all the parts: . IF ERROR EQ '' Time_Cvt = DATE.DMY(#DAY,#MONTH,#YEAR) + #TOD. . /**/ PRINT / ' Output: ' Time_Cvt. . /**/ EXECUTE. Input: woensdag 29 november 2006 20:48:59 Weekday:WOENSDAG Day: 29 Month: NOVEMBER Year: 2006 Time: 20:48:59 Output: 29-NOV-2006 20:48:59 Input: woensdag 29 november 2006 20:54:54 Weekday:WOENSDAG Day: 29 Month: NOVEMBER Year: 2006 Time: 20:54:54 Output: 29-NOV-2006 20:54:54 Input: woensdag 29 november 2006 20:56:02 Weekday:WOENSDAG Day: 29 Month: NOVEMBER Year: 2006 Time: 20:56:02 Output: 29-NOV-2006 20:56:02 Input: woensdag 29 november 2006 20:58:48 Weekday:WOENSDAG Day: 29 Month: NOVEMBER Year: 2006 Time: 20:58:48 Output: 29-NOV-2006 20:58:48 Input: zondag 3 december 2006 11:51:14 Weekday:ZONDAG Day: 3 Month: DECEMBER Year: 2006 Time: 11:51:14 Output: 03-DEC-2006 11:51:14 Input: zondag 3 december 2006 12:19:40 Weekday:ZONDAG Day: 3 Month: DECEMBER Year: 2006 Time: 12:19:40 Output: 03-DEC-2006 12:19:40 Input: zondag 3 december 2006 12:31:26 Weekday:ZONDAG Day: 3 Month: DECEMBER Year: 2006 Time: 12:31:26 Output: 03-DEC-2006 12:31:26 Input: lunes, 27 de noviembre de 2006 14:26:20 Weekday:LUNES, Day: 27 Ignore: DE Month: NOVIEMBRE Ignore: DE Year: 2006 Time: 14:26:20 Output: 27-NOV-2006 14:26:20 Input: Martes, 28 de Noviembre de 2006 02:30:58 a.m. Weekday:MARTES, Day: 28 Ignore: DE Month: NOVIEMBRE Ignore: DE Year: 2006 Time: 02:30:58 AM/PM: A.M. AM/PM: 02:30:58 AM Output: 28-NOV-2006 02:30:58 Input: lunes, 27 de noviembre de 2006 14:38:55 Weekday:LUNES, Day: 27 Ignore: DE Month: NOVIEMBRE Ignore: DE Year: 2006 Time: 14:38:55 Output: 27-NOV-2006 14:38:55 Input: martes, 28 de noviembre de 2006 14:41:43 Weekday:MARTES, Day: 28 Ignore: DE Month: NOVIEMBRE Ignore: DE Year: 2006 Time: 14:41:43 Output: 28-NOV-2006 14:41:43 Input: martes, 28 de noviembre de 2006 15:13:18 Weekday:MARTES, Day: 28 Ignore: DE Month: NOVIEMBRE Ignore: DE Year: 2006 Time: 15:13:18 Output: 28-NOV-2006 15:13:18 Input: Martes, 28 de Noviembre de 2006 03:22:16 a.m. Weekday:MARTES, Day: 28 Ignore: DE Month: NOVIEMBRE Ignore: DE Year: 2006 Time: 03:22:16 AM/PM: A.M. AM/PM: 03:22:16 AM Output: 28-NOV-2006 03:22:16 Input: lunes, 27 de noviembre de 2006 15:33:39 Weekday:LUNES, Day: 27 Ignore: DE Month: NOVIEMBRE Ignore: DE Year: 2006 Time: 15:33:39 Output: 27-NOV-2006 15:33:39 Input: martes, 28 de noviembre de 2006 15:31:47 Weekday:MARTES, Day: 28 Ignore: DE Month: NOVIEMBRE Ignore: DE Year: 2006 Time: 15:31:47 Output: 28-NOV-2006 15:31:47 Input: lunes, 27 de noviembre de 2006 15:49:17 Weekday:LUNES, Day: 27 Ignore: DE Month: NOVIEMBRE Ignore: DE Year: 2006 Time: 15:49:17 Output: 27-NOV-2006 15:49:17 * .......... List the results .......... . TEMPORARY. STRING SPACE (A10). STRING WEEKDAY (A10). COMPUTE #WKDYNUM = XDATE.WKDAY(Time_Cvt). RECODE #WKDYNUM (1 = 'Sunday' ) (2 = 'Monday' ) (3 = 'Tuesday' ) (4 = 'Wednesday') (5 = 'Thursday' ) (6 = 'Friday' ) (7 = 'Saturday' ) INTO WEEKDAY. LIST /VARIABLES = Time_Str Error SPACE Weekday Time_Cvt. Notes |-----------------------------|---------------------------| |Output Created |06-DEC-2006 15:55:04 | |-----------------------------|---------------------------| The variables are listed in the following order: LINE 1: TIME_STR ERROR SPACE LINE 2: WEEKDAY Time_Cvt TIME_STR: woensdag 29 november 2006 20:48:59 WEEKDAY: Wednesday 29-NOV-2006 20:48:59 TIME_STR: woensdag 29 november 2006 20:54:54 WEEKDAY: Wednesday 29-NOV-2006 20:54:54 TIME_STR: woensdag 29 november 2006 20:56:02 WEEKDAY: Wednesday 29-NOV-2006 20:56:02 TIME_STR: woensdag 29 november 2006 20:58:48 WEEKDAY: Wednesday 29-NOV-2006 20:58:48 TIME_STR: zondag 3 december 2006 11:51:14 WEEKDAY: Sunday 03-DEC-2006 11:51:14 TIME_STR: zondag 3 december 2006 12:19:40 WEEKDAY: Sunday 03-DEC-2006 12:19:40 TIME_STR: zondag 3 december 2006 12:31:26 WEEKDAY: Sunday 03-DEC-2006 12:31:26 TIME_STR: lunes, 27 de noviembre de 2006 14:26:20 WEEKDAY: Monday 27-NOV-2006 14:26:20 TIME_STR: Martes, 28 de Noviembre de 2006 02:30:58 a.m. WEEKDAY: Tuesday 28-NOV-2006 02:30:58 TIME_STR: lunes, 27 de noviembre de 2006 14:38:55 WEEKDAY: Monday 27-NOV-2006 14:38:55 TIME_STR: martes, 28 de noviembre de 2006 14:41:43 WEEKDAY: Tuesday 28-NOV-2006 14:41:43 TIME_STR: martes, 28 de noviembre de 2006 15:13:18 WEEKDAY: Tuesday 28-NOV-2006 15:13:18 TIME_STR: Martes, 28 de Noviembre de 2006 03:22:16 a.m. WEEKDAY: Tuesday 28-NOV-2006 03:22:16 TIME_STR: lunes, 27 de noviembre de 2006 15:33:39 WEEKDAY: Monday 27-NOV-2006 15:33:39 TIME_STR: martes, 28 de noviembre de 2006 15:31:47 WEEKDAY: Tuesday 28-NOV-2006 15:31:47 TIME_STR: lunes, 27 de noviembre de 2006 15:49:17 WEEKDAY: Monday 27-NOV-2006 15:49:17 Number of cases read: 16 Number of cases listed: 16 =================== APPENDIX: TEST DATA =================== DATA LIST FIXED / TIME_STR 01-45 (A). Data List will read 1 records from the command file Variable Rec Start End Format TIME_STR 1 1 45 A45 BEGIN DATA woensdag 29 november 2006 20:48:59 woensdag 29 november 2006 20:54:54 woensdag 29 november 2006 20:56:02 woensdag 29 november 2006 20:58:48 zondag 3 december 2006 11:51:14 zondag 3 december 2006 12:19:40 zondag 3 december 2006 12:31:26 lunes, 27 de noviembre de 2006 14:26:20 Martes, 28 de Noviembre de 2006 02:30:58 a.m. lunes, 27 de noviembre de 2006 14:38:55 martes, 28 de noviembre de 2006 14:41:43 martes, 28 de noviembre de 2006 15:13:18 Martes, 28 de Noviembre de 2006 03:22:16 a.m. lunes, 27 de noviembre de 2006 15:33:39 martes, 28 de noviembre de 2006 15:31:47 lunes, 27 de noviembre de 2006 15:49:17 END DATA. |
Free forum by Nabble | Edit this page |