|
Dear co-listers.
Thanks for the answers to the sub-total procedures that I asked some days ago. Im working on it. I have a more complicated question (from my point of view): I have a variable who has dates in this format (note: the people who made the DB didnt follow a common procedure, the example I show has day-month-year, but is possible to have also month-day-year and even month/day/year) 12.7.2004 0:00:00 10.12.2004 0:00:00 27.1.2005 0:00:00 21.4.2005 0:00:00 19.11.2005 0:00:00 21.3.2005 0:00:00 5.5.2005 0:00:00 Since the database has a lot of numbers (800,000) I need to figure out how to extract the year from this variable. I tried to extract with substr and specifying which fields to extract, but since there are one and two digits for days or months, the year sometimes is extracted with just 3 digits). The thing is that finally I want to calculate age although I know there are many mistakes with typing. What can I do? ___________________________ Rodrigo Briceño Project Manager www.sanigest.com Costa Rica Cell: (506) 886-1177 Fax:(506) 232-0830 Office: 291-1200, ext.113 MSN: [hidden email] SKYPE: rbriceno1087 ____________________________ |
|
Try this:
*** sample data-- use your own. DATA LIST FREE (',') /id (F8.0) funnydate (a20). BEGIN DATA 1,12.7.2004 0:00:00, 2 ,2.1.2005 0:00:00, 3 ,12.27.2006 0:00:00, 4,12/26/2007 0:00:00, 5 ,1/1/2008 0:00:00, 6 ,1/11/2009 0:00:00 END DATA. ****** replace "funnydate" with your string variable name. *** extract the substring and convert into a number. Starting positions and delimers for year. DO IF substr(funnydate,4,1)= '.' . COMPUTE year = number(substr(funnydate,5,4),F8.0). ELSE IF substr(funnydate,5,1)= '.' . COMPUTE year = number(substr(funnydate,6,4),F8.0). ELSE IF substr(funnydate,6,1)= '.' . COMPUTE year = number(substr(funnydate,7,4),F8.0). ELSE IF substr(funnydate,4,1)= '/' . COMPUTE year = number(substr(funnydate,5,4),F8.0). ELSE IF substr(funnydate,5,1)= '/' . COMPUTE year = number(substr(funnydate,6,4),F8.0). ELSE IF substr(funnydate,6,1)= '/' . COMPUTE year = number(substr(funnydate,7,4),F8.0). END IF. *** to trigger data pass. EXECUTE. For this question, the order of month/ day doesn't matter-- all that matters the # of characters before the year. It is either 4,5, or 6: 4 (1/7/ or 7/1/ or 1.5. or 5.1.) 5 (25/1/ or 1/25/ or 15.9. or 9.15.) 6 (12/15/ or 15/12 or 12.15. or 15.12) Extracting the month and day will be impossible with the given data-- is 3/9 the 3rd of Sept or is it the 9th or March? Good luck --jim -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Rodrigo Briceno Sent: Friday, July 06, 2007 10:51 AM To: [hidden email] Subject: extracting year from a string variable Dear co-listers. Thanks for the answers to the sub-total procedures that I asked some days ago. I'm working on it. I have a more complicated question (from my point of view): I have a variable who has dates in this format (note: the people who made the DB didn't follow a common procedure, the example I show has day-month-year, but is possible to have also month-day-year and even month/day/year) 12.7.2004 0:00:00 10.12.2004 0:00:00 27.1.2005 0:00:00 21.4.2005 0:00:00 19.11.2005 0:00:00 21.3.2005 0:00:00 5.5.2005 0:00:00 Since the database has a lot of numbers (800,000) I need to figure out how to extract the year from this variable. I tried to extract with substr and specifying which fields to extract, but since there are one and two digits for days or months, the year sometimes is extracted with just 3 digits). The thing is that finally I want to calculate age although I know there are many mistakes with typing. What can I do? ___________________________ Rodrigo Briceño Project Manager www.sanigest.com Costa Rica Cell: (506) 886-1177 Fax:(506) 232-0830 Office: 291-1200, ext.113 MSN: [hidden email] SKYPE: rbriceno1087 ____________________________ |
|
In reply to this post by Rodrigo Briceño
As long as all the dates are in day-month-year order, SPSS will accept any of the separators you mention. Here is your example with a little variety added.
data list list/#date (date11) #time(time8). compute d = #date+#time. format d(datetime20). begin data 12.7.2004 0:00:00 10.12.2004 0:00:00 27-1-2005 1:23:00 21.4.2005 0:00:00 19/11/2005 0:00:24 21.3.2005 0:00:00 5.5.2005 0:00:00 end data list. List produced d 12-JUL-2004 00:00:00 10-DEC-2004 00:00:00 27-JAN-2005 01:23:00 21-APR-2005 00:00:00 19-NOV-2005 00:00:24 21-MAR-2005 00:00:00 05-MAY-2005 00:00:00 Number of cases read: 7 Number of cases listed: 7 Jonathan Fry SPSS Inc -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Rodrigo Briceno Sent: Friday, July 06, 2007 10:51 AM To: [hidden email] Subject: extracting year from a string variable Dear co-listers. Thanks for the answers to the sub-total procedures that I asked some days ago. I'm working on it. I have a more complicated question (from my point of view): I have a variable who has dates in this format (note: the people who made the DB didn't follow a common procedure, the example I show has day-month-year, but is possible to have also month-day-year and even month/day/year) 12.7.2004 0:00:00 10.12.2004 0:00:00 27.1.2005 0:00:00 21.4.2005 0:00:00 19.11.2005 0:00:00 21.3.2005 0:00:00 5.5.2005 0:00:00 Since the database has a lot of numbers (800,000) I need to figure out how to extract the year from this variable. I tried to extract with substr and specifying which fields to extract, but since there are one and two digits for days or months, the year sometimes is extracted with just 3 digits). The thing is that finally I want to calculate age although I know there are many mistakes with typing. What can I do? ___________________________ Rodrigo Briceño Project Manager www.sanigest.com Costa Rica Cell: (506) 886-1177 Fax:(506) 232-0830 Office: 291-1200, ext.113 MSN: [hidden email] SKYPE: rbriceno1087 ____________________________ |
|
In reply to this post by Rodrigo Briceño
Rodrigo,
You know, string fields can convert to date fields without any loss of data. I just copied your text into a string field, changed the format to date dd/mm/yyyy and while the the years remained. From there you can use the year extraction function to get your year. XDATE.YEAR. XDATE.YEAR(datevalue). Numeric. Returns the year (as a four-digit integer) from a numeric value in date format, as created by the DATE.xxx functions or read by one of the DATE input formats. Melissa -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Marks, Jim Sent: Friday, July 06, 2007 12:58 PM To: [hidden email] Subject: Re: [SPSSX-L] extracting year from a string variable Try this: *** sample data-- use your own. DATA LIST FREE (',') /id (F8.0) funnydate (a20). BEGIN DATA 1,12.7.2004 0:00:00, 2 ,2.1.2005 0:00:00, 3 ,12.27.2006 0:00:00, 4,12/26/2007 0:00:00, 5 ,1/1/2008 0:00:00, 6 ,1/11/2009 0:00:00 END DATA. ****** replace "funnydate" with your string variable name. *** extract the substring and convert into a number. Starting positions and delimers for year. DO IF substr(funnydate,4,1)= '.' . COMPUTE year = number(substr(funnydate,5,4),F8.0). ELSE IF substr(funnydate,5,1)= '.' . COMPUTE year = number(substr(funnydate,6,4),F8.0). ELSE IF substr(funnydate,6,1)= '.' . COMPUTE year = number(substr(funnydate,7,4),F8.0). ELSE IF substr(funnydate,4,1)= '/' . COMPUTE year = number(substr(funnydate,5,4),F8.0). ELSE IF substr(funnydate,5,1)= '/' . COMPUTE year = number(substr(funnydate,6,4),F8.0). ELSE IF substr(funnydate,6,1)= '/' . COMPUTE year = number(substr(funnydate,7,4),F8.0). END IF. *** to trigger data pass. EXECUTE. For this question, the order of month/ day doesn't matter-- all that matters the # of characters before the year. It is either 4,5, or 6: 4 (1/7/ or 7/1/ or 1.5. or 5.1.) 5 (25/1/ or 1/25/ or 15.9. or 9.15.) 6 (12/15/ or 15/12 or 12.15. or 15.12) Extracting the month and day will be impossible with the given data-- is 3/9 the 3rd of Sept or is it the 9th or March? Good luck --jim -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Rodrigo Briceno Sent: Friday, July 06, 2007 10:51 AM To: [hidden email] Subject: extracting year from a string variable Dear co-listers. Thanks for the answers to the sub-total procedures that I asked some days ago. I'm working on it. I have a more complicated question (from my point of view): I have a variable who has dates in this format (note: the people who made the DB didn't follow a common procedure, the example I show has day-month-year, but is possible to have also month-day-year and even month/day/year) 12.7.2004 0:00:00 10.12.2004 0:00:00 27.1.2005 0:00:00 21.4.2005 0:00:00 19.11.2005 0:00:00 21.3.2005 0:00:00 5.5.2005 0:00:00 Since the database has a lot of numbers (800,000) I need to figure out how to extract the year from this variable. I tried to extract with substr and specifying which fields to extract, but since there are one and two digits for days or months, the year sometimes is extracted with just 3 digits). The thing is that finally I want to calculate age although I know there are many mistakes with typing. What can I do? ___________________________ Rodrigo Briceño Project Manager www.sanigest.com Costa Rica Cell: (506) 886-1177 Fax:(506) 232-0830 Office: 291-1200, ext.113 MSN: [hidden email] SKYPE: rbriceno1087 ____________________________ PRIVILEGED AND CONFIDENTIAL INFORMATION This transmittal and any attachments may contain PRIVILEGED AND CONFIDENTIAL information and is intended only for the use of the addressee. If you are not the designated recipient, or an employee or agent authorized to deliver such transmittals to the designated recipient, you are hereby notified that any dissemination, copying or publication of this transmittal is strictly prohibited. If you have received this transmittal in error, please notify us immediately by replying to the sender and delete this copy from your system. You may also call us at (309) 827-6026 for assistance. |
|
In reply to this post by Rodrigo Briceño
At 11:50 AM 7/6/2007, Rodrigo Briceno wrote:
>I have a variable who has dates in this format (note: the people who >made the DB didn't follow a common procedure, the example I show has >day-month-year, but is possible to have also month-day-year and even >month/day/year) As Jim Marks wrote, >Extracting the month and day will be [ambiguous] with the given data-- >is 3/9 the 3rd of Sept or is it the 9th or March? As we used to say, there's more than one way to skin a cat. The following does what Melissa suggests: it converts your input string first into an SPSS date, dropping the time portion; but does it in syntax, rather than from the data editor. The date, as such, may be useful. (The logic here assumes mm/dd/yyyy) Therefore, >Dashes, periods, commas, slashes, or blanks can be used as delimiters >in the input values. For example, with the DATE format, the following >input forms are all acceptable: >28-OCT-90 28/10/1990 28.OCT.90 28 October, 1990 (SPSS 15 Command >Syntax Reference p.46) except that the parse below disallows blanks. (A fancier parse could allow them.) It will accept 2-digit years, and 4-digit years in the first position. (Jim's syntax could support all delimiters, by changing DO IF substr(funnydate,4,1)= '.' . to -- not tested -- DO IF ANY(substr(funnydate,4,1),'-', '.', '/', ' '). The '@' variables, below, could be scratch variables, names beginning with '#', so as not to clutter the output file. >The thing is that finally I want to calculate age although I know >there are many mistakes with typing. Those, unfortunately, can't be corrected by syntax. Finally, here we go. Again, this is SPSS 15 draft output (WRR-not saved separately): *** sample data-- use your own. DATA LIST FREE (',') /id (F8.0) funnydate (a20). BEGIN DATA 1,12.7.2004 0:00:00, 2 ,2.1.2005 0:00:00, 3 ,12.27.2006 0:00:00, 4,12/26/2007 0:00:00, 5 ,1/1/2008 0:00:00, 6 ,1/11/2009 0:00:00 END DATA. LIST. List |-----------------------------|---------------------------| |Output Created |06-JUL-2007 18:18:26 | |-----------------------------|---------------------------| id funnydate 1 12.7.2004 0:00:00 2 2.1.2005 0:00:00 3 12.27.2006 0:00:00 4 12/26/2007 0:00:00 5 1/1/2008 0:00:00 6 1/11/2009 0:00:00 Number of cases read: 6 Number of cases listed: 6 FORMATS ID (N02). STRING @Buffer (A18). NUMERIC @Break (F04). STRING @DT_Strn @TM_Strn (A10). NUMERIC @DateVal (DATE11). NUMERIC YEAR (F06). COMPUTE @Buffer = LTRIM(funnydate). COMPUTE @Break = INDEX(@Buffer,' '). COMPUTE @DT_Strn = SUBSTR(@Buffer,1,@Break). COMPUTE @TM_Strn = LTRIM(SUBSTR(@Buffer,@Break)). COMPUTE @DateVal = NUMBER(@DT_Strn,ADATE10). IF MISSING(@DateVal) @DateVal = NUMBER(@DT_Strn,EDATE10). IF MISSING(@DateVal) @DateVal = NUMBER(@DT_Strn,SDATE10). COMPUTE YEAR = XDATE.YEAR(@DateVal). TEMPORARY. STRING SPACE (A18) /CONVERT (A02). FORMATS YEAR (F12). LIST VARIABLES = ID TO @Break SPACE CONVERT @DT_Strn TO YEAR. List |-----------------------------|---------------------------| |Output Created |06-JUL-2007 18:18:27 | |-----------------------------|---------------------------| The variables are listed in the following order: LINE 1: id funnydate @Buffer @Break SPACE LINE 2: CONVERT @DT_Strn @TM_Strn @DateVal YEAR id: 01 12.7.2004 0:00:00 12.7.2004 0:00:00 10 CONVERT: 12.7.2004 0:00:00 07-DEC-2004 2004 id: 02 2.1.2005 0:00:00 2.1.2005 0:00:00 9 CONVERT: 2.1.2005 0:00:00 01-FEB-2005 2005 id: 03 12.27.2006 0:00:00 12.27.2006 0:00:00 11 CONVERT: 12.27.2006 0:00:00 27-DEC-2006 2006 id: 04 12/26/2007 0:00:00 12/26/2007 0:00:00 11 CONVERT: 12/26/2007 0:00:00 26-DEC-2007 2007 id: 05 1/1/2008 0:00:00 1/1/2008 0:00:00 9 CONVERT: 1/1/2008 0:00:00 01-JAN-2008 2008 id: 06 1/11/2009 0:00:00 1/11/2009 0:00:00 10 CONVERT: 1/11/2009 0:00:00 11-JAN-2009 2009 Number of cases read: 6 Number of cases listed: 6 |
| Free forum by Nabble | Edit this page |
