extracting year from a string variable

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

extracting year from a string variable

Rodrigo Briceño
Dear co-listers.

 

Thanks for the answers to the sub-total procedures that I asked some days
ago. I’m working on it.

 

I have a more complicated question (from my point of view):

 

I have a variable who has dates in this format (note: the people who made
the DB didn’t follow a common procedure, the example I show has
day-month-year, but is possible to have also month-day-year and even
month/day/year)

12.7.2004 0:00:00      

10.12.2004 0:00:00      

27.1.2005 0:00:00      

21.4.2005 0:00:00      

19.11.2005 0:00:00      

21.3.2005 0:00:00      

5.5.2005 0:00:00        

 

Since the database has a lot of numbers (800,000) I need to figure out how
to extract the year from this variable. I tried to extract with substr and
specifying which fields to extract, but since there are one and two digits
for days or months, the year sometimes is extracted with just 3 digits).

The thing is that finally I want to calculate age although I know there are
many mistakes with typing.

 

What can I do?

 

___________________________
Rodrigo Briceño
Project Manager
www.sanigest.com
Costa Rica Cell: (506) 886-1177

Fax:(506) 232-0830
Office: 291-1200, ext.113
MSN: [hidden email]
SKYPE: rbriceno1087
____________________________

 
Reply | Threaded
Open this post in threaded view
|

Re: extracting year from a string variable

Marks, Jim
Try this:

*** sample data-- use your own.
DATA LIST FREE (',') /id (F8.0) funnydate (a20).
BEGIN DATA
1,12.7.2004 0:00:00, 2 ,2.1.2005 0:00:00, 3 ,12.27.2006 0:00:00,
4,12/26/2007 0:00:00, 5 ,1/1/2008 0:00:00, 6 ,1/11/2009 0:00:00
END DATA.

****** replace "funnydate" with your string variable name.

*** extract the substring and convert into a number.
Starting positions and delimers for year.
DO IF substr(funnydate,4,1)= '.' .
COMPUTE year = number(substr(funnydate,5,4),F8.0).
ELSE IF substr(funnydate,5,1)= '.' .
COMPUTE year = number(substr(funnydate,6,4),F8.0).
ELSE IF substr(funnydate,6,1)= '.' .
COMPUTE year = number(substr(funnydate,7,4),F8.0).

ELSE IF substr(funnydate,4,1)= '/' .
COMPUTE year = number(substr(funnydate,5,4),F8.0).
ELSE IF substr(funnydate,5,1)= '/' .
COMPUTE year = number(substr(funnydate,6,4),F8.0).
ELSE IF substr(funnydate,6,1)= '/' .
COMPUTE year = number(substr(funnydate,7,4),F8.0).

END IF.

*** to trigger data pass.
EXECUTE.

For this question, the order of month/ day doesn't matter-- all that matters the # of characters before the year. It is either 4,5, or 6:
        4 (1/7/ or 7/1/ or 1.5. or 5.1.)
        5 (25/1/ or 1/25/ or 15.9. or 9.15.)
        6 (12/15/ or 15/12 or 12.15.  or 15.12)

Extracting the month and day will be impossible with the given data-- is 3/9 the 3rd of Sept or is it the 9th or March?

Good luck
--jim


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Rodrigo Briceno
Sent: Friday, July 06, 2007 10:51 AM
To: [hidden email]
Subject: extracting year from a string variable

Dear co-listers.



Thanks for the answers to the sub-total procedures that I asked some days ago. I'm working on it.



I have a more complicated question (from my point of view):



I have a variable who has dates in this format (note: the people who made the DB didn't follow a common procedure, the example I show has day-month-year, but is possible to have also month-day-year and even
month/day/year)

12.7.2004 0:00:00

10.12.2004 0:00:00

27.1.2005 0:00:00

21.4.2005 0:00:00

19.11.2005 0:00:00

21.3.2005 0:00:00

5.5.2005 0:00:00



Since the database has a lot of numbers (800,000) I need to figure out how to extract the year from this variable. I tried to extract with substr and specifying which fields to extract, but since there are one and two digits for days or months, the year sometimes is extracted with just 3 digits).

The thing is that finally I want to calculate age although I know there are many mistakes with typing.



What can I do?



___________________________
Rodrigo Briceño
Project Manager
www.sanigest.com
Costa Rica Cell: (506) 886-1177

Fax:(506) 232-0830
Office: 291-1200, ext.113
MSN: [hidden email]
SKYPE: rbriceno1087
____________________________
Reply | Threaded
Open this post in threaded view
|

Re: extracting year from a string variable

Fry, Jonathan B.
In reply to this post by Rodrigo Briceño
As long as all the dates are in day-month-year order, SPSS will accept any of the separators you mention.  Here is your example with a little variety added.

data list list/#date (date11) #time(time8).
compute d = #date+#time.
format d(datetime20).
begin data
12.7.2004 0:00:00
10.12.2004 0:00:00
27-1-2005 1:23:00
21.4.2005 0:00:00
19/11/2005 0:00:24
21.3.2005 0:00:00
5.5.2005 0:00:00
end data
list.

List produced

                   d

12-JUL-2004 00:00:00
10-DEC-2004 00:00:00
27-JAN-2005 01:23:00
21-APR-2005 00:00:00
19-NOV-2005 00:00:24
21-MAR-2005 00:00:00
05-MAY-2005 00:00:00


Number of cases read:  7    Number of cases listed:  7

Jonathan Fry
SPSS Inc

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Rodrigo Briceno
Sent: Friday, July 06, 2007 10:51 AM
To: [hidden email]
Subject: extracting year from a string variable

Dear co-listers.



Thanks for the answers to the sub-total procedures that I asked some days
ago. I'm working on it.



I have a more complicated question (from my point of view):



I have a variable who has dates in this format (note: the people who made
the DB didn't follow a common procedure, the example I show has
day-month-year, but is possible to have also month-day-year and even
month/day/year)

12.7.2004 0:00:00

10.12.2004 0:00:00

27.1.2005 0:00:00

21.4.2005 0:00:00

19.11.2005 0:00:00

21.3.2005 0:00:00

5.5.2005 0:00:00



Since the database has a lot of numbers (800,000) I need to figure out how
to extract the year from this variable. I tried to extract with substr and
specifying which fields to extract, but since there are one and two digits
for days or months, the year sometimes is extracted with just 3 digits).

The thing is that finally I want to calculate age although I know there are
many mistakes with typing.



What can I do?



___________________________
Rodrigo Briceño
Project Manager
www.sanigest.com
Costa Rica Cell: (506) 886-1177

Fax:(506) 232-0830
Office: 291-1200, ext.113
MSN: [hidden email]
SKYPE: rbriceno1087
____________________________
Reply | Threaded
Open this post in threaded view
|

Re: extracting year from a string variable

Melissa Ives
In reply to this post by Rodrigo Briceño
Rodrigo,
You know, string fields can convert to date fields without any loss of data.
I just copied your text into a string field, changed the format to date dd/mm/yyyy and while the the years remained.
From there you can use the year extraction function to get your year.

XDATE.YEAR. XDATE.YEAR(datevalue). Numeric. Returns the year (as a four-digit integer) from a numeric value in date format, as created by the DATE.xxx functions or read by one of the DATE input formats.


Melissa

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Marks, Jim
Sent: Friday, July 06, 2007 12:58 PM
To: [hidden email]
Subject: Re: [SPSSX-L] extracting year from a string variable

Try this:

*** sample data-- use your own.
DATA LIST FREE (',') /id (F8.0) funnydate (a20).
BEGIN DATA
1,12.7.2004 0:00:00, 2 ,2.1.2005 0:00:00, 3 ,12.27.2006 0:00:00,
4,12/26/2007 0:00:00, 5 ,1/1/2008 0:00:00, 6 ,1/11/2009 0:00:00 END DATA.

****** replace "funnydate" with your string variable name.

*** extract the substring and convert into a number.
Starting positions and delimers for year.
DO IF substr(funnydate,4,1)= '.' .
COMPUTE year = number(substr(funnydate,5,4),F8.0).
ELSE IF substr(funnydate,5,1)= '.' .
COMPUTE year = number(substr(funnydate,6,4),F8.0).
ELSE IF substr(funnydate,6,1)= '.' .
COMPUTE year = number(substr(funnydate,7,4),F8.0).

ELSE IF substr(funnydate,4,1)= '/' .
COMPUTE year = number(substr(funnydate,5,4),F8.0).
ELSE IF substr(funnydate,5,1)= '/' .
COMPUTE year = number(substr(funnydate,6,4),F8.0).
ELSE IF substr(funnydate,6,1)= '/' .
COMPUTE year = number(substr(funnydate,7,4),F8.0).

END IF.

*** to trigger data pass.
EXECUTE.

For this question, the order of month/ day doesn't matter-- all that matters the # of characters before the year. It is either 4,5, or 6:
        4 (1/7/ or 7/1/ or 1.5. or 5.1.)
        5 (25/1/ or 1/25/ or 15.9. or 9.15.)
        6 (12/15/ or 15/12 or 12.15.  or 15.12)

Extracting the month and day will be impossible with the given data-- is 3/9 the 3rd of Sept or is it the 9th or March?

Good luck
--jim


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Rodrigo Briceno
Sent: Friday, July 06, 2007 10:51 AM
To: [hidden email]
Subject: extracting year from a string variable

Dear co-listers.



Thanks for the answers to the sub-total procedures that I asked some days ago. I'm working on it.



I have a more complicated question (from my point of view):



I have a variable who has dates in this format (note: the people who made the DB didn't follow a common procedure, the example I show has day-month-year, but is possible to have also month-day-year and even
month/day/year)

12.7.2004 0:00:00

10.12.2004 0:00:00

27.1.2005 0:00:00

21.4.2005 0:00:00

19.11.2005 0:00:00

21.3.2005 0:00:00

5.5.2005 0:00:00



Since the database has a lot of numbers (800,000) I need to figure out how to extract the year from this variable. I tried to extract with substr and specifying which fields to extract, but since there are one and two digits for days or months, the year sometimes is extracted with just 3 digits).

The thing is that finally I want to calculate age although I know there are many mistakes with typing.



What can I do?



___________________________
Rodrigo Briceño
Project Manager
www.sanigest.com
Costa Rica Cell: (506) 886-1177

Fax:(506) 232-0830
Office: 291-1200, ext.113
MSN: [hidden email]
SKYPE: rbriceno1087
____________________________


PRIVILEGED AND CONFIDENTIAL INFORMATION
This transmittal and any attachments may contain PRIVILEGED AND
CONFIDENTIAL information and is intended only for the use of the
addressee. If you are not the designated recipient, or an employee
or agent authorized to deliver such transmittals to the designated
recipient, you are hereby notified that any dissemination,
copying or publication of this transmittal is strictly prohibited. If
you have received this transmittal in error, please notify us
immediately by replying to the sender and delete this copy from your
system. You may also call us at (309) 827-6026 for assistance.
Reply | Threaded
Open this post in threaded view
|

Re: extracting year from a string variable

Richard Ristow
In reply to this post by Rodrigo Briceño
At 11:50 AM 7/6/2007, Rodrigo Briceno wrote:

>I have a variable who has dates in this format (note: the people who
>made the DB didn't follow a common procedure, the example I show has
>day-month-year, but is possible to have also month-day-year and even
>month/day/year)

As Jim Marks wrote,

>Extracting the month and day will be [ambiguous] with the given data--
>is 3/9 the 3rd of Sept or is it the 9th or March?

As we used to say, there's more than one way to skin a cat. The
following does what Melissa suggests: it converts your input string
first into an SPSS date, dropping the time portion; but does it in
syntax, rather than from the data editor.  The date, as such, may be
useful.  (The logic here assumes mm/dd/yyyy) Therefore,

>Dashes, periods, commas, slashes, or blanks can be used as delimiters
>in the input values. For example, with the DATE format, the following
>input forms are all acceptable:
>28-OCT-90 28/10/1990 28.OCT.90 28 October, 1990  (SPSS 15 Command
>Syntax Reference p.46)

except that the parse below disallows blanks. (A fancier parse could
allow them.) It will accept 2-digit years, and 4-digit years in the
first position.

(Jim's syntax could support all delimiters, by changing

DO IF substr(funnydate,4,1)= '.' .
    to -- not tested --
DO IF ANY(substr(funnydate,4,1),'-', '.', '/', ' ').

The '@' variables, below, could be scratch variables, names beginning
with '#', so as not to clutter the output file.

>The thing is that finally I want to calculate age although I know
>there are many mistakes with typing.

Those, unfortunately, can't be corrected by syntax.

Finally, here we go. Again, this is SPSS 15 draft output (WRR-not saved
separately):

*** sample data-- use your own.
DATA LIST FREE (',') /id (F8.0) funnydate (a20).
BEGIN DATA
1,12.7.2004 0:00:00, 2 ,2.1.2005 0:00:00, 3 ,12.27.2006 0:00:00,
4,12/26/2007 0:00:00, 5 ,1/1/2008 0:00:00, 6 ,1/11/2009 0:00:00
END DATA.
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |06-JUL-2007 18:18:26       |
|-----------------------------|---------------------------|
       id funnydate

        1 12.7.2004 0:00:00
        2 2.1.2005 0:00:00
        3 12.27.2006 0:00:00
        4 12/26/2007 0:00:00
        5 1/1/2008 0:00:00
        6 1/11/2009 0:00:00

Number of cases read:  6    Number of cases listed:  6


FORMATS ID       (N02).

STRING  @Buffer  (A18).
NUMERIC @Break   (F04).
STRING  @DT_Strn
         @TM_Strn (A10).
NUMERIC @DateVal (DATE11).

NUMERIC YEAR     (F06).

COMPUTE @Buffer  = LTRIM(funnydate).
COMPUTE @Break   = INDEX(@Buffer,' ').
COMPUTE @DT_Strn = SUBSTR(@Buffer,1,@Break).
COMPUTE @TM_Strn = LTRIM(SUBSTR(@Buffer,@Break)).
COMPUTE @DateVal = NUMBER(@DT_Strn,ADATE10).
IF MISSING(@DateVal)
         @DateVal = NUMBER(@DT_Strn,EDATE10).
IF MISSING(@DateVal)
         @DateVal = NUMBER(@DT_Strn,SDATE10).

COMPUTE YEAR     = XDATE.YEAR(@DateVal).

TEMPORARY.
STRING  SPACE    (A18)
        /CONVERT  (A02).
FORMATS YEAR     (F12).
LIST VARIABLES =
      ID TO @Break SPACE
      CONVERT
      @DT_Strn TO  YEAR.
List
|-----------------------------|---------------------------|
|Output Created               |06-JUL-2007 18:18:27       |
|-----------------------------|---------------------------|
The variables are listed in the following order:

LINE   1: id funnydate @Buffer @Break SPACE
LINE   2: CONVERT @DT_Strn @TM_Strn @DateVal YEAR

           id: 01 12.7.2004 0:00:00    12.7.2004 0:00:00    10
      CONVERT:    12.7.2004  0:00:00    07-DEC-2004         2004

           id: 02 2.1.2005 0:00:00     2.1.2005 0:00:00      9
      CONVERT:    2.1.2005   0:00:00    01-FEB-2005         2005

           id: 03 12.27.2006 0:00:00   12.27.2006 0:00:00   11
      CONVERT:    12.27.2006 0:00:00    27-DEC-2006         2006

           id: 04 12/26/2007 0:00:00   12/26/2007 0:00:00   11
      CONVERT:    12/26/2007 0:00:00    26-DEC-2007         2007

           id: 05 1/1/2008 0:00:00     1/1/2008 0:00:00      9
      CONVERT:    1/1/2008   0:00:00    01-JAN-2008         2008

           id: 06 1/11/2009 0:00:00    1/11/2009 0:00:00    10
      CONVERT:    1/11/2009  0:00:00    11-JAN-2009         2009

Number of cases read:  6    Number of cases listed:  6