I am trying to parse a CSV tile with a time and date variable that looks
like this: 2011-01-18 00:54:19.000 What format do I use in my GET DATA command to read this? Thanks in advance! JOHN John Fiedler [hidden email] Oreon Inc. 195 Wilderness Way Boise ID 83716-3383 www.oreon.net Please visit our evolving website! +1.208.344.3255 (v) +1.208.381.0944 (f) +1.208.344.9979 (h +1.208.863.3727 (m) No cellular service at or near place of business nor at home. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Read the variable in as a string. Not sure if other folks here would
agree but as far as I can tell best practice for reading in data is to read everything in initially as strings. You can always change a string to a number but you can't always change a number to a string. Make sense? Then you can extract info from the string variable. For example you could create an SPSS readable date variable. You can run the following (datevar is the new version of the variable, and datestr is the original string version of the variables): numeric datevar(adate10). compute datevar = date.mdy(number(char.substr(ltrim(rtrim(datestr)),6,2),f2.0), number(char.substr(ltrim(rtrim(datestr)), 9,2), f2.0), number(char.substr(ltrim(rtrim(datestr)),1,4), f4.0)). exe. This will create a variable that looks like 01/18/2011. That's the American Date format. Don't forget about the rule that you don't overwrite a variable, create a new revised version. Matthew Pirritano, Ph.D. Research Analyst IV Medical Services Initiative (MSI) Orange County Health Care Agency (714) 568-5648 -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of John Fiedler Sent: Friday, February 11, 2011 7:47 AM To: [hidden email] Subject: Reading Date & Time from CSV File I am trying to parse a CSV tile with a time and date variable that looks like this: 2011-01-18 00:54:19.000 What format do I use in my GET DATA command to read this? Thanks in advance! JOHN John Fiedler [hidden email] Oreon Inc. 195 Wilderness Way Boise ID 83716-3383 www.oreon.net Please visit our evolving website! +1.208.344.3255 (v) +1.208.381.0944 (f) +1.208.344.9979 (h +1.208.863.3727 (m) No cellular service at or near place of business nor at home. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Administrator
|
Matthew,
I must respectfully disagree with everything you have just said ;-) data list / dt (sdate) tm (time). begin data 2011-01-18 00:54:19.000 end data. COMPUTE datetime=dt+tm. FORMATS datetime (DATETIME). list. DT TM DATETIME 2011/01/18 0:54:01 18-JAN-2011 00:54:01 Number of cases read: 1 Number of cases listed: 1 BTW: If the data were formatted as follows DD-MM-YYYY HH:MM:SS you could read it as a DATETIME variable. data list / dttm (datetime) . begin data 18-01-2011 00:54:19 end data. list. DTTM 18-JAN-2011 00:54:19 Number of cases read: 2 Number of cases listed: 2
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by John Fiedler
At 10:46 AM 2/11/2011, John Fiedler wrote:
>I am trying to parse a CSV tile with a time and date variable that looks >like this: >2011-01-18 00:54:19.000 >What format do I use in my GET DATA command to read this? Matthew Pirritano's quite right that you have to read it as a text field and parse it. To read the file ID,Text,Timestamp 101,Posted date,2011-01-18 00:54:19.000 I used this code: GET DATA TYPE=TXT /FILE='C:\Documents and Settings\Richard\My Documents' + '\Technical\spssx-l\Z-2011\' + '2011-02-11 Fiedler-Reading Date & Time from CSV File' + '-' + 'TEST DATA edited.csv' /ARRANGEMENT=DELIMITED /DELIMITERS=',' /FIRSTCASE =2 /VARIABLES =ID F3 Text A12 TimeString A25. FORMATS ID (N3). DATASET NAME TestData WINDOW=FRONT. LIST. List |-----------------------------|---------------------------| |Output Created |11-FEB-2011 13:16:36 | |-----------------------------|---------------------------| [TestData] ID Text TimeString 101 Posted date 2011-01-18 00:54:19.000 Number of cases read: 1 Number of cases listed: 1 Second, I parsed it like this -- it's easier than suggested. You can't read the whole string as a date-time value, but you can read the date and time in one step each. Like this (assuming there are no leading blanks in the input): NUMERIC Date (SDATE10) Time (TIME12.3) TimeStamp (DATETIME25.3). COMPUTE #BREAK = INDEX(TimeString,' '). COMPUTE Date = NUMBER(SUBSTR(TimeString,1,#BREAK),SDATE10). COMPUTE Time = NUMBER(SUBSTR(TimeString, #BREAK),TIME20.3). COMPUTE TimeStamp = Date + Time. LIST /VARIABLES=ID TimeString Date Time TimeStamp. List |-----------------------------|---------------------------| |Output Created |11-FEB-2011 13:16:36 | |-----------------------------|---------------------------| [TestData] ID TimeString Date Time TimeStamp 101 2011-01-18 00:54:19.000 2011/01/18 0:54:19.000 18-JAN-2011 00:54:19.000 Number of cases read: 1 Number of cases listed: 1 ============================== APPENDIX: Test data, then code ============================== ID,Text,Timestamp 101,Posted date,2011-01-18 00:54:19.000 ============================== * C:\Documents and Settings\Richard\My Documents . * \Technical\spssx-l\Z-2011\ . * 2011-02-11 Fiedler-Reading Date & Time from CSV File.SPS . * In response to posting . * Date: Fri, 11 Feb 2011 08:46:34 -0700 . * From: John Fiedler <[hidden email]> . * Subject: Reading Date & Time from CSV File . * To: [hidden email] . * "I am trying to parse a CSV tile with a time and date variable . * that looks like this: . * 2011-01-18 00:54:19.000 . * What format do I use in my GET DATA command to read this?" . * . * There'll be only a few SPSS statements, but they're complex . * enough -- notably, giving the full path for the test file -- . * that it's worth keeping the code in more than just a scratch . * file. . NEW FILE. GET DATA TYPE=TXT /FILE='C:\Documents and Settings\Richard\My Documents' + '\Technical\spssx-l\Z-2011\' + '2011-02-11 Fiedler-Reading Date & Time from CSV File' + '-' + 'TEST DATA edited.csv' /ARRANGEMENT=DELIMITED /DELIMITERS=',' /FIRSTCASE =2 /VARIABLES =ID F3 Text A12 TimeString A25. FORMATS ID (N3). DATASET NAME TestData WINDOW=FRONT. LIST. NUMERIC Date (SDATE10) Time (TIME12.3) TimeStamp (DATETIME25.3). COMPUTE #BREAK = INDEX(TimeString,' '). COMPUTE Date = NUMBER(SUBSTR(TimeString,1,#BREAK),SDATE10). COMPUTE Time = NUMBER(SUBSTR(TimeString, #BREAK),TIME20.3). COMPUTE TimeStamp = Date + Time. LIST /VARIABLES=ID TimeString Date Time TimeStamp. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Administrator
|
<quote author="Richard Ristow">
At 10:46 AM 2/11/2011, John Fiedler wrote: >I am trying to parse a CSV tile with a time and date variable that looks >like this: >2011-01-18 00:54:19.000 >What format do I use in my GET DATA command to read this? "Matthew Pirritano's quite right that you have to read it as a text field and parse it." You *DO NOT* have to it read as a text field ;-) . Read directly as two separate fields (a DATE and a TIME) then sum them. ----------- <SNIP> Why not just forget about GET DATA and just use good "old fashioned" DATA LIST. Read the date field as SDATE and the time field as TIME formats then sum them to get the DATETIME representation. Why bother with a STRING variable at all? In this case I read the time field as a scratch variable and add it to the date field, reformatting it later from SDATE to DATETIME. Personally I have NEVER been all that fond of GET DATA. data list FILE "C:\DATETIMETEST.txt" / dttm (sdate) #tm (time). COMPUTE dttm=dttm+#tm. FORMATS dttm (DATETIME). LIST.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Administrator
|
In reply to this post by Richard Ristow
"Matthew Pirritano's quite right that you have to read it as a text
field and parse it." You *DO NOT* have to it read as a text field ;-) . Read directly as two separate fields (a DATE and a TIME) then sum them. ----------- <SNIP> Why not just forget about GET DATA and just use good "old fashioned" DATA LIST. Read the date field as SDATE and the time field as TIME formats then sum them to get the DATETIME representation. Why bother with a STRING variable at all? In this case I read the time field as a scratch variable and add it to the date field, reformatting it later from SDATE to DATETIME. Personally I have NEVER been all that fond of GET DATA. data list FILE "C:\DATETIMETEST.txt" / dttm (sdate) #tm (time). COMPUTE dttm=dttm+#tm. FORMATS dttm (DATETIME). LIST. On Fri, Feb 11, 2011 at 1:36 PM, Richard Ristow [via SPSSX Discussion] <[hidden email]> wrote: > At 10:46 AM 2/11/2011, John Fiedler wrote: > >>I am trying to parse a CSV tile with a time and date variable that looks >>like this: >>2011-01-18 00:54:19.000 >>What format do I use in my GET DATA command to read this? > > Matthew Pirritano's quite right that you have to read it as a text > field and parse it. > > To read the file > >      ID,Text,Timestamp >      101,Posted date,2011-01-18 00:54:19.000 > > I used this code: > GET DATA TYPE=TXT >      /FILE='C:\Documents and Settings\Richard\My Documents'     + >          '\Technical\spssx-l\Z-2011\'              + >          '2011-02-11 Fiedler-Reading Date & Time from CSV File' + >          '-'                           + >          'TEST DATA edited.csv' >      /ARRANGEMENT=DELIMITED /DELIMITERS=',' >      /FIRSTCASE  =2 >      /VARIABLES  =ID      F3 >            Text     A12 >            TimeString  A25. > > FORMATS ID  (N3). > DATASET NAME TestData WINDOW=FRONT. > LIST. > List > |-----------------------------|---------------------------| > |Output Created        |11-FEB-2011 13:16:36    | > |-----------------------------|---------------------------| > [TestData] > >  ID Text     TimeString > > 101 Posted date  2011-01-18 00:54:19.000 > > Number of cases read:  1   Number of cases listed:  1 > > > Second, I parsed it like this -- it's easier than suggested. You > can't read the whole string as a date-time value, but you can read > the date and time in one step each. Like this (assuming there are no > leading blanks in the input): > > NUMERIC Date    (SDATE10) >      Time    (TIME12.3) >      TimeStamp (DATETIME25.3). > > COMPUTE #BREAK   = INDEX(TimeString,' '). > COMPUTE Date    = NUMBER(SUBSTR(TimeString,1,#BREAK),SDATE10). > COMPUTE Time    = NUMBER(SUBSTR(TimeString,  #BREAK),TIME20.3). > COMPUTE TimeStamp = Date + Time. > > LIST /VARIABLES=ID TimeString Date Time TimeStamp. > > List > |-----------------------------|---------------------------| > |Output Created        |11-FEB-2011 13:16:36    | > |-----------------------------|---------------------------| > [TestData] > >  ID > TimeString            Date     Time         TimeStamp > > 101 2011-01-18 00:54:19.000  2011/01/18  0:54:19.000  18-JAN-2011 > 00:54:19.000 > > Number of cases read:  1   Number of cases listed:  1 > > ============================== > APPENDIX: Test data, then code > ============================== > ID,Text,Timestamp > 101,Posted date,2011-01-18 00:54:19.000 > ============================== > *  C:\Documents and Settings\Richard\My Documents           . > *   \Technical\spssx-l\Z-2011\                    . > *   2011-02-11 Fiedler-Reading Date & Time from CSV File.SPS     . > > *  In response to posting                       . > *  Date:   Fri, 11 Feb 2011 08:46:34 -0700              . > *  From:   John Fiedler <[hidden email]>           . > *  Subject: Reading Date & Time from CSV File             . > *  To: [hidden email]                    . > > *  "I am trying to parse a CSV tile with a time and date variable   . > *  that looks like this:                       . > *  2011-01-18 00:54:19.000                      . > *  What format do I use in my GET DATA command to read this?"     . > *                                   . > *  There'll be only a few SPSS statements, but they're complex    . > *  enough -- notably, giving the full path for the test file --    . > *  that it's worth keeping the code in more than just a scratch    . > *  file.                               . > > NEW FILE. > GET DATA TYPE=TXT >      /FILE='C:\Documents and Settings\Richard\My Documents'     + >          '\Technical\spssx-l\Z-2011\'              + >          '2011-02-11 Fiedler-Reading Date & Time from CSV File' + >          '-'                           + >          'TEST DATA edited.csv' >      /ARRANGEMENT=DELIMITED /DELIMITERS=',' >      /FIRSTCASE  =2 >      /VARIABLES  =ID      F3 >            Text     A12 >            TimeString  A25. > > FORMATS ID  (N3). > DATASET NAME TestData WINDOW=FRONT. > > LIST. > > NUMERIC Date    (SDATE10) >      Time    (TIME12.3) >      TimeStamp (DATETIME25.3). > > COMPUTE #BREAK   = INDEX(TimeString,' '). > COMPUTE Date    = NUMBER(SUBSTR(TimeString,1,#BREAK),SDATE10). > COMPUTE Time    = NUMBER(SUBSTR(TimeString,  #BREAK),TIME20.3). > COMPUTE TimeStamp = Date + Time. > > LIST /VARIABLES=ID TimeString Date Time TimeStamp. > > ===================== > To manage your subscription to SPSSX-L, send a message to > [hidden email] (not to SPSSX-L), with no body text except the > command. To leave the list, send the command > SIGNOFF SPSSX-L > For a list of commands to manage subscriptions, send the command > INFO REFCARD > > > ________________________________ > If you reply to this email, your message will be added to the discussion > below: > http://spssx-discussion.1045642.n5.nabble.com/Reading-Date-Time-from-CSV-File-tp3381430p3381781.html > To unsubscribe from Reading Date & Time from CSV File, click here.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Administrator
|
In reply to this post by David Marso
David, I can't get your method to work for a CSV file that contains lines like this:
101,Posted date,2011-01-18 00:54:19.000,more text out here If I allow both commas and blank spaces as field separators, the string "Posted date" gets split into two variables, and everything gets messed up. So it looks to me like your approach will work in a CSV file only if all blank spaces can be treated as field separators (in addition to commas). Or am I missing something? Bruce <quote author="David Marso">
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
Administrator
|
GACK!!!! My bad!!! I missed the (vital) CSV part of the question.
I suspect that if either the string literals are quoted it would be fine. One reason I completely deplore free-formatted data in favor of fixed record fields. OTOH, if the DATETIME field were in DD-MM-YY HH:MM:SS it would probably work perfectly fine reading as a single DATETIME formatted variable. ----- Good catch Bruce! ... <quote author="Bruce Weaver"> David, I can't get your method to work for a CSV file that contains lines like this: 101,Posted date,2011-01-18 00:54:19.000,more text out here If I allow both commas and blank spaces as field separators, the string "Posted date" gets split into two variables, and everything gets messed up. So it looks to me like your approach will work in a CSV file only if all blank spaces can be treated as field separators (in addition to commas). Or am I missing something? Bruce
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
David is right to be queasy about this
format. It's not that it is free format. A good mantra here
is "don't guess". It would be very easy for this file to
be read incorrectly. There is no text delimiter such as " used,
so a comma in the text would split a field. Obviously spaces don't
work as separators either. If you read the file into Excel, the date/time
field displays as 54:19.0 although the field value contains everything
between the commas. (Excel inferred a format of mm:ss.0) CSV
format is not standardized. I would be careful to validate any data
read in this way.
Jon Peck Senior Software Engineer, IBM [hidden email] 312-651-3435 From: David Marso <[hidden email]> To: [hidden email] Date: 02/11/2011 12:43 PM Subject: Re: [SPSSX-L] Reading Date & Time from CSV File Sent by: "SPSSX(r) Discussion" <[hidden email]> GACK!!!! My bad!!! I missed the (vital) CSV part of the question. I suspect that if either the string literals are quoted it would be fine. One reason I completely deplore free-formatted data in favor of fixed record fields. OTOH, if the DATETIME field were in DD-MM-YY HH:MM:SS it would probably work perfectly fine reading as a single DATETIME formatted variable. ----- Good catch Bruce! ... David, I can't get your method to work for a CSV file that contains lines like this: 101,Posted date,2011-01-18 00:54:19.000,more text out here If I allow both commas and blank spaces as field separators, the string "Posted date" gets split into two variables, and everything gets messed up. So it looks to me like your approach will work in a CSV file only if all blank spaces can be treated as field separators (in addition to commas). Or am I missing something? Bruce David Marso wrote: > > > Richard Ristow wrote: >> >> At 10:46 AM 2/11/2011, John Fiedler wrote: >> >>>I am trying to parse a CSV tile with a time and date variable that looks >>>like this: >>>2011-01-18 00:54:19.000 >>>What format do I use in my GET DATA command to read this? >> >> "Matthew Pirritano's quite right that you have to read it as a text >> field and parse it." >> >> You *DO NOT* have to it read as a text field ;-) . >> Read directly as two separate fields (a DATE and a TIME) then sum them. >> ----------- >> <SNIP> >> Why not just forget about GET DATA and just use good "old fashioned" DATA >> LIST. >> Read the date field as SDATE and the time field as TIME formats then sum >> them to get the DATETIME representation. Why bother with a STRING >> variable at all? >> In this case I read the time field as a scratch variable and add it to >> the date field, >> reformatting it later from SDATE to DATETIME. >> Personally I have NEVER been all that fond of GET DATA. >> >> data list FILE "C:\DATETIMETEST.txt" / dttm (sdate) #tm (time). >> COMPUTE dttm=dttm+#tm. >> FORMATS dttm (DATETIME). >> LIST. >> >> >> > > -- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Reading-Date-Time-from-CSV-File-tp3381430p3381902.html Sent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Thanks everyone! Being a cautious type, I read it as alpha and parsed the date and time NUMERIC date(ADATE10). COMPUTE date = DATE.MDY(NUMBER(CHAR.SUBSTR(LTRIM(RTRIM(date_time)),6,2),f2.0),NUMBER(CHAR.SUBSTR(LTRIM(RTRIM(date_time)), 9,2), f2.0), NUMBER(CHAR.SUBSTR(LTRIM(RTRIM(date_time)),1,4), f4.0)). NUMERIC time(TIME12.3). COMPUTE time = TIME.HMS(NUMBER(CHAR.SUBSTR(LTRIM(RTRIM(date_time)),12,2),F2.0),NUMBER(CHAR.SUBSTR(LTRIM(RTRIM(date_time)), 15,2), f2.0), NUMBER(CHAR.SUBSTR(LTRIM(RTRIM(date_time)),18,6), f6.3)). From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Jon K Peck David is right to be queasy about this format. It's not that it is free format. A good mantra here is "don't guess". It would be very easy for this file to be read incorrectly. There is no text delimiter such as " used, so a comma in the text would split a field. Obviously spaces don't work as separators either. If you read the file into Excel, the date/time field displays as 54:19.0 although the field value contains everything between the commas. (Excel inferred a format of mm:ss.0) CSV format is not standardized. I would be careful to validate any data read in this way.
|
At 04:01 PM 2/11/2011, John Fiedler wrote:
>Being a cautious type, I read it as alpha and parsed the date and time > >NUMERIC date(ADATE10). >COMPUTE date = >DATE.MDY(NUMBER(CHAR.SUBSTR(LTRIM(RTRIM(date_time)),6,2),f2.0),NUMBER(CHAR.SUBSTR(LTRIM(RTRIM(date_time)), >9,2), f2.0), NUMBER(CHAR.SUBSTR(LTRIM(RTRIM(date_time)),1,4), f4.0)). >NUMERIC time(TIME12.3). >COMPUTE time = >TIME.HMS(NUMBER(CHAR.SUBSTR(LTRIM(RTRIM(date_time)),12,2),F2.0),NUMBER(CHAR.SUBSTR(LTRIM(RTRIM(date_time)), >15,2), f2.0), NUMBER(CHAR.SUBSTR(LTRIM(RTRIM(date_time)),18,6), f6.3)). Right. But remember that you can also convert the date and time with one NUMBER function each, which also saves having to count character positions. See my posting of Friday, 11 Feb. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
In reply to this post by David Marso
Hi
all,
I have
2 questions regarding string fields and i'm not too familar with how to tell
SPSS to look at a string field and pull information from it.
1.
I just recieved an excel file and imported into SPSS
and noticed that my dates came in as numbers versus dates. Upon further
inspection, i discovered that this was because if a date was unknown, 99 was
used as a placeholder in a MMDDYYYY field. For example, if the month was
unknown, the date was entered as 99/22/2011. If the day was unknown, the date
was 02/99/2011. Because these dates were not legitimate, the data were assumed
to be numeric.
After
discussion with colleagues, we decided to use the median DD or MM for these
unknown values. So our missing MM would be entered as 06 and our missing DD
would be 15 so that we can use these cases in our analysis.
I
converted the excel fields to text so i have the dates in SPSS in a string field
and have computed a date field called "newdate". i want to tell
SPSS:
a.
Look at oldDate. If it has no 99s in it, then the newdate =
olddate
b. If
MM in olddate=99, then newdate =06/oldMM/oldYYYY.
c. If
DD in olddate =99, then newdate = oldDD/15/oldYYYY.
2. In
the same file, I have a variable with a string of text that indicates whether a
location is urban or rural. I want to create a new numeric variable called
Location. i want to tell SPSS:
Look
at the string variable and if it has the word "urban" anywhere in the field, the
location = 1. If the word "rural" is anywhere in the field,
Location=2.
I
think the whole world would run more smoothly if everything were numeric.
Thanks
for any assistance.
Carol
|
Well, the second part is easy.
do if index(lower(stringvar), "urban")>0. compute numvar=1. else if index(lower(stringvar), "rural")>0. compute numvar=2. end if. Note that any values of stringvar that do not contain either "urban" or "rural" will be system-missing for numvar. If all values of stringvar contain either "urban" or "rural" this could be simplified to: compute numvar=(index(lower(stringvar), "rural")>0)+1. This time, all values that do not contain "rural" will be 1 for numvar, even if they don't contain "urban". From: "Parise, Carol A." <[hidden email]> To: [hidden email] Date: 02/25/2011 01:54 PM Subject: searching string fields Sent by: "SPSSX(r) Discussion" <[hidden email]> Hi all, I have 2 questions regarding string fields and i'm not too familar with how to tell SPSS to look at a string field and pull information from it. 1. I just recieved an excel file and imported into SPSS and noticed that my dates came in as numbers versus dates. Upon further inspection, i discovered that this was because if a date was unknown, 99 was used as a placeholder in a MMDDYYYY field. For example, if the month was unknown, the date was entered as 99/22/2011. If the day was unknown, the date was 02/99/2011. Because these dates were not legitimate, the data were assumed to be numeric. After discussion with colleagues, we decided to use the median DD or MM for these unknown values. So our missing MM would be entered as 06 and our missing DD would be 15 so that we can use these cases in our analysis. I converted the excel fields to text so i have the dates in SPSS in a string field and have computed a date field called "newdate". i want to tell SPSS: a. Look at oldDate. If it has no 99s in it, then the newdate = olddate b. If MM in olddate=99, then newdate =06/oldMM/oldYYYY. c. If DD in olddate =99, then newdate = oldDD/15/oldYYYY. 2. In the same file, I have a variable with a string of text that indicates whether a location is urban or rural. I want to create a new numeric variable called Location. i want to tell SPSS: Look at the string variable and if it has the word "urban" anywhere in the field, the location = 1. If the word "rural" is anywhere in the field, Location=2. I think the whole world would run more smoothly if everything were numeric. Thanks for any assistance. Carol |
In reply to this post by parisec
Hi Carol,
Question 1. It looks as if you want to convert a string date
to a date format numeric. NewDate is A10 and structured as MM/DD/YYYY. Leading
zeros need not be included, i.e., '04/05/2006' = '4/5/2006'. So, it's
just a string to number (date) conversion.
compute
numdate=number(stringdate,adate10).
Question 2. I'm assuming that urban and rural can not appear
together in the field. Look at the index function, as
in
compute urban=0.
if (index(stringvar,'urban') ge 1)
urban=1.
compute rural=0.
if (index(stringvar,'rural') ge 1)
rural=1. Be aware that the above code should pick up 'suburban' or
'exurban' or 'urban-oasis'. So you may want to use ' urban ' as the test string.
Same for rural.
Gene Maguin From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Parise, Carol A. Sent: Friday, February 25, 2011 2:54 PM To: [hidden email] Subject: searching string fields Hi
all,
I have
2 questions regarding string fields and i'm not too familar with how to tell
SPSS to look at a string field and pull information from it.
1.
I just recieved an excel file and imported into SPSS
and noticed that my dates came in as numbers versus dates. Upon further
inspection, i discovered that this was because if a date was unknown, 99 was
used as a placeholder in a MMDDYYYY field. For example, if the month was
unknown, the date was entered as 99/22/2011. If the day was unknown, the date
was 02/99/2011. Because these dates were not legitimate, the data were assumed
to be numeric.
After
discussion with colleagues, we decided to use the median DD or MM for these
unknown values. So our missing MM would be entered as 06 and our missing DD
would be 15 so that we can use these cases in our analysis.
I
converted the excel fields to text so i have the dates in SPSS in a string field
and have computed a date field called "newdate". i want to tell
SPSS:
a.
Look at oldDate. If it has no 99s in it, then the newdate =
olddate
b. If
MM in olddate=99, then newdate =06/oldMM/oldYYYY.
c. If
DD in olddate =99, then newdate = oldDD/15/oldYYYY.
2. In
the same file, I have a variable with a string of text that indicates whether a
location is urban or rural. I want to create a new numeric variable called
Location. i want to tell SPSS:
Look
at the string variable and if it has the word "urban" anywhere in the field, the
location = 1. If the word "rural" is anywhere in the field,
Location=2.
I
think the whole world would run more smoothly if everything were numeric.
Thanks
for any assistance.
Carol
|
In reply to this post by parisec
This could probably be shortened (and the exe & list commands
removed), but it works:
-- data list fixed/datevar 1-10 (a) locvar 12-17 (a). begin data 11/02/2009 urban 99/10/2008 rural 03/99/2007 urban end data. exe. list. numeric newdate (adate). if index(datevar,'99') = 0 newdate=number(datevar,adate). if index(datevar,'99') = 1 newdate=number(concat('06',substr(datevar,3,10)),adate). if index(datevar,'99') = 4 newdate=number(concat(substr(datevar,1,3),'15',substr(datevar,6,10)),adate). exe. list. numeric location (f1). if index(lower(locvar),'urban') > 0 location = 1. if index(lower(locvar),'rural') > 0 location = 2. exe. list. -- Mark “That which can be
asserted without evidence, can be dismissed without evidence.”
- Christopher Hitchens Mark Casazza Director of Academic Information The City University of New York 555 West 57th Street, Suite 1240 New York, NY 10019 Phone: 212.541.0396 Fax: 212.541.0392 email: [hidden email] On 2011-02-25 14:54, Parise, Carol A. wrote: ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
In reply to this post by Rick Oliver-3
Thanks all,
This solution worked perfectly and will come in handy
for future projects until i can find a way to ban string
fields.
Carol
data list fixed/datevar 1-10 (a) locvar 12-17 (a).
begin data 11/02/2009 urban 99/10/2008 rural 03/99/2007 urban end data. exe. list. numeric newdate (adate). if index(datevar,'99') = 0 newdate=number(datevar,adate). if index(datevar,'99') = 1 newdate=number(concat('06',substr(datevar,3,10)),adate). if index(datevar,'99') = 4 newdate=number(concat(substr(datevar,1,3),'15',substr(datevar,6,10)),adate). exe. list. numeric location (f1). if index(lower(locvar),'urban') > 0 location = 1. if index(lower(locvar),'rural') > 0 location = 2. exe. list. -- From: "Parise, Carol A." <[hidden email]> To: [hidden email] Date: 02/25/2011 01:54 PM Subject: searching string fields Sent by: "SPSSX(r) Discussion" <[hidden email]> Hi all, I have 2 questions regarding string fields and i'm not too familar with how to tell SPSS to look at a string field and pull information from it. 1. I just recieved an excel file and imported into SPSS and noticed that my dates came in as numbers versus dates. Upon further inspection, i discovered that this was because if a date was unknown, 99 was used as a placeholder in a MMDDYYYY field. For example, if the month was unknown, the date was entered as 99/22/2011. If the day was unknown, the date was 02/99/2011. Because these dates were not legitimate, the data were assumed to be numeric. After discussion with colleagues, we decided to use the median DD or MM for these unknown values. So our missing MM would be entered as 06 and our missing DD would be 15 so that we can use these cases in our analysis. I converted the excel fields to text so i have the dates in SPSS in a string field and have computed a date field called "newdate". i want to tell SPSS: a. Look at oldDate. If it has no 99s in it, then the newdate = olddate b. If MM in olddate=99, then newdate =06/oldMM/oldYYYY. c. If DD in olddate =99, then newdate = oldDD/15/oldYYYY. 2. In the same file, I have a variable with a string of text that indicates whether a location is urban or rural. I want to create a new numeric variable called Location. i want to tell SPSS: Look at the string variable and if it has the word "urban" anywhere in the field, the location = 1. If the word "rural" is anywhere in the field, Location=2. I think the whole world would run more smoothly if everything were numeric. Thanks for any assistance. Carol |
Administrator
|
OUCH!. Even a cursory glance reveals this code to be a time bomb waiting to happen!
It may be fine for dates after 1999, but is wrong for an entire decade. 1990, 1991... 1999.(your first IF will fail). I would at minimum suggest something like index(datevar,'99/') but if you ever get another format such as SDATE (YY-MM-DD) the second one will be trouble too. Gotta rush... HTH, David ----
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Good to know! Fortunately, all of my dates ranged from 2004-2007 but will explore other options to make this work for pre 2000 since this may come up again.
Thanks ________________________________________ From: SPSSX(r) Discussion [[hidden email]] On Behalf Of David Marso [[hidden email]] Sent: Friday, February 25, 2011 4:47 PM To: [hidden email] Subject: Re: searching string fields OUCH!. Even a cursory glance reveals this code to be a time bomb waiting to happen! It may be fine for dates after 1999, but is wrong for an entire decade. 1990, 1991... 1999.(your first IF will fail). I would at minimum suggest something like index(datevar,'99/') but if you ever get another format such as SDATE (YY-MM-DD) the second one will be trouble too. Gotta rush... HTH, David ---- Parise, Carol A. wrote: > > Thanks all, > > This solution worked perfectly and will come in handy for future projects > until i can find a way to ban string fields. > > Carol > > data list fixed/datevar 1-10 (a) locvar 12-17 (a). > begin data > 11/02/2009 urban > 99/10/2008 rural > 03/99/2007 urban > end data. > exe. > list. > > numeric newdate (adate). > if index(datevar,'99') = 0 newdate=number(datevar,adate). > if index(datevar,'99') = 1 > newdate=number(concat('06',substr(datevar,3,10)),adate). > if index(datevar,'99') = 4 > newdate=number(concat(substr(datevar,1,3),'15',substr(datevar,6,10)),adate). > exe. > list. > > numeric location (f1). > if index(lower(locvar),'urban') > 0 location = 1. > if index(lower(locvar),'rural') > 0 location = 2. > exe. > list. > -- > > > > From: "Parise, Carol A." <[hidden email]> > To: [hidden email] > Date: 02/25/2011 01:54 PM > Subject: searching string fields > Sent by: "SPSSX(r) Discussion" <[hidden email]> > ________________________________ > > > > Hi all, > > I have 2 questions regarding string fields and i'm not too familar with > how to tell SPSS to look at a string field and pull information from it. > > 1. I just recieved an excel file and imported into SPSS and noticed that > my dates came in as numbers versus dates. Upon further inspection, i > discovered that this was because if a date was unknown, 99 was used as a > placeholder in a MMDDYYYY field. For example, if the month was unknown, > the date was entered as 99/22/2011. If the day was unknown, the date was > 02/99/2011. Because these dates were not legitimate, the data were assumed > to be numeric. > > After discussion with colleagues, we decided to use the median DD or MM > for these unknown values. So our missing MM would be entered as 06 and our > missing DD would be 15 so that we can use these cases in our analysis. > > I converted the excel fields to text so i have the dates in SPSS in a > string field and have computed a date field called "newdate". i want to > tell SPSS: > > a. Look at oldDate. If it has no 99s in it, then the newdate = olddate > > b. If MM in olddate=99, then newdate =06/oldMM/oldYYYY. > > c. If DD in olddate =99, then newdate = oldDD/15/oldYYYY. > > > 2. In the same file, I have a variable with a string of text that > indicates whether a location is urban or rural. I want to create a new > numeric variable called Location. i want to tell SPSS: > > Look at the string variable and if it has the word "urban" anywhere in the > field, the location = 1. If the word "rural" is anywhere in the field, > Location=2. > > > I think the whole world would run more smoothly if everything were > numeric. > > Thanks for any assistance. > Carol > > > > > > > > > -- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Reading-Date-Time-from-CSV-File-tp3381430p3401039.html Sent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Administrator
|
The "99/" amendment should work unless you get string "dates" as YY/MM/DD. OTOH, YMMV.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by Rick Oliver-3
compute numvar = -1. do if index(lower(stringvar), "urban")>0. compute numvar=1. else if index(lower(stringvar), "rural")>0. compute numvar=2. end if. value labels numvar 1 'urban' 2 'rural' -1 'other'. missing values numvar (lo thru 0). You would know why the result was missing, ergo a User missing value. It would not be missing because SPSS software could not follow your directions. Art Kendall Social Research Consultants On 2/25/2011 3:15 PM, Rick Oliver wrote: Well, the second part is easy. |
Free forum by Nabble | Edit this page |