|
Hello all,
I've been trying to chop up a string variable called ICD9_1 whose values look like this:

3051-005-67
4779-000-134
7999-050-259
8470-010-232
0780-000-7
61171-   -167

* The number of digits (and, sometimes, blank spaces) before the second hyphen varies.
* There are always three spaces between 1st and 2nd hyphen - if these don't have numbers, they are padded with blanks.
* There are one to three digits after second hyphen.

I want to set up a new variable that has all the digits before the second hyphen, so I went:

string ICD9_1cd (A9).
compute ICD9_1cd=substr(ICD9_1,1,rindex(ICD9_1,"-")-1).

and I get my variable but also an error message that says

>Warning # 606
>The third argument to SUBSTR (the length) is missing or is otherwise
>invalid. The argument must be a non-negative integer. The result has been
>set to the null string.

In fact, I get a whole slew of these warnings, one for each case in my file, up to the point where I am warned that the limit of mxwarnings in the data pass has been reached and I won't get any more.

I've seen similar syntax posted here before using INDEX and RINDEX in the third argument in a substring expression. What's the problem here? Is there a better way to do this?

Thanks in advance,

Tanya Temkin
Research Associate
AACC Reporting
Northern California Regional Office
The Permanente Medical Group
(510) 625-6680

NOTICE TO RECIPIENT: If you are not the intended recipient of this e-mail, you are prohibited from sharing, copying, or otherwise using or disclosing its contents. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and permanently delete this e-mail and any attachments without reading, forwarding or saving them. Thank you.
|
I wiped the tears from my eyes and came up with this (~sniff):
* sample data *.
data list free /icd9 (a13).
begin data
'3051-005-67 '
'4779-000-134 '
'7999-050-259 '
'8470-010-232 '
'0780-000-7 '
'61171-   -167'
end data.

string a (a5) b (a7) icd9_1 (a8).
compute b = substr(icd9, index(icd9, '-') + 1).
compute a = substr(icd9, 1, index(icd9, '-') - 1).
compute icd9_1 = concat(a, '-', substr(b, 1, index(b, '-') - 1)).
exe.

Gary
|
oops, better make this:
> compute icd9_1 = concat(a, '-', substr(b, 1, index(b, '-') - 1)).

this:

compute icd9_1 = concat(rtrim(a), '-', rtrim(substr(b, 1, index(b, '-') - 1))).

;)
|
In reply to this post by Tanya Temkin
Hi Tanya
Your code is not doing what you intended because you are extracting everything from the first character (position 1) to the last character before the second dash. So you need to fix both parameters in the SUBSTR expression: the start position and the length.

Start position will be:

index(ICD9_1,"-")+1

Length is (in pseudo-code):

Last_Dash_Posn - First_Dash_Posn - 1

You've already got the Last_Dash_Posn worked out in your code sample, so it's a straightforward matter to translate the above into real code by generalizing from that.

Sometimes it is easier to test and debug complicated string computations by breaking them up into separate steps in which you calculate and store the components of the final expression, so that it doesn't get too long and difficult to read.

Regards
Adrian Barnett
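Adrian's pseudo-code can be turned into syntax roughly as follows. This is a minimal sketch that has not been tested here. It assumes the values sit in ICD9_1 as above. ICD9_mid, #first and #last are hypothetical names introduced only for illustration. Names starting with # are scratch variables, so they are not saved to the file. The DO IF guard skips values with fewer than two hyphens, which would otherwise produce a negative length.

```spss
* Sketch (untested): extract the characters BETWEEN the two hyphens.
* ICD9_mid, #first and #last are hypothetical names.
string ICD9_mid (A3).
compute #first = index(ICD9_1, "-").
compute #last = rindex(ICD9_1, "-").
* Only compute when there are two distinct hyphens, so the length is never negative.
do if (#first > 0 and #last > #first).
compute ICD9_mid = substr(ICD9_1, #first + 1, #last - #first - 1).
end if.
execute.
```

For 3051-005-67 this should give 005. For 61171-   -167 it should give three blanks.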
|
In reply to this post by Tanya Temkin
Hi
You get this Warning for cases without any hyphen within variable ICD9_1.

--
Raynald Levesque
www.spsstools.net
|
In reply to this post by Tanya Temkin
At 07:13 PM 10/2/2007, Tanya Temkin wrote:
>I've been trying to chop up a string variable called ICD9_1 whose
>values look like this [variable is A13]:

[TestData]
ICD9_1

3051-005-67
4779-000-134
7999-050-259
8470-010-232
0780-000-7
61171-   -167

Number of cases read: 6    Number of cases listed: 6

>*The number of digits (and, sometimes, blank spaces) before the second
>hyphen varies.
>* There are always three spaces between 1st and 2nd hyphen - if these
>don't have numbers, they are padded with blanks.
>* There are one to three digits after second hyphen.
>
>I want a variable that has all the digits before the second hyphen, so
>I went:
>string ICD9_1cd (A9).
>compute ICD9_1cd=substr(ICD9_1,1,rindex(ICD9_1,"-")-1).
>
>and I get my variable but also an error message that says
>>Warning # 606
>>The third argument to SUBSTR (the length) is missing or is otherwise
>>invalid. The argument must be a non-negative integer. The result
>>has been set to the null string.
>
>What's the problem here? Is there a better way to do this?

Good question. I don't get the warning messages, I do get the results, and the SUBSTR arguments are valid. (I do the computation twice: once as you have it, and once breaking out the third argument to SUBSTR into a separate variable.) I'm not sure what to say. This is SPSS 14 draft output:

----------------------------
* ... From the posting ... .
string ICD9_1cd (A9).
compute ICD9_1cd=substr(ICD9_1,1,rindex(ICD9_1,"-")-1).

NUMERIC SbstrLen (F3).
STRING ICD9_1TS (A9).
COMPUTE SbstrLen = rindex(ICD9_1,"-")-1.
compute ICD9_1TS = substr(ICD9_1,1,SbstrLen).
LIST.

List
[WorkSet]

ICD9_1         ICD9_1cd  SbstrLen  ICD9_1TS

3051-005-67    3051-005       8    3051-005
4779-000-134   4779-000       8    4779-000
7999-050-259   7999-050       8    7999-050
8470-010-232   8470-010       8    8470-010
0780-000-7     0780-000       8    0780-000
61171-   -167  61171-         9    61171-

Number of cases read: 6    Number of cases listed: 6

===================
APPENDIX: Test data
===================
DATA LIST FIXED/ ICD9_1 01-13(A).
BEGIN DATA
3051-005-67
4779-000-134
7999-050-259
8470-010-232
0780-000-7
61171-   -167
END DATA.
DATASET NAME TestData WINDOW=FRONT.
DATASET ACTIVATE TestData WINDOW=FRONT.
LIST.
|
In reply to this post by Barnett, Adrian (DECS)
Try this:
Create a variable called index as follows:

compute index = INDEX(oldvar,"-").

Then create newvar as follows:

string newvar (A13).
compute newvar = CONCAT(SUBSTR(oldvar,1,index-1),SUBSTR(oldvar,index+1,INDEX(SUBSTR(oldvar,index+1),"-")-1)).
|
In reply to this post by Tanya Temkin
Hi Tanya
In my earlier reply, I assumed that what you wanted was the digits BETWEEN the hyphens. Having re-read your original post, I'm not so sure now. What I offered should get you that, but if instead you want everything before the second hyphen, my suggestion definitely won't do it.

As Richard says, your original syntax looks OK and shouldn't have generated warnings. If you were to follow his suggestion of separately computing the values of the position indicators and looking at these to see where it thinks they are, it may give you some clues as to what is going on.

Regards
Adrian Barnett
|
In reply to this post by Tanya Temkin
POSTSCRIPT: At 07:13 PM 10/2/2007, Tanya Temkin wrote:
>I went:
>string ICD9_1cd (A9).
>compute ICD9_1cd=substr(ICD9_1,1,rindex(ICD9_1,"-")-1).
>
>and I get my variable but also an error message that says
>>Warning # 606
>>The third argument to SUBSTR (the length) is missing or is otherwise
>>invalid. The argument must be a non-negative integer. The result has
>>been set to the null string.

As I posted, your code seems OK for the test data you posted. But it will give this warning with other values. Notably, strings that include no hyphen ("-") will give a third argument of -1 to SUBSTR; that includes strings that are blank altogether. Are there any of those in the data that could be causing the problem?
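One way to answer that question is to flag the offending cases directly. This is a minimal sketch that has not been tested here, and NoDash is a hypothetical variable name. RINDEX returns 0 when the hyphen is not found, so the flagged cases are exactly those that would give SUBSTR a length of -1. Blank values are caught as well.

```spss
* Sketch (untested): find values with no hyphen, including blank values.
* NoDash is a hypothetical name: 1 = no hyphen found, 0 = hyphen present.
compute NoDash = (rindex(ICD9_1, "-") = 0).
frequencies variables = NoDash.
* List only the flagged cases; TEMPORARY limits the selection to the next procedure.
temporary.
select if (NoDash = 1).
list variables = ICD9_1.
```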
|
In reply to this post by Tanya Temkin
here's another approach.
data list fixed/ICD9_1 1-15 (a).
begin data.
3051-005-67
4779-000-134
79991-050-259
8470-010-232
0780-000-7
61171-   -167
end data.
string ICD9_1cd (a10).
compute ICD9_1cd = substr(ICD9_1 ,1,index(ICD9_1,"-")+3).
exe.
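A caveat about this approach: it never raises Warning # 606. When a value has no hyphen, INDEX returns 0, so the length becomes 3. SUBSTR then quietly returns the first three characters instead of warning (for example "VON" from a value such as "VONLY"). Below is a minimal guarded variant. It has not been tested here and assumes the same A10 target variable.

```spss
* Sketch (untested): same fixed-offset idea, but skip values with no hyphen.
string ICD9_1cd (a10).
do if (index(ICD9_1, "-") > 0).
compute ICD9_1cd = substr(ICD9_1, 1, index(ICD9_1, "-") + 3).
end if.
execute.
```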
|
In reply to this post by Raynald Levesque-2
Thanks to all who responded with alternative solutions and workarounds.
Richard & Raynald nailed cause of the problem...there were indeed values with no hyphen at all, mostly missing values. So I just went

string ICD9_1cd (A9).
do if (ICD9_1 NE "" & ICD9_1 NE "VONLY").
compute ICD9_1cd=substr(ICD9_1,1,rindex(ICD9_1,"-")-1).
end if.
exe.

and it worked, no error messages. Still have to RTRIM that pesky hyphen that shows up on end of some values (the ones that had blanks between hyphens in original variable) but that is minor.

Again, thanks for supporting me in my breakup ;)

Tanya
|
At 01:17 PM 10/3/2007, Tanya Temkin wrote:
>Thanks to all who responded with alternative solutions and
>workarounds. Richard & Raynald nailed cause of the problem...there
>were indeed values with no hyphen at all, mostly missing values.

Great! Glad it worked.

>So I just went
>
>string ICD9_1cd (A9).
>do if (ICD9_1 NE "" & ICD9_1 NE "VONLY").
>compute ICD9_1cd=substr(ICD9_1,1,rindex(ICD9_1,"-")-1).
>end if.

Good. (You can also calculate the index as a separate variable, as I did in one example; that lets you test specifically for no hyphen. You can use a scratch variable as the index variable, to avoid its cluttering your file.)

>Still have to RTRIM that pesky hyphen that shows up on end of some
>values

It looks like you may know this, but RTRIM will work directly, no logic needed (following not tested):

COMPUTE ICD9_1cd = RTRIM(ICD9_1cd,'-').
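Putting both suggestions together gives something like the sketch below. It has not been tested here, and #dash is a hypothetical scratch variable name. The nested RTRIM is a precaution. The stored value may still end with the blanks that sat between the hyphens (for example "61171-" followed by three blanks). In that case RTRIM(ICD9_1cd,'-') on its own would see a trailing blank rather than a hyphen. The inner call removes the blanks first, so the outer call can reach the hyphen.

```spss
* Sketch (untested): guard on the hyphen position instead of listing bad values.
* #dash is a hypothetical scratch variable, so it is not saved to the file.
string ICD9_1cd (A9).
compute #dash = rindex(ICD9_1, "-").
do if (#dash > 0).
compute ICD9_1cd = substr(ICD9_1, 1, #dash - 1).
end if.
* Inner RTRIM drops trailing blanks, outer RTRIM drops any trailing hyphens.
compute ICD9_1cd = rtrim(rtrim(ICD9_1cd), "-").
execute.
```

Guarding on the computed position catches every value without a hyphen, including codes not known in advance, rather than only "" and "VONLY".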
