SPSSX Discussion

breaking up (a string variable) is hard to do

Tanya Temkin

breaking up (a string variable) is hard to do

Hello all,

I've been trying to chop up a string variable called ICD9_1 whose values
look like this:

3051-005-67
4779-000-134
7999-050-259
8470-010-232
0780-000-7
61171-   -167

* The number of digits (and, sometimes, blank spaces) before the second
hyphen varies.
* There are always three spaces between 1st and 2nd hyphen - if these
don't have numbers, they are padded with blanks.
* There are one to three digits after second hyphen.

I want to set up a new variable that has all the digits before the second
hyphen, so I went:
string ICD9_1cd (A9).
compute ICD9_1cd=substr(ICD9_1,1,rindex(ICD9_1,"-")-1).

and I get my variable but also an error message that says
>Warning # 606
>The third argument to SUBSTR (the length) is missing or is otherwise
>invalid. The argument must be a non-negative integer. The result has
>been set to the null string.

In fact, I get a whole slew of these warnings, one for each case in my
file, up to the point where I am warned that the MXWARNS limit for
the data pass has been reached and I won't get any more.

I've seen similar syntax posted here before using INDEX and RINDEX in the
third argument in a substring expression. What's the problem here? Is
there a better way to do this?

Thanks in advance,

Tanya Temkin
Research Associate
AACC Reporting
Northern California Regional Office
The Permanente Medical Group
(510) 625-6680

NOTICE TO RECIPIENT: If you are not the intended recipient of this
e-mail, you are prohibited from sharing, copying, or otherwise using or
disclosing its contents. If you have received this e-mail in error,
please notify the sender immediately by reply e-mail and permanently
delete this e-mail and any attachments without reading, forwarding or
saving them. Thank you.

Hal 9000

Re: breaking up (a string variable) is hard to do

I wiped the tears from my eyes and came up with this (~sniff):

* sample data *.
data list free /icd9 (a13).
begin data
'3051-005-67 '
'4779-000-134 '
'7999-050-259 '
'8470-010-232 '
'0780-000-7 '
'61171-   -167'
end data.

string a (a5) b (a7) icd9_1 (a8).
compute b = substr(icd9, index(icd9, '-') + 1).
compute a = substr(icd9, 1, index(icd9, '-') - 1).
compute icd9_1 = concat(a, '-', substr(b, 1, index(b, '-') - 1)).
exe.

Gary

On 10/2/07, Tanya Temkin <[hidden email]> wrote:

> Hello all,
>
> I've been trying to chop up a string variable called ICD9_1 whose values
> look like this:
>
> 3051-005-67
> 4779-000-134
> 7999-050-259
> 8470-010-232
> 0780-000-7
> 61171- -167
>
> *The number of digits (and, sometimes, blank spaces) before the second
> hyphen varies.
> * There are always three spaces between 1st and 2nd hyphen - if these
> don't have numbers, they are padded with blanks.
> * There are one to three digits after second hyphen.
>
> I want to set up a new variable that has all the digits before the second
> hyphen, so I went:
> string ICD9_1cd (A9).
> compute ICD9_1cd=substr(ICD9_1,1,rindex(ICD9_1,"-")-1).
>
> and I get my variable but also an error message that says
> >Warning # 606
> >The third argument to SUBSTR (the length) is missing or is otherwise
> >invalid. The argument must be a non-negative integer. The result has
> been
> >set to the null string.
>
> In fact, I get a whole slew of these warnings, one for each case in my
> file, up to the point where I am warned that the limit of mxwarnings in
> the data pass has been reached and I won't get any more.
>
> I've seen similar syntax posted here before using INDEX and RINDEX in the
> third argument in a substring expression. What's the problem here? Is
> there a better way to do this?
>
> Thanks in advance,
>
> Tanya Temkin
> Research Associate
> AACC Reporting
> Northern California Regional Office
> The Permanente Medical Group
> (510) 625-6680
>
> NOTICE TO RECIPIENT: If you are not the intended recipient of this
> e-mail, you are prohibited from sharing, copying, or otherwise using or
> disclosing its contents. If you have received this e-mail in error,
> please notify the sender immediately by reply e-mail and permanently
> delete this e-mail and any attachments without reading, forwarding or
> saving them. Thank you.
>

Hal 9000

Re: breaking up (a string variable) is hard to do

oops, better make this:

> compute icd9_1 = concat(a, '-', substr(b, 1, index(b, '-') - 1)).

this:

compute icd9_1 = concat(rtrim(a), '-', rtrim(substr(b, 1, index(b, '-') - 1))).

;)

On 10/2/07, Hal 9000 <[hidden email]> wrote:

> I wiped the tears from my eyes and came up with this (~sniff):
>
> * sample data *.
> data list free /icd9 (a13).
> begin data
> '3051-005-67 '
> '4779-000-134 '
> '7999-050-259 '
> '8470-010-232 '
> '0780-000-7 '
> '61171- -167'
> end data.
>
> string a (a5) b (a7) icd9_1 (a8).
> compute b = substr(icd9, index(icd9, '-') + 1).
> compute a = substr(icd9, 1, index(icd9, '-') - 1).
> compute icd9_1 = concat(a, '-', substr(b, 1, index(b, '-') - 1)).
> exe.
>
> Gary
>
>
> On 10/2/07, Tanya Temkin <[hidden email]> wrote:
> > Hello all,
> >
> > I've been trying to chop up a string variable called ICD9_1 whose values
> > look like this:
> >
> > 3051-005-67
> > 4779-000-134
> > 7999-050-259
> > 8470-010-232
> > 0780-000-7
> > 61171- -167
> >
> > *The number of digits (and, sometimes, blank spaces) before the second
> > hyphen varies.
> > * There are always three spaces between 1st and 2nd hyphen - if these
> > don't have numbers, they are padded with blanks.
> > * There are one to three digits after second hyphen.
> >
> > I want to set up a new variable that has all the digits before the second
> > hyphen, so I went:
> > string ICD9_1cd (A9).
> > compute ICD9_1cd=substr(ICD9_1,1,rindex(ICD9_1,"-")-1).
> >
> > and I get my variable but also an error message that says
> > >Warning # 606
> > >The third argument to SUBSTR (the length) is missing or is otherwise
> > >invalid. The argument must be a non-negative integer. The result has
> > been
> > >set to the null string.
> >
> > In fact, I get a whole slew of these warnings, one for each case in my
> > file, up to the point where I am warned that the limit of mxwarnings in
> > the data pass has been reached and I won't get any more.
> >
> > I've seen similar syntax posted here before using INDEX and RINDEX in the
> > third argument in a substring expression. What's the problem here? Is
> > there a better way to do this?
> >
> > Thanks in advance,
> >
> > Tanya Temkin
> > Research Associate
> > AACC Reporting
> > Northern California Regional Office
> > The Permanente Medical Group
> > (510) 625-6680
> >
> > NOTICE TO RECIPIENT: If you are not the intended recipient of this
> > e-mail, you are prohibited from sharing, copying, or otherwise using or
> > disclosing its contents. If you have received this e-mail in error,
> > please notify the sender immediately by reply e-mail and permanently
> > delete this e-mail and any attachments without reading, forwarding or
> > saving them. Thank you.
> >
>

Barnett, Adrian (DECS)

Re: breaking up (a string variable) is hard to do

In reply to this post by Tanya Temkin

Hi Tanya
Your code is not doing what you intended because you are extracting
everything from the first character (position 1) to the last character
before the second dash.

So you need to fix both parameters in the SUBSTR expression - the start
position and the length.

Start position will be: index(ICD9_1,"-")+1
Length is (in pseudo-code) : Last_Dash_Posn - First_Dash_Posn - 1

You've already got the Last_Dash_Posn worked out in your code sample
below, so it's a straightforward matter to translate the above into real
code by generalizing from that.

Sometimes it is easier to test and debug complicated string computations
by breaking them up into separate steps in which you calculate and store
the components of the final expression, so that it doesn't get too long
and difficult to read.
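
Following that advice, the start/length recipe above could be written out step by step roughly as below. This is an untested sketch: #first, #last and ICD9_mid are hypothetical names, and the DO IF guard is an addition (not part of the original reply) that skips values lacking two separate hyphens, so SUBSTR never receives a negative length.

* Sketch (untested): characters between the first and last hyphen.
* #first, #last and ICD9_mid are hypothetical names.
STRING ICD9_mid (A3).
COMPUTE #first = INDEX(ICD9_1,"-").
COMPUTE #last = RINDEX(ICD9_1,"-").
* Extract only when there are two separate hyphens.
DO IF (#first > 0 AND #last > #first).
COMPUTE ICD9_mid = SUBSTR(ICD9_1, #first + 1, #last - #first - 1).
END IF.
EXECUTE.

For 3051-005-67 this gives 005; for 61171-   -167 it gives three blanks. Because #first and #last are scratch variables (names starting with #), they are not saved to the file.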

Regards

Adrian Barnett

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Tanya Temkin
Sent: Wednesday, 3 October 2007 8:44 AM
To: [hidden email]
Subject: breaking up (a string variable) is hard to do

Hello all,

I've been trying to chop up a string variable called ICD9_1 whose values
look like this:

3051-005-67
4779-000-134
7999-050-259
8470-010-232
0780-000-7
61171- -167

*The number of digits (and, sometimes, blank spaces) before the second
hyphen varies.
* There are always three spaces between 1st and 2nd hyphen - if these
don't have numbers, they are padded with blanks.
* There are one to three digits after second hyphen.

I want to set up a new variable that has all the digits before the
second
hyphen, so I went:
string ICD9_1cd (A9).
compute ICD9_1cd=substr(ICD9_1,1,rindex(ICD9_1,"-")-1).

and I get my variable but also an error message that says
>Warning # 606
>The third argument to SUBSTR (the length) is missing or is otherwise
>invalid. The argument must be a non-negative integer. The result has
been
>set to the null string.

In fact, I get a whole slew of these warnings, one for each case in my
file, up to the point where I am warned that the limit of mxwarnings in
the data pass has been reached and I won't get any more.

I've seen similar syntax posted here before using INDEX and RINDEX in
the
third argument in a substring expression. What's the problem here? Is
there a better way to do this?

Thanks in advance,

Tanya Temkin
Research Associate
AACC Reporting
Northern California Regional Office
The Permanente Medical Group
(510) 625-6680

NOTICE TO RECIPIENT: If you are not the intended recipient of this
e-mail, you are prohibited from sharing, copying, or otherwise using or
disclosing its contents. If you have received this e-mail in error,
please notify the sender immediately by reply e-mail and permanently
delete this e-mail and any attachments without reading, forwarding or
saving them. Thank you.

Raynald Levesque-2

Re: breaking up (a string variable) is hard to do

In reply to this post by Tanya Temkin

Hi

You get this Warning for cases without any hyphen within variable ICD9_1.

--
Raynald Levesque
www.spsstools.net

Richard Ristow

Re: breaking up (a string variable) is hard to do

In reply to this post by Tanya Temkin

At 07:13 PM 10/2/2007, Tanya Temkin wrote:

>I've been trying to chop up a string variable called ICD9_1 whose
>values look like this [variable is A13]:

|-----------------------------|---------------------------|
|Output Created |02-OCT-2007 20:32:24 |
|-----------------------------|---------------------------|
[TestData]

ICD9_1

3051-005-67
4779-000-134
7999-050-259
8470-010-232
0780-000-7
61171-   -167

Number of cases read: 6 Number of cases listed: 6

>*The number of digits (and, sometimes, blank spaces) before the second
>hyphen varies.
>* There are always three spaces between 1st and 2nd hyphen - if these
>don't have numbers, they are padded with blanks.
>* There are one to three digits after second hyphen.
>
>I want a variable that has all the digits before the second hyphen, so
>I went:
>string ICD9_1cd (A9).
>compute ICD9_1cd=substr(ICD9_1,1,rindex(ICD9_1,"-")-1).
>
>and I get my variable but also an error message that says
>>Warning # 606
>>The third argument to SUBSTR (the length) is missing or is otherwise
>>invalid. The argument must be a non-negative integer. The result
>>has been set to the null string.
>
>What's the problem here? Is there a better way to do this?

Good question. I don't get the warning messages; I do get the results,
and the SUBSTR arguments are valid. (I do the computation twice: once
as you have it, once breaking out the third argument to SUBSTR into a
separate variable.) I'm not sure what to say.

This is SPSS 14 draft output:
----------------------------
* ... From the posting ... .
string ICD9_1cd (A9).
compute ICD9_1cd=substr(ICD9_1,1,rindex(ICD9_1,"-")-1).

NUMERIC SbstrLen (F3).
STRING ICD9_1TS (A9).
COMPUTE SbstrLen = rindex(ICD9_1,"-")-1.
compute ICD9_1TS = substr(ICD9_1,1,SbstrLen).

LIST.

List
|-----------------------------|---------------------------|
|Output Created |02-OCT-2007 20:32:25 |
|-----------------------------|---------------------------|
[WorkSet]

ICD9_1        ICD9_1cd  SbstrLen ICD9_1TS

3051-005-67   3051-005         8 3051-005
4779-000-134  4779-000         8 4779-000
7999-050-259  7999-050         8 7999-050
8470-010-232  8470-010         8 8470-010
0780-000-7    0780-000         8 0780-000
61171-   -167 61171-           9 61171-

Number of cases read: 6 Number of cases listed: 6

===================
APPENDIX: Test data
===================
DATA LIST FIXED/
ICD9_1 01-13(A).
BEGIN DATA
3051-005-67
4779-000-134
7999-050-259
8470-010-232
0780-000-7
61171-   -167
END DATA.

DATASET NAME TestData WINDOW=FRONT.

DATASET ACTIVATE TestData WINDOW=FRONT.
LIST.

Barth Riley

Re: breaking up (a string variable) is hard to do

In reply to this post by Barnett, Adrian (DECS)

Try this:

Create a variable called index as follows:

COMPUTE index = INDEX(oldvar,"-").

Then create newvar as follows:

STRING newvar (A8).
COMPUTE newvar = CONCAT(SUBSTR(oldvar,1,index-1),
  SUBSTR(oldvar,index+1,INDEX(SUBSTR(oldvar,index+1),"-")-1)).

Barnett, Adrian (DECS)

Re: breaking up (a string variable) is hard to do

In reply to this post by Tanya Temkin

Hi Tanya
In my earlier reply, I assumed that what you wanted was the digits
BETWEEN the hyphens. Having re-read your original post, I'm not so sure
now.
What I offered should get you that; but if, instead, you want everything
before the second hyphen, my suggestion definitely won't do it.

As Richard says, your original syntax looks OK and shouldn't have
generated warnings. If you were to follow his suggestion of separately
computing the values of the position indicators and look at these to see
where it thinks they are, it may give you some clues as to what is going
on.

Regards

Adrian Barnett

Richard Ristow

Re: breaking up (a string variable) is hard to do

In reply to this post by Tanya Temkin

POSTSCRIPT: At 07:13 PM 10/2/2007, Tanya Temkin wrote:

>I went:
>string ICD9_1cd (A9).
>compute ICD9_1cd=substr(ICD9_1,1,rindex(ICD9_1,"-")-1).
>
>and I get my variable but also an error message that says
>>Warning # 606
>>The third argument to SUBSTR (the length) is missing or is otherwise
>>invalid. The argument must be a non-negative integer. The result has
>>been set to the null string.

As I posted, your code seems OK for the test data you posted. But it
will give this warning with other values. Notably, strings that include
no hyphen ("-") will give a third argument -1 to SUBSTR; that includes
strings that are blank altogether.

Are there any of those in the data, that could be causing the problem?
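
One quick way to check is to list only the cases with no hyphen before running the COMPUTE. A minimal, untested sketch, assuming the variable is named ICD9_1 as in the original post; TEMPORARY restricts the SELECT IF to the next procedure, so no cases are dropped from the working file.

* List only the cases with no hyphen (blank values included).
TEMPORARY.
SELECT IF (INDEX(ICD9_1,"-") = 0).
LIST ICD9_1.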

Florio Arguillas

Re: breaking up (a string variable) is hard to do

In reply to this post by Tanya Temkin

here's another approach.

data list fixed/ICD9_1 1-15 (a).
begin data.
3051-005-67
4779-000-134
79991-050-259
8470-010-232
0780-000-7
61171-   -167
end data.

string ICD9_1cd (a10).
compute ICD9_1cd = substr(ICD9_1 ,1,index(ICD9_1,"-")+3).
exe.

Tanya Temkin

Re: breaking up (a string variable) is hard to do

In reply to this post by Raynald Levesque-2

Thanks to all who responded with alternative solutions and workarounds.
Richard & Raynald nailed cause of the problem...there were indeed values
with no hyphen at all, mostly missing values. So I just went

string ICD9_1cd (A9).
do if (ICD9_1 NE "" & ICD9_1 NE "VONLY").
compute ICD9_1cd=substr(ICD9_1,1,rindex(ICD9_1,"-")-1).
end if.
exe.

and it worked, no error messages.
Still have to RTRIM that pesky hyphen that shows up on end of some values
(the ones that had blanks between hyphens in original variable) but that
is minor.

Again, thanks for supporting me in my breakup ;)

Tanya

"Raynald Levesque" <[hidden email]>
10/02/2007 05:05 PM

Richard Ristow

Re: breaking up (a string variable) is hard to do

At 01:17 PM 10/3/2007, Tanya Temkin wrote:

>Thanks to all who responded with alternative solutions and
>workarounds. Richard & Raynald nailed cause of the problem...there
>were indeed values with no hyphen at all, mostly missing values.

Great! Glad it worked.

>So I just went
>
>string ICD9_1cd (A9).
>do if (ICD9_1 NE "" & ICD9_1 NE "VONLY").
>compute ICD9_1cd=substr(ICD9_1,1,rindex(ICD9_1,"-")-1).
>end if.

Good. (You can also calculate the index as a separate variable, as I
did in one example; that lets you test specifically for no hyphen. You
can use a scratch variable as the index variable, to avoid its
cluttering your file.)
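
A sketch of that variant, untested and assuming ICD9_1cd does not exist yet: #dash is a hypothetical scratch variable holding the position of the last hyphen. Because its name starts with #, it is never saved to the file. Cases where RINDEX returns 0 (no hyphen, including blank values) are skipped, so ICD9_1cd simply stays blank for them, whatever other codes like "VONLY" may appear.

* Position of the last hyphen, in a scratch variable (#dash is hypothetical).
STRING ICD9_1cd (A9).
COMPUTE #dash = RINDEX(ICD9_1,"-").
* RINDEX returns 0 when there is no hyphen, so skip those cases.
DO IF (#dash > 0).
COMPUTE ICD9_1cd = SUBSTR(ICD9_1,1,#dash - 1).
END IF.
EXECUTE.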

>Still have to RTRIM that pesky hyphen that shows up on end of some
>values

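It looks like you may know this, but RTRIM will work directly, no logic
needed. In these values the hyphen is followed by the blanks that sat
between the hyphens, so it is safer to trim those blanks first
(following not tested):

COMPUTE ICD9_1cd = RTRIM(RTRIM(ICD9_1cd),'-').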