Flag cases where a given string variable contains a given word

Paul Mcgeoghan
Hi,

I am using the syntax 'Flag cases where a given string variable contains a given word'
downloaded from Raynald's SPSS Tools website.
http://www.spsstools.net/

Syntax is as follows:

* Flag cases where a given string variable contains a given word.

* The following syntax searches for the word 'spanish'.
* Ray 2002/08/16.

DATA LIST FIXED /str1 1-40(A).
BEGIN DATA
this is a Spanish word
only Italian here
SPANISH is good
le francais est aussi ok
END DATA.
LIST.

COMPUTE flag=INDEX(UPCASE(str1),'SPANISH')>0.
FILTER BY flag.
LIST.
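Outside SPSS, the same case-insensitive substring test can be sketched in plain Python. This is only an illustration and is not part of Ray's syntax. The rows are copied from the BEGIN DATA block, and `contains_word` is a hypothetical helper standing in for `INDEX(UPCASE(str1),'SPANISH')>0`.

```python
# Rows copied from the BEGIN DATA block above.
rows = [
    "this is a Spanish word",
    "only Italian here",
    "SPANISH is good",
    "le francais est aussi ok",
]


def contains_word(text: str, word: str) -> bool:
    """Hypothetical helper: True if `word` occurs anywhere in `text`,
    ignoring case (the INDEX(UPCASE(...)) > 0 test)."""
    return word.upper() in text.upper()


# Keep only the flagged rows, as FILTER BY flag followed by LIST would.
flagged = [row for row in rows if contains_word(row, "spanish")]

for row in flagged:
    print(row)
```

Like INDEX, this is a substring test, so a row containing, say, "Spanishness" would also be flagged.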


How would one go about modifying this so that one can find 2 words (e.g. SPANISH, WELSH)?
Also, is it possible to amend the syntax to find an item in the string (e.g. SPANISH) so long as another item (e.g. WELSH) is not in the string?

Thanks,
Paul


==================
Paul McGeoghan,
Application support specialist (Statistics and Databases),
University Infrastructure Group (UIG),
Information Services,
Cardiff University.
Tel. 02920 (875035).

Re: Flag cases where a given string variable contains a given word

Peck, Jon
You are starting down the path of regular expression processing of strings. SPSS 14 does not have an easy way to do this, but for SPSS 15 the Bonus Pack, provided initially to early adopters and generally available later, contains a regular expression processor that, combined with programmability, makes this type of thing very easy.

For example,
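begin program.
import spss, trans, extendedTransforms

# Set up a transformation that applies Python functions case by case.
tproc = trans.Tfunction()
# New string variable spanishOrWelsh (format a8): search str1 for the
# pattern "spanish|welsh"; the final True means "ignore case".
tproc.append(extendedTransforms.search, 'spanishOrWelsh', 'a8',
    ["str1",  trans.const("spanish|welsh"), trans.const(True)])

# Pass the data and create the new variable.
tproc.execute()
end program.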

This program computes a new string variable, spanishOrWelsh (format a8), which contains the result of searching str1 for spanish or welsh (the pattern "spanish|welsh"), ignoring case (from the True value above).

This is a very simple regular expression, but as the search conditions get more elaborate, the power of the regular expression approach becomes essential.
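The Bonus Pack code itself is not shown here. As a rough stand-in, the same pattern can be tried with Python's standard `re` module; this is an assumption about what such a search does, not the `extendedTransforms` implementation. The helper `search_pattern` and the two extra rows mentioning Welsh are hypothetical, added only for illustration.

```python
import re

# Ray's sample rows plus two hypothetical rows (illustration only).
rows = [
    "this is a Spanish word",
    "only Italian here",
    "SPANISH is good",
    "le francais est aussi ok",
    "Welsh spoken here",
    "Spanish and Welsh",
]


def search_pattern(text: str, pattern: str, ignorecase: bool = True) -> str:
    """Hypothetical helper: return the first text matching `pattern`,
    or an empty string when there is no match."""
    flags = re.IGNORECASE if ignorecase else 0
    match = re.search(pattern, text, flags)
    return match.group(0) if match else ""


# "spanish|welsh" matches either word; "|" is regex alternation.
results = [search_pattern(row, "spanish|welsh") for row in rows]

for row, found in zip(rows, results):
    print(row, "->", repr(found))
```

A non-empty result flags the case. The matched text keeps its original capitalisation, because only the comparison ignores case.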

Regards,
Jon Peck


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Paul Mcgeoghan
Sent: Friday, October 13, 2006 10:28 AM
To: [hidden email]
Subject: [SPSSX-L] Flag cases where a given string variable contains a given word

Re: Flag cases where a given string variable contains a given word

Richard Ristow
At 11:27 AM 10/13/2006, Paul Mcgeoghan wrote:

>Hi,
>
>I am using the syntax 'Flag cases where a given string variable
>contains a given word' downloaded from Raynald's SPSS Tools website
>http://www.spsstools.net/
>
>Syntax is as follows:
>
>* Flag cases where a given string variable contains a given word.
>
>* The following syntax searches for the word 'spanish'.
>* Ray 2002/08/16.
>
>DATA LIST FIXED /str1 1-40(A).
>BEGIN DATA
>this is a Spanish word
>only Italian here
>SPANISH is good
>le francais est aussi ok
>END DATA.
>LIST.
>
>COMPUTE flag=INDEX(UPCASE(str1),'SPANISH')>0.
>FILTER BY flag.
>LIST.
>
>
>How would one go about modifying this so that one can find 2 words
>(e.g. SPANISH, WELSH)? Also, is it possible to amend the question to
>find an item in the string (e.g. SPANISH) so long as another item
>(e.g. WELSH) is not in the string?

Well, Jon Peck described how to do this if you're an 'early adopter'
of SPSS release 15. But maybe you're not, in which case try these
(not tested):

*  "find strings containing 2 words (e.g. SPANISH, WELSH)?" .
*  (Syntax is, deliberately, less compact than that on      .
*  Ray's site.)                                             .

*  (This uses 'AND' logic: start by accepting a case,       .
*  reject it if it fails any criterion.)                    .
COMPUTE FLAG = 1.
IF     (INDEX(UPCASE(str1),'SPANISH') EQ 0)
               /* Reject if 'SPANISH' is absent  */ FLAG = 0.
IF     (INDEX(UPCASE(str1),'WELSH')   EQ 0)
               /* Reject if 'WELSH'   is absent  */ FLAG = 0.
FILTER BY flag.
LIST.
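The accept-then-reject pattern above translates directly into Python. This is a sketch only, using hypothetical extra rows that contain Welsh. Note that Python's `str.find` returns -1, not 0, when the text is absent, because its positions start at 0 while SPSS `INDEX` positions start at 1.

```python
# Ray's sample rows plus two hypothetical rows (illustration only).
rows = [
    "this is a Spanish word",
    "only Italian here",
    "SPANISH is good",
    "le francais est aussi ok",
    "Welsh spoken here",
    "Spanish and Welsh",
]


def flag_both(text: str) -> int:
    """Start by accepting the row, then reject it if any criterion fails."""
    upper = text.upper()
    flag = 1
    if upper.find("SPANISH") == -1:  # reject if 'SPANISH' is absent
        flag = 0
    if upper.find("WELSH") == -1:    # reject if 'WELSH' is absent
        flag = 0
    return flag


flags = [flag_both(row) for row in rows]
print([row for row, f in zip(rows, flags) if f == 1])
```

Each additional required word needs only one more reject test.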

*  "find [a string containing an item] (e.g. SPANISH) so     .
*   long as another item (e.g. WELSH) is not in the string?" .
COMPUTE FLAG = 1.
IF     (INDEX(UPCASE(str1),'SPANISH') EQ 0)
               /* Reject if 'SPANISH' is absent  */ FLAG = 0.
IF     (INDEX(UPCASE(str1),'WELSH')   GT 0)
               /* Reject if 'WELSH'   is present */ FLAG = 0.
FILTER BY flag.
LIST.
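The SPANISH-but-not-WELSH test can be sketched in Python two ways: in the accept/reject style of the syntax above, and as a single regular expression with a negative lookahead, the kind of more elaborate condition where Jon Peck suggests the regex approach pays off. The pattern `^(?!.*welsh).*spanish` and the two extra rows are illustrative assumptions, not taken from the thread.

```python
import re

# Ray's sample rows plus two hypothetical rows (illustration only).
rows = [
    "this is a Spanish word",
    "only Italian here",
    "SPANISH is good",
    "le francais est aussi ok",
    "Welsh spoken here",
    "Spanish and Welsh",
]


def spanish_not_welsh(text: str) -> int:
    """Accept, then reject if SPANISH is absent or WELSH is present."""
    upper = text.upper()
    flag = 1
    if "SPANISH" not in upper:  # reject if 'SPANISH' is absent
        flag = 0
    if "WELSH" in upper:        # reject if 'WELSH' is present
        flag = 0
    return flag


# One regex: anchored at the start, the lookahead (?!.*welsh) makes the
# match fail if "welsh" appears anywhere in the string.
PATTERN = re.compile(r"^(?!.*welsh).*spanish", re.IGNORECASE)

by_flags = [row for row in rows if spanish_not_welsh(row)]
by_regex = [row for row in rows if PATTERN.search(row)]
print(by_flags)
```

Both versions select the same rows; the regex version keeps the whole condition in one pattern, while the IF-style version is easier to extend one criterion at a time.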