Hi to all,
I need advice on how to deal with identifying "duplicate" values when strings are very close but not identical. A good wildcard function would be helpful but don't know of any.

This is a mockup of my data (much simplified) that logs calls regarding patient complaints:

Pt_No  calldate  SnP  Complaint
1      12/07/04  TSR  SPRAINS-JOINT INJURY & BROKEN BONES
1      12/07/04  RN   SPRAINS/JOINT INJURY/BROKEN BONES
1      12/12/04  TSR  COUGH/COLD
2      3/05/05   TSR  HEAD INJURY
2      3/05/05   RN   HEAD INJURY/TRAUMA
2      3/12/05   TSR  DIAGNOSTIC TEST:RESULTS
2      3/12/05   RN   DIAGNOSTIC TEST: RESULTS
2      9/01/05   TSR  CAST PROBLEMS
3      7/28/04   TSR  DIZZY/VERTIGO/FAINTING
3      7/28/04   RN   DIZZINESS/VERTIGO/SYNCOPE

Sometimes a teleservice representative (TSR) handles the call on her own; often she transfers it to a registered nurse (RN). The patient may talk to the TSR and RN about the same problem. However, as you can see, the text strings the TSRs and RNs use to document the same patient complaint may differ - extra spaces, slightly different wording or punctuation.... So much for my plans to use the LAG function to flag TSR and RN handling of the same complaint on the same call!

I've looked at past postings on use of INDEX and SCAN but I'm not sure that's what I need here... my SAS-using colleagues tell me that SAS has a LIKE function that can identify similar words or phrases in different string values. That sounds like what I need. Is there anything like this in SPSS?

Oh yes, I'm using V 14. Only base system.

Thanks in advance.

Tanya Temkin
Research Associate
AACC Reporting
Northern California Regional Office
The Permanente Medical Group
(510) 625-6680
TIE 8-428-6680
|
SPSS does not have a built-in wildcard function, but using programmability (optional and free with SPSS Base), there is a powerful regular expression facility available that can be used for this purpose. You have to figure out what patterns to look for, but the re language is very expressive.
In SPSS 14, you have to create a new text file and merge it back to your data. With SPSS 15 you can do this directly.
-Jon Peck
SPSS
|
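As a rough illustration of the regular-expression route Jon describes, here is a minimal Python sketch. It uses only the standard `re` module and plain strings; how the values get in and out of SPSS is left aside. The function names `normalize` and `loosely_same` are made up for this example, and the prefix rule is an assumption about what should count as "the same complaint", not something from the thread.

```python
import re

def normalize(complaint: str) -> str:
    """Collapse punctuation and spacing so near-identical strings compare equal."""
    # Replace every run of non-alphanumeric characters with one space.
    cleaned = re.sub(r"[^A-Z0-9]+", " ", complaint.upper())
    return cleaned.strip()

def loosely_same(a: str, b: str) -> bool:
    """Assumed rule: equal after normalizing, or one string is a prefix of the other."""
    na, nb = normalize(a), normalize(b)
    return na.startswith(nb) or nb.startswith(na)

pairs = [
    ("SPRAINS-JOINT INJURY & BROKEN BONES", "SPRAINS/JOINT INJURY/BROKEN BONES"),
    ("DIAGNOSTIC TEST:RESULTS", "DIAGNOSTIC TEST: RESULTS"),
    ("HEAD INJURY", "HEAD INJURY/TRAUMA"),
    ("DIZZY/VERTIGO/FAINTING", "DIZZINESS/VERTIGO/SYNCOPE"),
]
for tsr, rn in pairs:
    print(loosely_same(tsr, rn), "|", tsr, "|", rn)
```

The first three pairs from the mockup match under these rules; the DIZZY/DIZZINESS pair does not, which shows why the patterns have to be worked out against the real data.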
In reply to this post by Tanya Temkin
Tanya, if both the RN and the TSR have a unique set of complaint descriptions, I would pick one and set up the other to match it.
Example of recoding the TSR wording to the RN wording:
if (Complaint = "SPRAINS-JOINT INJURY & BROKEN BONES") Complaint = "SPRAINS/JOINT INJURY/BROKEN BONES".
I would not usually do this by hand unless I had only a few descriptions. What I would do is put the matching columns in an Excel spreadsheet and write the syntax around them, then copy the result into the SPSS syntax. If you use Excel, you will have to play around with it a bit to get it right.
Good luck!
meljr
|
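The "write the syntax around the matching columns" step that meljr does in Excel could also be scripted. The sketch below is one possible way, assuming a hand-built lookup of TSR wording to RN wording; the `recodes` table and the helper names `spss_quote` and `build_syntax` are hypothetical. It only produces text to paste into a syntax window and does not talk to SPSS.

```python
# Hypothetical lookup table: TSR wording -> RN wording chosen as the standard.
recodes = {
    "SPRAINS-JOINT INJURY & BROKEN BONES": "SPRAINS/JOINT INJURY/BROKEN BONES",
    "DIAGNOSTIC TEST:RESULTS": "DIAGNOSTIC TEST: RESULTS",
}

def spss_quote(value: str) -> str:
    """Wrap a value in double quotes, doubling any embedded double quote."""
    return '"' + value.replace('"', '""') + '"'

def build_syntax(mapping: dict, variable: str = "Complaint") -> str:
    """Return one IF statement per mapping entry, followed by EXECUTE."""
    lines = [
        f"IF ({variable} = {spss_quote(old)}) {variable} = {spss_quote(new)}."
        for old, new in mapping.items()
    ]
    lines.append("EXECUTE.")
    return "\n".join(lines)

print(build_syntax(recodes))
```

The replacement strings still have to fit the declared width of the string variable, so that is worth checking before running the generated syntax.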
Hello,
There is also a LIKE function available in SQL. Do you have a database program (e.g., Access) available?
Best,
John
|
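To show what SQL's LIKE does with data shaped like the mockup, here is a small sketch using Python's built-in `sqlite3` module. SQLite stands in for Access purely as an assumption to keep the example self-contained; the table name `calls` and its column names are made up, and wildcard characters can differ between database products.

```python
import sqlite3

rows = [
    (1, "12/07/04", "TSR", "SPRAINS-JOINT INJURY & BROKEN BONES"),
    (1, "12/07/04", "RN", "SPRAINS/JOINT INJURY/BROKEN BONES"),
    (1, "12/12/04", "TSR", "COUGH/COLD"),
    (3, "7/28/04", "TSR", "DIZZY/VERTIGO/FAINTING"),
    (3, "7/28/04", "RN", "DIZZINESS/VERTIGO/SYNCOPE"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE calls (pt_no INTEGER, calldate TEXT, snp TEXT, complaint TEXT)"
)
conn.executemany("INSERT INTO calls VALUES (?, ?, ?, ?)", rows)

# In standard SQL, % matches any run of characters and _ matches one character.
matches = conn.execute(
    "SELECT pt_no, snp, complaint FROM calls "
    "WHERE complaint LIKE ? ORDER BY pt_no, snp",
    ("DIZZ%VERTIGO%",),
).fetchall()
conn.close()

for row in matches:
    print(row)
```

Note that LIKE matches a value against a pattern you supply; it does not by itself decide that two stored strings are "similar", so a pattern is still needed for each complaint family.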
In reply to this post by Tanya Temkin
Look into the INDEX function. You can select based on a substring of each value, e.g.
INDEX(Complaint,"SPRAIN") or INDEX(Complaint,"DIZZ") would get what you want.
|
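The INDEX idea can be extended to flag calls where both a TSR and an RN logged the same kind of complaint. The Python sketch below mimics INDEX, which returns the 1-based position of the substring or 0 when it is absent, so the test is for a value greater than 0. The `STEMS` list is hypothetical; a real one would come from reviewing the actual complaint values.

```python
from collections import defaultdict

calls = [
    (1, "12/07/04", "TSR", "SPRAINS-JOINT INJURY & BROKEN BONES"),
    (1, "12/07/04", "RN", "SPRAINS/JOINT INJURY/BROKEN BONES"),
    (1, "12/12/04", "TSR", "COUGH/COLD"),
    (2, "3/05/05", "TSR", "HEAD INJURY"),
    (2, "3/05/05", "RN", "HEAD INJURY/TRAUMA"),
    (2, "9/01/05", "TSR", "CAST PROBLEMS"),
    (3, "7/28/04", "TSR", "DIZZY/VERTIGO/FAINTING"),
    (3, "7/28/04", "RN", "DIZZINESS/VERTIGO/SYNCOPE"),
]

# Hypothetical stem list, one entry per complaint family.
STEMS = ["SPRAIN", "DIZZ", "HEAD INJ", "COUGH", "CAST"]

def index(haystack: str, needle: str) -> int:
    """Mimic SPSS INDEX: 1-based position of needle, or 0 if absent."""
    return haystack.find(needle) + 1

def stems_in(complaint: str) -> set:
    """Return every stem that occurs somewhere in the complaint text."""
    return {stem for stem in STEMS if index(complaint, stem) > 0}

# Collect the staff roles seen for each (patient, date, stem) combination.
seen = defaultdict(set)
for pt_no, calldate, snp, complaint in calls:
    for stem in stems_in(complaint):
        seen[(pt_no, calldate, stem)].add(snp)

# Keep only the combinations handled by both a TSR and an RN.
both = sorted(key for key, roles in seen.items() if {"TSR", "RN"} <= roles)
for key in both:
    print(key)
```

Grouping by patient, date and stem avoids relying on row order, which is what made the LAG approach fragile in the first place.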
In reply to this post by Peck, Jon
Is there a programmability module for addresses?
--jim
|
There is no module already set up for this, but we put an example in the 4th edition of the Data Management book (pdf downloadable from http://www.spss.com/spss/data_management_book.htm)
in which regular expressions are used to parse out parts of addresses that are not rigorously structured. It is in Chapter 21. Using these techniques, you can do a lot with a little bit of code. How much work (code) you would have to do depends on how robust a result you want and how well controlled the input is, so some experimentation would be a good idea.
HTH,
Jon Peck
|