Hiya,
I have 2 datasets that I am trying to match. One is a lookup table and one is a transaction table. In common between the two tables is a keyword, however I need to not only exact match on the keyword but also broadmatch to append a classification. For example in the Lookup table I have a keyword of "media planning group" this can match to any transaction as long as the 3 words ("media", "planning" and "group") appear within the transaction keyword in any order with any other characters eg mediaplanninggroup, media-planning group, media planning groups, communication media planninggroup etc I have split out the keyword on the lookup table on the lookup table but I am unsure of how to match via this methodology as I have 95k on the lookup table so I can not hardcode all the options and I need to repeat the work. If you could please point me in the right direction that would be great as I am new to SPSS but have being programming in SAS for many years. Cheers, Niki |
I'd be tempted to convert these keywords into a categorical variable
with values equivalent to the information you want. E.g. for media planning group, you could use the syntax: If (index(keywd,"media")>0 and index(keywd,"planning")>0 and index(keywd,"group")>0) key=1. Then for other groups you could have something like... if (index(keywd,"other")>0 and index(keywd,"planning")>0 and index(keywd,"group")>0) key=2. For overlapping words (other planning group and other) you could have something like.... if (index(keywd,"other")>0 and index(keywd,"planning")=0 and index(keywd,"group")=0) key=3. Then your lookup table would have the variable 'key' with the appropriate value labels and you'd match on that. Unless I'm missing something... Melissa -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Niki Thorne Sent: Thursday, September 28, 2006 11:00 AM To: [hidden email] Subject: [SPSSX-L] Broad matching to a look up table Hiya, I have 2 datasets that I am trying to match. One is a lookup table and one is a transaction table. In common between the two tables is a keyword, however I need to not only exact match on the keyword but also broadmatch to append a classification. For example in the Lookup table I have a keyword of "media planning group" this can match to any transaction as long as the 3 words ("media", "planning" and "group") appear within the transaction keyword in any order with any other characters eg mediaplanninggroup, media-planning group, media planning groups, communication media planninggroup etc I have split out the keyword on the lookup table on the lookup table but I am unsure of how to match via this methodology as I have 95k on the lookup table so I can not hardcode all the options and I need to repeat the work. If you could please point me in the right direction that would be great as I am new to SPSS but have being programming in SAS for many years. Cheers, Niki |
Hi Melissa,
Thanks for getting back to me. The issue I have is that I have 95,000 in the lookup table and I need to have a very broadmatch eg in Transaction table keyword = mediagroup planning bob in lookup table keyword = media planning group This is actually a match as the 3 words are within the transaction table keyword field so I can append the group from the lookup table. Does that make sense? I'm wondering if I can generate a join similiar to SQL where join is index(transaction.keyword,lookup.word1) > 0 and index(transaction.keyword,lookup.word2) > 0 etc As there can be up to 9 word's within the lookup keyword would this work where wordn is missing ? and how can I do this in spss ? Cheers Niki Melissa Ives <[hidden email]> Sent by: "SPSSX(r) Discussion" <[hidden email]> 28/09/2006 17:15 Please respond to Melissa Ives <[hidden email]> To [hidden email] cc Subject Re: Broad matching to a look up table I'd be tempted to convert these keywords into a categorical variable with values equivalent to the information you want. E.g. for media planning group, you could use the syntax: If (index(keywd,"media")>0 and index(keywd,"planning")>0 and index(keywd,"group")>0) key=1. Then for other groups you could have something like... if (index(keywd,"other")>0 and index(keywd,"planning")>0 and index(keywd,"group")>0) key=2. For overlapping words (other planning group and other) you could have something like.... if (index(keywd,"other")>0 and index(keywd,"planning")=0 and index(keywd,"group")=0) key=3. Then your lookup table would have the variable 'key' with the appropriate value labels and you'd match on that. Unless I'm missing something... Melissa -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Niki Thorne Sent: Thursday, September 28, 2006 11:00 AM To: [hidden email] Subject: [SPSSX-L] Broad matching to a look up table Hiya, I have 2 datasets that I am trying to match. One is a lookup table and one is a transaction table. In common between the two tables is a keyword, however I need to not only exact match on the keyword but also broadmatch to append a classification. For example in the Lookup table I have a keyword of "media planning group" this can match to any transaction as long as the 3 words ("media", "planning" and "group") appear within the transaction keyword in any order with any other characters eg mediaplanninggroup, media-planning group, media planning groups, communication media planninggroup etc I have split out the keyword on the lookup table on the lookup table but I am unsure of how to match via this methodology as I have 95k on the lookup table so I can not hardcode all the options and I need to repeat the work. If you could please point me in the right direction that would be great as I am new to SPSS but have being programming in SAS for many years. Cheers, Niki |
In reply to this post by Niki Thorne
Here is a way to get a unique match, but I don't know if it is practical
for 95,000 values (!??)in the lookup table: *** sample data-- lookup table. DATA LIST FREE (",") /keyword (a30) class1 class2 (2F8.0). BEGIN DATA media planning group, 1, 1 daily operation sched, 2, 5 END DATA. *** requires v14 or higher to use DATASET. DATASET NAME lookup. ** create a numeric value for each distinct keyword group. AUTORECODE keyword /into match_nbr. SORT CASES by match_nbr. ** sample transaction data. DATA LIST FREE (",") / transid (f8.0) keyword (a40). BEGIN DATA 1, mediaplanninggroup, 2, media-planning group, 3, media planning groups, 4, communication media planninggroup 5, ops schedule, 6, daily schedule, 7, daily ops schedule, 8, daily operation schedule END DATA. DATASET NAME trans. ** numeric match value (if all three keywords are present). COMPUTE match_nbr = 0. IF INDEX(keyword,'media') GT 0 & INDEX(keyword,'planning') GT 0 & INDEX(keyword,'group') GT 0 match_nbr = 2. IF INDEX(keyword,'daily') GT 0 & INDEX(keyword,'operation') GT 0 & INDEX(keyword,'sched') GT 0 match_nbr = 1. SORT CASES BY match_nbr. MATCH FILES file = * /table = lookup /RENAME (keyword = look) /BY match_nbr /DROP look. RECODE class1 class2 (SYSMIS = 0). EXECUTE. The hard part is creating 95,000 IF statements. You can export the list of keyword and match_nbr values from the lookup table to a text editor and create the IF statements via string manipulation, but that will still be ugly. Also, SPSS probably won't handle 95,000 lines of syntax. You will need to create separate syntax files and use INCLUDE to run them. Does this help with the process? --jim -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Niki Thorne-Dolan Sent: Thursday, September 28, 2006 10:41 AM To: [hidden email] Subject: Broad matching to a look up table Hiya, I have 2 datasets that I am trying to match. One is a lookup table and one is a transaction table. In common between the two tables is a keyword, however I need to not only exact match on the keyword but also broadmatch to append a classification. For example in the Lookup table I have a keyword of "media planning group" this can match to any transaction as long as the 3 words ("media", "planning" and "group") appear within the transaction keyword in any order with any other characters eg mediaplanninggroup, media-planning group, media planning groups, communication media planninggroup etc I have split out the keyword on the lookup table on the lookup table but I am unsure of how to match via this methodology as I have 95k on the lookup table so I can not hardcode all the options and I need to repeat the work. If you could please point me in the right direction that would be great as I am new to SPSS but have being programming in SAS for many years. Cheers, Niki --------------------------------- Yahoo! Photos - NEW, now offering a quality print service from just 8p a photo. |
In reply to this post by Niki Thorne
At 11:59 AM 9/28/2006, Niki Thorne wrote:
>I have 2 datasets that I am trying to match. One is a lookup table and >one is a transaction table. In common between the two tables is a >keyword, however I need to not only exact match on the keyword but >also broadmatch to append a classification. > >For example in the Lookup table I have a keyword of "media planning >group" this can match to any transaction as long as the 3 words >("media", "planning" and "group") appear within the transaction keyword OK. Your tables *don't* have a keyword in common. It looks like you have a text fields I'll call 'description' in the transaction table; a set of, I suppose, varying numbers of keywords, for each entry in the lookup table; and a lookup entry matches a transaction entry if *all* the keywords in the lookup entry occur somewhere in the description in the transaction entry. Do I have that right? That's related text, but hardly text 'in common'. >I have split out the keyword on the lookup table but I am unsure of >how to match via this methodology as I have 95k on the lookup table so >I can not hardcode all the options and I need to repeat the work. Do I really have this right? You have 95,000 entries in the lookup table, each with a different set of three or so keywords that must be found in the transaction description, for a match? That's a good many. How many transactions do you have? Can you give more examples? In your lookup table, "media planning group" is one 'keyword'. (I wouldn't call it a 'keyword', but a set of three keywords, which each individually must match). What are a few of the others? (Maybe a large 'few', like 100.) Can you list a few description ('keyword') fields from transactions, as well? And, again: how many transactions? >If you could please point me in the right direction that would be >great as I am new to SPSS but have being programming in SAS for many >years. This looks to me like a many-to-many problem. As you describe it, the natural logic is to check every transaction entry against all the sets of keywords from every lookup entry, and see which match. Comparing 95,000 lookup table entries with every transaction record sounds pretty slow, though. How would you do it in SAS? Many-to-many merges in SAS are best done with PROC SQL. In SPSS, they're best done with "long to wide to long" logic, which I don't expect to be self-explanatory but which we can show you, with a little more data to work with. But I think there has to be some optimization, so you don't have to check 95,000 each time. So: tell us a little more. Data, data, always says more than any description. -Good luck, Richard |
Free forum by Nabble | Edit this page |