Broad matching to a look up table

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Broad matching to a look up table

Niki Thorne
 Hiya,

I have 2 datasets that I am trying to match. One is a lookup table and one
is a transaction table.

In common between the two tables is a keyword, however I need to not only
exact match on the keyword but also broadmatch to append a classification.


For example in the Lookup table I have a keyword of "media planning group"
this can match to any transaction as long as the 3 words ("media",
"planning" and "group") appear within the transaction keyword in any order
with any other characters eg mediaplanninggroup, media-planning group,
media planning groups, communication media planninggroup etc

I have split out the keyword on the lookup table on the lookup table but I
am unsure of how to match via this methodology as I have 95k on the lookup
table so I can not hardcode all the options and I need to repeat the work.


If you could please point me in the right direction that would be great as
I am new to SPSS but have being programming in SAS for many years.

Cheers,
Niki
Reply | Threaded
Open this post in threaded view
|

Re: Broad matching to a look up table

Melissa Ives
I'd be tempted to convert these keywords into a categorical variable
with values equivalent to the information you want.

E.g. for media planning group, you could use the syntax:

If (index(keywd,"media")>0 and index(keywd,"planning")>0 and
index(keywd,"group")>0) key=1.
Then for other groups you could have something like...
if (index(keywd,"other")>0 and index(keywd,"planning")>0 and
index(keywd,"group")>0) key=2.
For overlapping words (other planning group and other) you could have
something like....
if (index(keywd,"other")>0 and index(keywd,"planning")=0 and
index(keywd,"group")=0) key=3.

Then your lookup table would have the variable 'key' with the
appropriate value labels and you'd match on that.
Unless I'm missing something...

Melissa

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Niki Thorne
Sent: Thursday, September 28, 2006 11:00 AM
To: [hidden email]
Subject: [SPSSX-L] Broad matching to a look up table

 Hiya,

I have 2 datasets that I am trying to match. One is a lookup table and
one is a transaction table.

In common between the two tables is a keyword, however I need to not
only exact match on the keyword but also broadmatch to append a
classification.


For example in the Lookup table I have a keyword of "media planning
group"
this can match to any transaction as long as the 3 words ("media",
"planning" and "group") appear within the transaction keyword in any
order with any other characters eg mediaplanninggroup, media-planning
group, media planning groups, communication media planninggroup etc

I have split out the keyword on the lookup table on the lookup table but
I am unsure of how to match via this methodology as I have 95k on the
lookup table so I can not hardcode all the options and I need to repeat
the work.


If you could please point me in the right direction that would be great
as I am new to SPSS but have being programming in SAS for many years.

Cheers,
Niki
Reply | Threaded
Open this post in threaded view
|

Re: Broad matching to a look up table

Niki Thorne
Hi Melissa,

Thanks for getting back to me.

The issue I have is that I have 95,000 in the lookup table and I need to
have a very broadmatch eg

in Transaction table keyword = mediagroup planning bob

in lookup table keyword = media planning group

This is actually a match as the 3 words are within the transaction table
keyword field so I can append the group from the lookup table.

Does that make sense?

I'm wondering if I can generate a join similiar to SQL where join is

index(transaction.keyword,lookup.word1) > 0 and
index(transaction.keyword,lookup.word2) > 0 etc

As there can be  up to 9 word's within the lookup keyword would this work
where wordn is missing ? and how can I do this in spss ?

Cheers

Niki



Melissa Ives <[hidden email]>
Sent by: "SPSSX(r) Discussion" <[hidden email]>
28/09/2006 17:15
Please respond to
Melissa Ives <[hidden email]>


To
[hidden email]
cc

Subject
Re: Broad matching to a look up table






I'd be tempted to convert these keywords into a categorical variable
with values equivalent to the information you want.

E.g. for media planning group, you could use the syntax:

If (index(keywd,"media")>0 and index(keywd,"planning")>0 and
index(keywd,"group")>0) key=1.
Then for other groups you could have something like...
if (index(keywd,"other")>0 and index(keywd,"planning")>0 and
index(keywd,"group")>0) key=2.
For overlapping words (other planning group and other) you could have
something like....
if (index(keywd,"other")>0 and index(keywd,"planning")=0 and
index(keywd,"group")=0) key=3.

Then your lookup table would have the variable 'key' with the
appropriate value labels and you'd match on that.
Unless I'm missing something...

Melissa

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Niki Thorne
Sent: Thursday, September 28, 2006 11:00 AM
To: [hidden email]
Subject: [SPSSX-L] Broad matching to a look up table

 Hiya,

I have 2 datasets that I am trying to match. One is a lookup table and
one is a transaction table.

In common between the two tables is a keyword, however I need to not
only exact match on the keyword but also broadmatch to append a
classification.


For example in the Lookup table I have a keyword of "media planning
group"
this can match to any transaction as long as the 3 words ("media",
"planning" and "group") appear within the transaction keyword in any
order with any other characters eg mediaplanninggroup, media-planning
group, media planning groups, communication media planninggroup etc

I have split out the keyword on the lookup table on the lookup table but
I am unsure of how to match via this methodology as I have 95k on the
lookup table so I can not hardcode all the options and I need to repeat
the work.


If you could please point me in the right direction that would be great
as I am new to SPSS but have being programming in SAS for many years.

Cheers,
Niki
Reply | Threaded
Open this post in threaded view
|

Re: Broad matching to a look up table

Marks, Jim
In reply to this post by Niki Thorne
Here is a way to get a unique match, but I don't know if it is practical
for 95,000 values (!??)in the lookup table:

*** sample data-- lookup table.
DATA LIST FREE (",")  /keyword (a30)  class1 class2 (2F8.0).
BEGIN DATA
media planning group, 1, 1
daily operation sched, 2, 5
END DATA.

*** requires v14 or higher to use DATASET.
DATASET NAME lookup.

** create a numeric value for each distinct keyword group.
AUTORECODE keyword /into match_nbr.
SORT CASES by match_nbr.

** sample transaction data.
DATA LIST FREE (",") / transid (f8.0)  keyword (a40).
BEGIN DATA
1, mediaplanninggroup, 2, media-planning group, 3, media planning
groups, 4, communication media planninggroup
5, ops schedule, 6, daily schedule, 7, daily ops schedule, 8, daily
operation schedule
END DATA.
DATASET NAME trans.

** numeric match value (if all three keywords are present).
COMPUTE match_nbr = 0.
IF INDEX(keyword,'media') GT 0 & INDEX(keyword,'planning') GT 0 &
INDEX(keyword,'group') GT 0 match_nbr = 2.
IF INDEX(keyword,'daily') GT 0 & INDEX(keyword,'operation') GT 0 &
INDEX(keyword,'sched') GT 0 match_nbr = 1.

SORT CASES BY match_nbr.

MATCH FILES file = *
     /table = lookup   /RENAME (keyword = look)
/BY match_nbr
   /DROP look.
RECODE class1 class2 (SYSMIS = 0).
EXECUTE.

The hard part is creating 95,000 IF statements.
You can export the list of keyword and match_nbr values from the lookup
table to a text editor and create the IF statements via string
manipulation, but that will still be ugly.

Also, SPSS probably won't handle 95,000 lines of syntax. You will need
to create separate syntax files and use INCLUDE to run them.

Does this help with the process?

--jim


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Niki Thorne-Dolan
Sent: Thursday, September 28, 2006 10:41 AM
To: [hidden email]
Subject: Broad matching to a look up table

Hiya,

  I have 2 datasets that I am trying to match. One is a lookup table and
one is a transaction table.

  In common between the two tables is a keyword, however I need to not
only exact match on the keyword but also broadmatch to append a
classification.

  For example in the Lookup table I have a keyword of "media planning
group" this can match to any transaction as long as the 3 words
("media", "planning" and "group") appear within the transaction keyword
in any order with any other characters eg mediaplanninggroup,
media-planning group, media planning groups, communication media
planninggroup etc

  I have split out the keyword on the lookup table on the lookup table
but I am unsure of how to match via this methodology as I have 95k on
the lookup table so I can not hardcode all the options and I need to
repeat the work.

  If you could please point me in the right direction that would be
great as I am new to SPSS but have being programming in SAS for many
years.

  Cheers,
  Niki


---------------------------------
Yahoo! Photos - NEW, now offering a quality print service from just 8p a
photo.
Reply | Threaded
Open this post in threaded view
|

Re: Broad matching to a look up table

Richard Ristow
In reply to this post by Niki Thorne
At 11:59 AM 9/28/2006, Niki Thorne wrote:

>I have 2 datasets that I am trying to match. One is a lookup table and
>one is a transaction table. In common between the two tables is a
>keyword, however I need to not only exact match on the keyword but
>also broadmatch to append a classification.
>
>For example in the Lookup table I have a keyword of "media planning
>group" this can match to any transaction as long as the 3 words
>("media", "planning" and "group") appear within the transaction keyword

OK. Your tables *don't* have a keyword in common. It looks like you
have a text fields I'll call 'description' in the transaction table; a
set of, I suppose, varying numbers of keywords, for each entry in the
lookup table; and a lookup entry matches a transaction entry if *all*
the keywords in the lookup entry occur somewhere in the description in
the transaction entry.

Do I have that right? That's related text, but hardly text 'in common'.

>I have split out the keyword on the lookup table but I am unsure of
>how to match via this methodology as I have 95k on the lookup table so
>I can not hardcode all the options and I need to repeat the work.

Do I really have this right? You have 95,000 entries in the lookup
table, each with a different set of three or so keywords that must be
found in the transaction description, for a match? That's a good many.
How many transactions do you have?

Can you give more examples? In your lookup table, "media planning
group" is one 'keyword'. (I wouldn't call it a 'keyword', but a set of
three keywords, which each individually must match). What are a few of
the others? (Maybe a large 'few', like 100.)

Can you list a few description ('keyword') fields from transactions, as
well? And, again: how many transactions?

>If you could please point me in the right direction that would be
>great as I am new to SPSS but have being programming in SAS for many
>years.

This looks to me like a many-to-many problem. As you describe it, the
natural logic is to check every transaction entry against all the sets
of keywords from every lookup entry, and see which match. Comparing
95,000 lookup table entries with every transaction record sounds pretty
slow, though.

How would you do it in SAS?

Many-to-many merges in SAS are best done with PROC SQL. In SPSS,
they're best done with "long to wide to long" logic, which I don't
expect to be self-explanatory but which we can show you, with a little
more data to work with.

But I think there has to be some optimization, so you don't have to
check 95,000 each time.

So: tell us a little more. Data, data, always says more than any
description.

-Good luck,
  Richard