Matching within a sentence

Matching within a sentence

Lars J
I have two files I'd like to merge by adding variables. (The working
file contains 65,000 cases and the other 7,000 cases.)

I want a match whenever the content of "variable a" in the file to be
added is found somewhere within "variable a" in the working data file.
In other words, I do not want a 100% exact match. I'm doing this to
identify certain word(s) in the sentence stored in "variable a" in the
working file.

This is, in other words, one way of finding words in sentences.

The Merge Files function seems to support only exact matches. Is there a
way to do the type of matching described above using syntax?

Thanks!

Lars J

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Matching within a sentence

Marks, Jim
Lars:

In each file, you could create variables to flag cases with each
specific string in variable a:

COMPUTE string1 = INDEX(var_a,'findit') GT 0.

Sort each file on the flag variables:
SORT CASES BY string1 string2 string3.


Then you can match the two files using something like:

MATCH FILES /FILE = * /TABLE = shortfile /BY string1 string2 string3.

I tested with two matching strings and got matches.

Hope this helps you go forward with your issue.

--jim


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Lars J
Sent: Monday, January 12, 2009 1:03 PM
To: [hidden email]
Subject: Matching within a sentence


Re: Matching within a sentence

Lars J
Thanks, Jim!

But if I'm looking for 7,000 combinations, I'd have to enter each of
them in place of 'findit', right?

That would be quite a chore.

Or am I missing something?

Thanks,
Lars



Re: Matching within a sentence

Peck, Jon
In reply to this post by Lars J
You need to match based on a regular expression, which can't be done with standard syntax but isn't hard if you can use programmability and have at least version 16.

- Is it possible for more than one external-file case to match a single working-file case?  E.g., the working file has a "variable a" value of river brook stream and the external file has cases with the values river, brook, and stream.

When you match, should "found somewhere" be restricted to complete words, or can it be any string of characters?
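
To make the two questions above concrete, here is a minimal sketch in plain Python. It does not use the SPSS programmability API; the function name find_matches and the sample values are hypothetical and only illustrate how the answers change the result. It returns every external-file value found in a working-file sentence, so a single case can have several hits, and a whole_word switch chooses between complete-word and any-substring matching.

```python
import re

def find_matches(sentence, terms, whole_word=True):
    """Return every term found in sentence, ignoring case."""
    hits = []
    for term in terms:
        # Escape the term so characters like "." or "(" are matched literally.
        pattern = re.escape(term)
        if whole_word:
            # \b requires a word boundary on both sides of the term.
            pattern = r"\b" + pattern + r"\b"
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            hits.append(term)
    return hits

# Hypothetical values of "variable a" in the external file.
terms = ["river", "brook", "stream"]

# One working-file case can match several external-file cases.
print(find_matches("river brook stream", terms))

# Whole-word matching rejects "Brooklyn"; substring matching accepts it.
print(find_matches("Brooklyn bridge", terms, whole_word=True))
print(find_matches("Brooklyn bridge", terms, whole_word=False))
```

If several hits per case are possible, the merge has to decide whether to keep the first hit, keep all of them in separate variables, or duplicate the working-file case once per hit.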

If, OTOH, you just want to flag cases where any of a list of words occurs, this would be an easier programmability exercise: load a dictionary of the words to check for and then check each case for one or more occurrences.
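
The flagging idea can be sketched as follows, again in plain Python rather than the SPSS programmability API. The name flag_cases and the sample sentences are assumptions made for illustration. The word list is loaded into a set, each sentence is split into words, and a case is flagged 1 if any of its words is in the set. This assumes every entry in the list is a single word; multi-word entries would need a different approach.

```python
import re

def flag_cases(sentences, words):
    """Return 1 for each sentence containing any listed word, else 0."""
    # A set gives fast membership tests even with thousands of words.
    lookup = {w.lower() for w in words}
    flags = []
    for sentence in sentences:
        # Split the sentence into lowercase word tokens.
        tokens = re.findall(r"\w+", sentence.lower())
        flags.append(1 if any(t in lookup for t in tokens) else 0)
    return flags

# Hypothetical working-file sentences and word list.
working = ["The river was high", "No water here", "A small Brook ran by"]
words = ["river", "brook", "stream"]

print(flag_cases(working, words))
```

Because the lookup is a set, the cost per case depends on the length of the sentence rather than on the 7,000 entries in the word list.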

-Jon Peck
