Single and joint frequencies of words

Empi
Dear all,

my aim is to calculate the single and joint frequencies of words from texts
saved as a string-variable (“text_var”); each cell of the string-variable
contains multiple sentences (ultimately, I’d like to use these frequencies
to calculate a Jaccard-index to assess the strength of the co-occurrence of
words).

Ideally, the results would indicate per cell (1) how often word “x” occurs
(2) how often word “y” occurs and (3) how often words “x” and “y” occur
together as “xy” in a text.

I assume that the single frequencies of “x” and “y” and the joint frequency
of “xy” could be stored in three new variables - but it is not really clear
to me how to request the quantities.

I think that this syntax
compute var_x =char.index(lower(text), "cats") > 0.
compute var_y =char.index(lower(text), "dogs") > 0.

gives the single frequencies of the words “cats” and “dogs” per text. But I
failed to adjust this syntax (or any other syntax) in order to obtain the
joint frequencies of “cats” and “dogs” – can anybody help me out here???

Thank you very much & regards,
Empi




--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Single and joint frequencies of words

Empi
...and of course, the problem is to identify the joint occurrence per
sentence, not just per text - so how can one identify sentences per text
where two (or more) words of interest occur together?

Best,
Empi



Re: Single and joint frequencies of words

spss.giesel@yahoo.de
In reply to this post by Empi
Dear Empi,

This feels like a Sisyphean task without a programming language like R or Python and an NLP package.
But maybe you can send some text examples and the result you want to achieve.

Regards,
Mario


Re: Single and joint frequencies of words

Kirill Orlov
Just break the text up: make one cell one word. There will be a single, very long column in your dataset, and the cases are the words in their original sequence. Sentences can be separated by a blank cell or indicated by a separate categorical variable. Then clean up the words if needed (remove unwanted words, stemming/lemmatization). Then AUTORECODE the words into numeric codes. Then you can do everything you want.
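
The one-word-per-case layout described here can be sketched outside SPSS, for example in Python. This is only an illustration under assumptions: the function name `to_long_format`, the sample texts, and the rule that a sentence ends at ".", "?" or "!" are not from the thread.

```python
import re

def to_long_format(texts):
    """Return one (case_id, sentence_id, word) row per word."""
    rows = []
    for case_id, text in enumerate(texts, start=1):
        # Assumption: a sentence ends at '.', '?' or '!'.
        sentences = [s for s in re.split(r"[.?!]", text) if s.strip()]
        for sentence_id, sentence in enumerate(sentences, start=1):
            # Lower-case the sentence; runs of word characters and apostrophes are words.
            for word in re.findall(r"[\w']+", sentence.lower()):
                rows.append((case_id, sentence_id, word))
    return rows

# Hypothetical sample texts, one per case.
rows = to_long_format(["Dogs chase cats. Cats hide.", "No dogs here!"])
for row in rows:
    print(row)
```

The sentence number plays the role of the "separate categorical variable" mentioned above; stemming and the recoding into numeric codes are left out of this sketch.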



Re: Single and joint frequencies of words

Empi
In reply to this post by spss.giesel@yahoo.de
Hi Mario,

many thanks for your reply - let me try to offer a (hopefully) somewhat more
precise description of my aim.
My question:

Using SPSS, how could one examine whether two words occur together within one
sentence?

Let's imagine a researcher is interested in counting how often the
words "illegal" and "immigrant*" occur together in sentences in tweets from a
politician.

In order to count the single occurrences of "illegal" and "immigrant" per
tweet my earlier example should suffice:

compute var_immigrants =char.index(lower(text), "immigrants") > 0.
compute var_illegal =char.index(lower(text), "illegal") > 0.

But how can the char.index function - or any other function - be used to
(a) restrict the search to single sentences (as indicated by a dot "." or
maybe a question mark "?") and

(b) indicate the joint occurrence of the words, such as the phrase
"illegal immigrants"?

We could then calculate Jaccard's index as
f_illegal&immigrant / (f_illegal + f_immigrant - f_illegal&immigrant)

PS: Just let me know if I should provide some real tweets :)
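
As a worked illustration of the Jaccard formula in this post, here is a minimal Python sketch. The function name `jaccard` and the counts are hypothetical. It assumes each frequency is the number of sentences containing the word (or both words), so that the denominator counts the sentences containing at least one of them.

```python
def jaccard(f_x, f_y, f_xy):
    """Jaccard index from single frequencies f_x, f_y and joint frequency f_xy."""
    union = f_x + f_y - f_xy
    # If neither word ever occurs the index is undefined; 0.0 is returned here by convention.
    return float(f_xy) / union if union else 0.0

# Hypothetical counts: "illegal" in 8 sentences, "immigrant" in 10, both in 6.
print(jaccard(8, 10, 6))  # 6 / (8 + 10 - 6) = 0.5
```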



Re: Single and joint frequencies of words

Maguin, Eugene
Start where Kirill left off, but with a modification: number each message and each sentence within the message. You have a dictionary of words and their numeric codes. In that long, single-variable file, use AGGREGATE to get the first occurrence of word x and of word y within each sentence. After filling in the system-missing values where one or both words were not in a sentence, you have the data for a crosstab.
Gene Maguin
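
The crosstab described above might look as follows if built in Python instead of with AGGREGATE. This is a rough sketch, not the SPSS procedure itself: the function name `sentence_crosstab` is made up, the sample texts are borrowed from the example data posted in this thread, and sentences are assumed to end at ".", "?" or "!".

```python
import re
from collections import Counter

def sentence_crosstab(texts, word_x, word_y):
    """Count sentences by (contains word_x, contains word_y)."""
    table = Counter()
    for text in texts:
        for sentence in re.split(r"[.?!]", text):
            words = set(re.findall(r"[\w']+", sentence.lower()))
            if not words:
                continue  # skip empty pieces left over from splitting
            table[(word_x in words, word_y in words)] += 1
    return table

texts = [
    "dogs and cats are enemies.  but dogs sometimes like cats.",
    "there are no dogs here.",
    "are there cats or dogs here?  Maybe just cats.",
    "there are elephants.",
]
table = sentence_crosstab(texts, "dogs", "cats")
# Cells in the order: both words, only dogs, only cats, neither.
print(table[(True, True)], table[(True, False)], table[(False, True)], table[(False, False)])
```

The (False, False) cell corresponds to the sentences that would otherwise be system-missing on both words.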


Re: Single and joint frequencies of words

spss.giesel@yahoo.de
In reply to this post by Kirill Orlov
Hi, Empi,

To answer your question directly: it’s too complicated, too time-consuming and too error-prone to do this with SPSS syntax alone. If sentences are your basic analytical units, I would start by splitting your tweets with a regular expression, e.g. in Python
---
import re
tweet = "Are there cats or dogs here? Maybe just cats."  # example text
separators = r"[.?!]"  # a sentence ends at '.', '?' or '!'
# split at the separators and drop empty pieces
sentences = [s.strip() for s in re.split(separators, tweet) if s.strip()]
---

You’ll then get separate sentences, and several rows per person.

If you don’t want to bother with programming, you can take a more hands-on approach:
Copy your variable content into Notepad++
Select all
Go to Search -> Replace
Find: the sentence separator (repeat these steps for each of . ? ! etc.)
Search mode: Extended
Replace with: the same sentence separator followed by “\n”
This will insert a new line after each sentence.

You can copy the result into a new SPSS data file. Then you can use your cats & dogs syntax. Of course, you will lose the relations to the other variables in the dataset. But, sorry, there’s no easy way I'm aware of to do it otherwise.
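
One possible way to keep the link to the other variables is to carry a case ID along while splitting. The sketch below is only an assumption-laden illustration: the function name `sentences_to_csv`, the column names, and the case IDs 101 and 102 are hypothetical, and ".", "?" and "!" are taken as the only sentence separators.

```python
import csv
import io
import re

def sentences_to_csv(cases):
    """Write one row per sentence, keeping the case ID, and return the CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["case_id", "sentence_no", "sentence"])
    for case_id, text in cases.items():
        # Split at the separators and drop empty pieces.
        pieces = [p.strip() for p in re.split(r"[.?!]", text)]
        for sentence_no, sentence in enumerate((p for p in pieces if p), start=1):
            writer.writerow([case_id, sentence_no, sentence])
    return out.getvalue()

# Hypothetical data: the keys stand in for an ID variable in the original dataset.
csv_text = sentences_to_csv({
    101: "Are there cats or dogs here? Maybe just cats.",
    102: "No dogs!",
})
print(csv_text)
```

A file written this way could be read into a new SPSS dataset and matched back to the original cases on the ID column.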

Mario Giesel
Munich, Germany


Re: Single and joint frequencies of words

Jon Peck
I posted a solution for the two-word problem on the IBM Predictive Analytics site, but I am copying it here.  It uses the SPSSINC TRANS extension command with a small Python function to find counts of joint occurrences per sentence of two specified words.  It could be generalized in a number of ways.

* Encoding: UTF-8.
data list list/text(a60).
begin data
"dogs and cats are enemies.  but dogs sometimes like cats."
"there are no dogs here."
"are there cats or dogs here?  Maybe just cats."
"there are elephants."
end data.
dataset name text.

begin program.
import re
def counter(text, word1, word2):
    """Count the sentences in text that contain both word1 and word2."""
    # a sentence is any run of text that ends in "." or "?"
    sentences = re.findall(r"(.*?)(?:\.|\?)", text)
    paircount = 0
    for s in sentences:
        # whole-word, case-insensitive search for each word
        has1 = re.search(r"\b%s\b" % word1.strip(), s, flags=re.I) is not None
        has2 = re.search(r"\b%s\b" % word2.strip(), s, flags=re.I) is not None
        if has1 and has2:
            paircount = paircount + 1
    return paircount
end program.


spssinc trans result=counts
/formula 'counter(text, word1="dogs", word2="cats")'.
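
One way the function might be generalized, sketched here as standalone Python rather than as part of the SPSSINC TRANS call: accept any number of terms and allow a trailing "*" wildcard such as "immigrant*". The names `to_pattern` and `count_joint`, the wildcard convention, and the sample sentence are assumptions for illustration. Unlike the function above, this sketch also treats "!" as a separator and counts text after the last separator as a sentence.

```python
import re

def to_pattern(term):
    """Compile a whole-word, case-insensitive pattern; a trailing '*' matches any word ending."""
    stem = term.strip()
    if stem.endswith("*"):
        body = re.escape(stem[:-1]) + r"\w*"
    else:
        body = re.escape(stem)
    return re.compile(r"\b" + body + r"\b", flags=re.I)

def count_joint(text, terms):
    """Count the sentences of text in which every term occurs at least once."""
    patterns = [to_pattern(t) for t in terms]
    # Assumption: '.', '?' and '!' end a sentence.
    sentences = [s for s in re.split(r"[.?!]", text) if s.strip()]
    return sum(all(p.search(s) for p in patterns) for s in sentences)

# Hypothetical text: only the first sentence contains both terms.
print(count_joint("Illegal immigrants were discussed. Immigration is legal",
                  ["illegal", "immigrant*"]))
```

Passing a single term gives the number of sentences containing that word, so the same function can supply all three quantities needed for a Jaccard index.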



--
Jon K Peck
[hidden email]

Re: Single and joint frequencies of words

David Marso-2
In reply to this post by Empi
Parse the text into one new record per word, retaining the caseid; do a Cartesian match of the records within each caseid; then aggregate. Done. All of these steps can be found in this group's archives.
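
The parse, match and aggregate steps can be sketched in Python as below. This is an illustration under assumptions, not the SPSS syntax from the archives: the function name `pair_counts` and the sample texts are made up, and the matching is done within each sentence rather than within each caseid, since the question asked for co-occurrence per sentence.

```python
import re
from collections import Counter
from itertools import combinations

def pair_counts(texts):
    """Count, over all sentences, how often each word and each pair of distinct words occurs."""
    single = Counter()  # number of sentences containing each word
    joint = Counter()   # number of sentences containing each pair of words
    for text in texts:
        # Assumption: '.', '?' and '!' end a sentence.
        for sentence in re.split(r"[.?!]", text):
            words = sorted(set(re.findall(r"[\w']+", sentence.lower())))
            single.update(words)
            # All unordered pairs within the sentence: the Cartesian match without self-pairs.
            joint.update(combinations(words, 2))
    return single, joint

# Hypothetical sample texts.
single, joint = pair_counts(["dogs like cats. cats fear dogs.", "no dogs here."])
pair = ("cats", "dogs")  # pairs are stored in alphabetical order
print(single["cats"], single["dogs"], joint[pair])
```

With these counts the Jaccard index for the pair is joint / (single_x + single_y - joint), here 2 / (2 + 3 - 2).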
