SPSSX Discussion

Identifying

helefun

Identifying

Hello,
I have 5 variables that identify channel usage by percentage. Each variable
identifies a different channel, and the 5 variables add up to 100%. How can
I combine these into 1 variable (to provide a score for each row of data)
that I can then analyze (e.g. compare, correlate) against other variables?
I have some ideas, such as identifying the variable with the highest
percentage, but I am not sure that this is the best way.
Thanks,

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Rich Ulrich

Re: Identifying

Do you have any hypotheses? (What is a "channel"?)

Why do you care about the data?

Items that add up to 100% are sometimes called "compositional
data," and there are books on the subject. When some proportions
are small (less than 10%, say), your correlations, etc., might be
more robust if you convert the proportions to logits, where a
logit equals log(P/(1-P)).
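
The logit conversion can be sketched in a few lines of Python. This is only an illustration, not SPSS syntax: the function name, the eps clipping value, and the example row are assumptions. A proportion of exactly 0 or 1 has no finite logit, so the sketch clips such values before taking the log.

```python
import math

def logit(p, eps=1e-6):
    """Return log(p / (1 - p)) for a proportion p between 0 and 1.

    eps is an assumption of this sketch: proportions of exactly 0 or 1
    have no finite logit, so p is clipped to [eps, 1 - eps] first.
    """
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

# Hypothetical row: five channel percentages that add up to 100.
row_percent = [55.0, 25.0, 12.0, 5.0, 3.0]

# Convert percentages to proportions, then to logits.
row_logits = [logit(pct / 100.0) for pct in row_percent]
```

Small proportions such as 0.03 and 0.05 end up further apart on the logit scale than on the raw scale, which is the reason for considering the conversion when some channels are rarely used.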

Given a related set of scores, I start by looking at frequencies
and looking at correlations. Are some of them "irrelevant"
because the numbers are small? - or does that make them
of special interest?

If there are correlations that are very high between channels,
that suggests that you /might/ want to reduce the count
of variables from 5 to 4 (or fewer).
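
As a rough sketch of that first look at correlations, the Python below computes Pearson correlations for every pair of channels. The function names and the data are hypothetical, and the sketch assumes that every channel varies across cases (a constant column would cause a division by zero).

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def channel_correlations(rows):
    """Map each pair of 1-based channel numbers to its correlation."""
    columns = list(zip(*rows))  # one tuple per channel
    return {
        (i + 1, j + 1): pearson(columns[i], columns[j])
        for i, j in combinations(range(len(columns)), 2)
    }

# Hypothetical data: one row per person, five percentages summing to 100.
rows = [
    [60, 20, 10, 5, 5],
    [20, 50, 10, 10, 10],
    [30, 30, 20, 10, 10],
    [25, 25, 25, 15, 5],
]
correlations = channel_correlations(rows)
```

Because the five values in each row must sum to 100, some negative correlation between channels is built in, so a strong negative value is not by itself evidence of a substantive relationship.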

As you suggest, you could easily look at max(p1 to p5) to find
the channel that is highest for each person. Or lowest. Then
create a single summary score, which is the category (1-5)
that is highest for each person. Or it might be more interesting
to consider a variation like this: assign the category where the
person has the highest percentile score. (Or, you could also use
a cutoff, and label "6" for the person who has no score that is
especially high, whatever you mean by "especially" - like, in the
top 10 or 15 percent.)
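
Both summary scores can be sketched in Python as follows. Everything named here is an assumption made for illustration: the function names, the reading of "percentile score" as a person's rank among all cases within one channel, the 0.85 cutoff (standing in for "top 15 percent"), and the rule that ties go to the lower-numbered channel.

```python
def dominant_channel(percents):
    """Return the 1-based number of the channel with the largest share.

    Ties go to the first channel with the maximum (an assumption).
    """
    return max(range(len(percents)), key=lambda i: percents[i]) + 1

def percentile_ranks(values):
    """For each case, the fraction of cases with a value <= its own."""
    n = len(values)
    return [sum(1 for other in values if other <= v) / n for v in values]

def percentile_category(rows, cutoff=0.85, none_label=6):
    """Assign each row the channel where it ranks highest among all rows.

    A row whose best percentile rank is below cutoff gets none_label.
    """
    n_channels = len(rows[0])
    # ranks[j][i] is the percentile rank of row i within channel j.
    ranks = [percentile_ranks([row[j] for row in rows])
             for j in range(n_channels)]
    labels = []
    for i in range(len(rows)):
        row_ranks = [ranks[j][i] for j in range(n_channels)]
        best = max(range(n_channels), key=lambda j: row_ranks[j])
        labels.append(best + 1 if row_ranks[best] >= cutoff else none_label)
    return labels

# Hypothetical data: one row per person, five percentages summing to 100.
rows = [
    [60, 20, 10, 5, 5],
    [20, 50, 10, 10, 10],
    [30, 30, 20, 10, 10],
    [25, 25, 25, 15, 10],
]
raw_labels = [dominant_channel(row) for row in rows]
percentile_labels = percentile_category(rows)
```

The two scores can disagree. A person whose largest share is in channel 1 may still be labelled with a rarely used channel if their share there is unusually high compared with everyone else.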

Hope this helps.

Rich Ulrich

From: SPSSX(r) Discussion <[hidden email]> on behalf of helefun <[hidden email]>
Sent: Wednesday, July 24, 2019 5:38 AM
To: [hidden email] <[hidden email]>
Subject: Identifying


spss.giesel@yahoo.de

Re: Identifying

You could try to detect channel profiles by first clustering the channel percentages.
You could then check how cluster membership relates to your other variables.
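
The message does not name a clustering method. As one possible illustration, here is a plain k-means sketch in Python. The choice of k-means, the number of clusters, the use of the first k rows as starting centers, and the data are all assumptions made for this sketch.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, iterations=20):
    """Plain k-means on rows of channel proportions.

    The first k rows serve as initial centers, an assumption made only
    to keep this sketch deterministic.
    """
    centers = [list(row) for row in rows[:k]]
    assignments = [0] * len(rows)
    for _ in range(iterations):
        # Assign each row to its nearest center.
        assignments = [
            min(range(k), key=lambda c: squared_distance(row, centers[c]))
            for row in rows
        ]
        # Move each center to the mean profile of its members.
        for c in range(k):
            members = [row for row, a in zip(rows, assignments) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centers

# Hypothetical data: one row per person, five proportions summing to 1.
profiles = [
    [0.70, 0.10, 0.10, 0.05, 0.05],
    [0.10, 0.70, 0.10, 0.05, 0.05],
    [0.65, 0.15, 0.10, 0.05, 0.05],
    [0.15, 0.65, 0.10, 0.05, 0.05],
]
assignments, centers = kmeans(profiles, k=2)
```

Each center is the average channel profile of its cluster, and the assignments give a single categorical variable per row that can be cross-tabulated or compared against other variables.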

Mario Giesel
Munich, Germany

On Thursday, July 25, 2019, 04:21:29 CEST, Rich Ulrich <[hidden email]> wrote:
