Hello,
I have 5 variables that identify channel usage by percentage. Each variable identifies a different channel, and the 5 variables add up to 100%. I am wondering how I can combine these into 1 variable (to provide a score for each row of data) so that I can compare or correlate it with other variables. I have some ideas, such as identifying the variable with the highest percentage, but I am not sure that this is the best way.
Thanks,
Do you have any hypotheses? (What is a "channel"?)
Why do you care about the data?
Items that add up to 100% are sometimes called "compositional
data" and there are books on the subject. When some proportions
are small (less than 10%, say), your correlations, etc., might be
more robust if you convert the proportions to logits, where a
logit equals log(P/(1-P)).
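
As a rough illustration of the logit conversion mentioned above, here is a minimal Python sketch. The example row and the `eps` clamp are assumptions of mine, not part of the original suggestion; the clamp is there only because a share of exactly 0% or 100% has no finite logit.

```python
import math

def logit(p, eps=1e-6):
    """Return log(p / (1 - p)) for a proportion p between 0 and 1.

    eps is an assumed safeguard: p is clamped into [eps, 1 - eps]
    so that shares of exactly 0 or 1 still give a finite value.
    """
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

# Hypothetical row: five channel shares as proportions (they sum to 1.0).
row = [0.50, 0.25, 0.15, 0.08, 0.02]
logits = [logit(p) for p in row]

for share, value in zip(row, logits):
    print(f"{share:.2f} -> {value:+.3f}")
```

The small shares are spread out much more on the logit scale than on the raw percentage scale, which is the reason for considering the transformation when some channels are rarely used.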
Given a related set of scores, I start by looking at frequencies
and looking at correlations. Are some of them "irrelevant"
because the numbers are small? - or does that make them
of special interest?
If there are correlations that are very high between channels,
that suggests that you /might/ want to reduce the count
of variables from 5 to 4 (or fewer).
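
A quick way to look at the correlations between channels, sketched in Python. The six rows of data are made up for illustration, and the helper name `pearson` is mine. Keep in mind that shares forced to sum to 100% tend to correlate negatively with each other simply because of that constraint.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: one row per person, five channel percentages per row.
rows = [
    [50, 25, 15, 8, 2],
    [10, 60, 20, 5, 5],
    [30, 30, 20, 10, 10],
    [70, 10, 10, 5, 5],
    [20, 20, 20, 20, 20],
    [5, 15, 40, 30, 10],
]

columns = list(zip(*rows))  # one tuple per channel

# Correlation for every pair of channels, keyed by channel numbers 1-5.
corr = {
    (i + 1, j + 1): pearson(columns[i], columns[j])
    for i, j in combinations(range(5), 2)
}

for pair, r in sorted(corr.items()):
    print(pair, round(r, 2))
```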
As you suggest, you could easily look at max(p1 to p5) to find
the channel that is highest for each person. Or lowest. And
create a single summary score which is the category (1-5)
which is highest for each person. Or it might be more interesting
to consider a variation like this: assign the category where the
person has the highest percentile score. (Or, you could also use
a cutoff, and label "6" for the person who has no score that is
especially high, whatever you mean by "especially" - like, in the
top 10 or 15 percent.)
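
The two summary scores described above could be sketched in Python roughly as follows. The function names, the example data, and the way the cutoff is applied are my assumptions: in the first function the cutoff is on the raw share, in the second it is on the percentile rank within each channel. Ties go to the lowest-numbered channel, which is an arbitrary choice.

```python
def dominant_channel(shares, cutoff=None, none_label=6):
    """Category (1-5) of the largest share; none_label if it is below cutoff."""
    top = max(shares)
    if cutoff is not None and top < cutoff:
        return none_label
    return shares.index(top) + 1  # ties: lowest-numbered channel wins

def percentile_ranks(values):
    """For each value, the fraction of values less than or equal to it."""
    n = len(values)
    return [sum(v <= x for v in values) / n for x in values]

def channel_by_percentile(rows, cutoff=None, none_label=6):
    """Category where each person's share ranks highest relative to others."""
    columns = list(zip(*rows))
    ranks_by_channel = [percentile_ranks(col) for col in columns]
    ranks_by_person = list(zip(*ranks_by_channel))
    return [dominant_channel(list(r), cutoff, none_label) for r in ranks_by_person]

# Hypothetical data: one row per person, five channel percentages per row.
rows = [
    [50, 25, 15, 8, 2],
    [10, 60, 20, 5, 5],
    [30, 30, 20, 10, 10],
    [70, 10, 10, 5, 5],
    [20, 20, 20, 20, 20],
    [5, 15, 40, 30, 10],
]

by_max = [dominant_channel(r) for r in rows]
by_max_cutoff = [dominant_channel(r, cutoff=40) for r in rows]
by_percentile = channel_by_percentile(rows)
print(by_max, by_max_cutoff, by_percentile)
```

Note how the person with 20% on every channel has no meaningful "highest" channel by raw share, but is assigned channel 5 by percentile, because 20% is unusually high for that channel in this made-up sample.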
Hope this helps.
--
Rich Ulrich
From: SPSSX(r) Discussion <[hidden email]> on behalf of helefun <[hidden email]>
Sent: Wednesday, July 24, 2019 5:38 AM
To: [hidden email] <[hidden email]>
Subject: Identifying
You could try to detect channel profiles by first clustering the channel percentages. You could then look for relations between cluster membership and other variables.
Mario Giesel
Munich, Germany
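
To make the clustering idea concrete, here is a minimal k-means sketch in plain Python. The post does not name a clustering method, so k-means, the choice of k = 2, the naive initialisation from the first k rows, and the example data are all assumptions for illustration only. In practice one would use a statistics package and try several values of k.

```python
import math

def kmeans(rows, k, iterations=20):
    """Very small k-means: returns (labels, centroids).

    Initial centroids are simply the first k rows (a naive, deterministic
    choice made here so the example is reproducible).
    """
    centroids = [list(r) for r in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assign each row to its nearest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda c: math.dist(r, centroids[c]))
            for r in rows
        ]
        # Move each centroid to the mean of its assigned rows.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical data: shares as proportions, two obvious usage profiles.
rows = [
    [0.70, 0.10, 0.10, 0.05, 0.05],
    [0.05, 0.05, 0.10, 0.10, 0.70],
    [0.65, 0.15, 0.10, 0.05, 0.05],
    [0.10, 0.05, 0.05, 0.15, 0.65],
    [0.75, 0.05, 0.10, 0.05, 0.05],
    [0.05, 0.10, 0.05, 0.10, 0.70],
]

labels, centroids = kmeans(rows, k=2)
print(labels)
```

The resulting cluster label is a single categorical variable per row, which can then be cross-tabulated or compared with the other variables.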
On Thursday, July 25, 2019, 04:21:29 CEST, Rich Ulrich <[hidden email]> wrote: