Identifying

helefun
Hello,
I have 5 variables that record channel usage as percentages. Each variable
represents a different channel, and the 5 variables add up to 100%. How can
I combine them into 1 variable (to provide a score for each row of data)
so that I can compare or correlate it with other variables? One idea is to
identify the variable with the highest percentage, but I am not sure that
this is the best way.
Thanks,



--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Identifying

Rich Ulrich
Do you have any hypotheses?  (What is a "channel"?)
Why do you care about the data?  

Items that add up to 100% are sometimes called "compositional
data", and there are books on the subject. When some proportions
are small (less than 10%, say), your correlations, etc., might be
more robust if you convert the proportions to logits, where
logit(P) = log(P/(1-P)), with P a proportion between 0 and 1.
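A minimal Python sketch of the logit conversion, for readers working outside SPSS. The percentages are divided by 100 first, and values of exactly 0 or 100 are clipped to a small margin (the hypothetical `eps=1e-3`) because log(P/(1-P)) is undefined at P = 0 and P = 1. How to handle such values is a choice the analyst has to make; the thread does not specify one.

```python
import math


def logit(p, eps=1e-3):
    """Return log(p / (1 - p)) for a proportion p between 0 and 1.

    p is clipped to [eps, 1 - eps] because the logit is undefined at
    exactly 0 or 1; eps=1e-3 is an illustrative assumption, not a standard.
    """
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def row_logits(percentages, eps=1e-3):
    """Convert one row of channel percentages (summing to 100) to logits."""
    return [logit(pct / 100.0, eps) for pct in percentages]


# Hypothetical row: five channel percentages that sum to 100.
row = [50, 25, 15, 7, 3]
print([round(x, 3) for x in row_logits(row)])
```

A 50% share maps to a logit of 0, and the transform stretches out the small shares, which is where the extra robustness for small proportions comes from.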

Given a related set of scores, I start by looking at frequencies
and at correlations. Are some of them "irrelevant" because the
numbers are small, or does that make them of special interest?

If there are correlations that are very high between channels,
that suggests that you /might/ want to reduce the count
of variables from 5 to 4 (or fewer).
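A rough way to run that check, sketched in Python with hypothetical channel names and data. It computes the Pearson correlation for every pair of channels and flags pairs whose absolute correlation reaches an assumed threshold of 0.8. Because the five shares must sum to 100, some negative correlation between channels is built in, so a strong negative r may reflect the fixed total rather than a substantive relationship.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def high_correlations(rows, names, threshold=0.8):
    """List channel pairs whose |r| is at least the (assumed) threshold."""
    columns = list(zip(*rows))  # one tuple of values per channel
    flagged = []
    for i, j in combinations(range(len(names)), 2):
        r = pearson(columns[i], columns[j])
        if abs(r) >= threshold:
            flagged.append((names[i], names[j], round(r, 3)))
    return flagged


# Hypothetical data: one row per person, five channel percentages per row.
names = ["ch1", "ch2", "ch3", "ch4", "ch5"]
rows = [
    [60, 10, 10, 10, 10],
    [50, 20, 12, 8, 10],
    [40, 30, 10, 12, 8],
    [30, 40, 8, 10, 12],
    [20, 50, 10, 10, 10],
]
print(high_correlations(rows, names))
```

In this made-up data, channels 1 and 2 trade off exactly (r = -1), so that is the only pair flagged at the 0.8 threshold.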

As you suggest, you could easily look at max(p1 to p5) to find
the channel that is highest for each person (or lowest), and
create a single summary score which is the category (1-5) that
is highest for each person. Or it might be more interesting
to consider a variation like this: assign the category where the
person has the highest percentile score. (Or, you could also use
a cutoff, and label "6" for the person who has no score that is
especially high, whatever you mean by "especially": say, in the
top 10 or 15 percent.)
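The summary categories described above can be sketched in Python (not SPSS syntax). `highest_channel` returns the 1-based number of the channel with the largest share, mirroring the max(p1 to p5) idea. `percentile_category` implements the percentile variation: it ranks each value within its own channel, assigns the channel where the person ranks highest, and optionally applies a cutoff that assigns the extra "no especially high channel" category (6 when there are five channels). The function names, the percentile-rank definition (percent of cases at or below a value) and the lowest-channel-wins tie rule are assumptions for illustration.

```python
def highest_channel(row):
    """1-based number of the channel with the largest share.

    Ties go to the lowest-numbered channel (an assumed tie rule).
    """
    return row.index(max(row)) + 1


def percentile_ranks(values):
    """Percent of cases at or below each value (one simple definition)."""
    n = len(values)
    return [100.0 * sum(v <= x for v in values) / n for x in values]


def percentile_category(rows, cutoff=None):
    """Channel where each person's within-channel percentile rank is highest.

    If cutoff is given (e.g. 85 for roughly 'top 15 percent'), anyone whose
    best rank falls below it gets the extra category n_channels + 1
    (6 with five channels), meaning 'no especially high channel'.
    """
    n_channels = len(rows[0])
    # Rank each value against the other values in the same channel (column).
    ranks = [percentile_ranks([r[c] for r in rows]) for c in range(n_channels)]
    categories = []
    for i in range(len(rows)):
        person = [ranks[c][i] for c in range(n_channels)]
        best = max(person)
        if cutoff is not None and best < cutoff:
            categories.append(n_channels + 1)
        else:
            categories.append(person.index(best) + 1)  # ties: lowest channel
    return categories


# Hypothetical data: one row per person, five channel percentages per row.
rows = [
    [60, 10, 10, 10, 10],
    [50, 20, 12, 8, 10],
    [40, 30, 10, 12, 8],
    [30, 40, 8, 10, 12],
    [20, 50, 10, 10, 10],
]
print([highest_channel(r) for r in rows])
print(percentile_category(rows))
```

On these made-up rows the raw maximum puts everyone in channel 1 or 2, while the percentile version spreads people across all five channels, which is the point of the variation.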

Hope this helps.

--
Rich Ulrich

From: SPSSX(r) Discussion <[hidden email]> on behalf of helefun <[hidden email]>
Sent: Wednesday, July 24, 2019 5:38 AM
To: [hidden email] <[hidden email]>
Subject: Identifying

Re: Identifying

spss.giesel@yahoo.de
You could try to detect channel profiles by first clustering the channel proportions.
You could then look for relations between cluster membership and other variables.
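A minimal k-means sketch of that idea in plain Python. The post does not name a clustering method; k-means is an assumption here (in SPSS, the QUICK CLUSTER procedure performs k-means clustering). The starting centers are picked by hand from hypothetical data to keep the example deterministic; in practice the number of clusters and the starting points need to be chosen and checked.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two channel profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, initial_centers, max_iter=100):
    """Plain k-means (Lloyd's algorithm) on rows of channel percentages.

    The caller supplies the starting centers, which keeps this sketch
    deterministic; real use would compare several starts and values of k.
    Returns (labels, centers) with 0-based cluster labels.
    """
    centers = [list(c) for c in initial_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each row joins its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda k: squared_distance(row, centers[k]))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering converged
        labels = new_labels
        # Update step: move each center to the mean profile of its members.
        for k in range(len(centers)):
            members = [row for row, lab in zip(rows, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical data: each row is one person's five channel percentages.
rows = [
    [70, 10, 10, 5, 5],
    [65, 15, 10, 5, 5],
    [10, 70, 10, 5, 5],
    [5, 75, 10, 5, 5],
    [10, 10, 10, 35, 35],
    [5, 10, 15, 35, 35],
]
labels, centers = kmeans(rows, initial_centers=[rows[0], rows[2], rows[4]])
print(labels)
```

Each cluster center is itself an average channel profile (it still sums to 100), and the labels are a single categorical variable that can be cross-tabulated or compared against other variables.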

Mario Giesel
Munich, Germany


On Thursday, 25 July 2019, 04:21:29 CEST, Rich Ulrich <[hidden email]> wrote: