SPSSX Discussion

Categorizing a power law distribution of user participation in an online community (non spss question)

whatsinaname

Categorizing a power law distribution of user participation in an online community (non spss question)

I have data on participation in a large scale online community. Participation follows a power law distribution. This means that a fraction of the users are responsible for the majority of posts, i.e. a few people post a LOT while a lot of people don’t post much at all.

My question is how to segment this population to tease apart differences between high, low, and 'in between' usage? Splitting users into groups at equal percentiles does not seem appropriate. I have not come across an established method for this kind of segmentation.

Thoughts?

Thanks in advance!

David Marso

Re: Categorizing a power law distribution of user participation in an online community (non spss question)

Administrator

The first thing I would do is create a histogram and see whether there are obvious clumps.
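For readers working outside SPSS, here is a rough sketch of that histogram idea in Python. With heavy-tailed counts, equal-width bins tend to put almost everyone in the first bar, so this sketch assumes power-of-two bins instead. The `posts` list is made-up illustration data, and the function names are hypothetical.

```python
from collections import Counter

def log2_histogram(post_counts):
    """Count users per power-of-two bin: 1, 2-3, 4-7, 8-15, ..."""
    bins = Counter()
    for n in post_counts:
        if n < 1:
            continue  # users with zero posts would need their own category
        bins[int(n).bit_length() - 1] += 1  # floor(log2(n)) for integers
    return dict(sorted(bins.items()))

def print_histogram(bins, width=40):
    """Print one text bar per bin, scaled to the largest bin."""
    largest = max(bins.values())
    for k, users in bins.items():
        bar = "#" * max(1, round(width * users / largest))
        print(f"{2**k:>6}-{2**(k + 1) - 1:<6} {users:>5} {bar}")

# Hypothetical post counts, one value per user (made up for illustration).
posts = [1, 1, 1, 1, 2, 2, 3, 3, 5, 8, 13, 40, 250]
print_histogram(log2_histogram(posts))
```

Gaps between occupied bins are one informal hint at where "clumps" might separate, but that is a judgment call, not a test.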

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

John F Hall

Re: Categorizing a power law distribution of user participation in an online community (non spss question)

Alternatively, make an arbitrary decision as to what counts as High and what counts as Low, then use something like:

RECODE <variable> (LO THRU <1st cut point> = 1) (<1st cut point> THRU <2nd cut point> = 2) (<2nd cut point> THRU HI = 3) INTO TESTVAR.

FREQUENCIES TESTVAR.

If you have missing values, you'll need to replace LO with the lowest valid value and HI with the highest valid value, and add (ELSE = SYSMIS) to the RECODE command.
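The same three-way split can be sketched in Python for anyone prototyping outside SPSS. This is only an illustration: the cut points 5 and 50 and the `posts` list are invented, `None` stands in for a missing value, and a value that equals a cut point is placed in the lower group, which is how the overlapping ranges above behave when the first matching specification wins.

```python
from collections import Counter

def recode(value, cut1, cut2):
    """Return 1 (low), 2 (in between), 3 (high), or None for missing."""
    if value is None:
        return None      # keep missing values missing
    if value <= cut1:
        return 1         # a value equal to cut1 lands in the low group
    if value <= cut2:
        return 2         # a value equal to cut2 lands in the middle group
    return 3

# Hypothetical post counts and arbitrary cut points, for illustration only.
posts = [0, 1, 1, 2, 5, 9, 10, 48, 50, 300, None]
groups = [recode(p, cut1=5, cut2=50) for p in posts]
print(Counter(groups))   # frequency table of the new variable
```

Choosing the cut points remains the substantive decision; the code only applies them.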

Forget statistics, make sociological sense first.

Advice from a dyed-in-the-wool Old Dog survey researcher.

John Hall

John F Hall (Mr)

Email: [hidden email]
Website: www.surveyresearch.weebly.com

PS Have a look at "Cyberchiefs", a book by Mathieu O'Neil (Pluto Press, 2009), about democracy and the formation of hierarchies in social networks.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of David Marso
Sent: 19 May 2012 15:55
To: [hidden email]
Subject: Re: Categorizing a power law distribution of user participation in an online community (non spss question)

--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Categorizing-a-power-law-distribution-of-user-participation-in-an-online-community-non-spss-question-tp5712286p5712325.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD


Bruce Weaver

Re: Categorizing a power law distribution of user participation in an online community (non spss question)

Administrator

In reply to this post by David Marso

You can also use the Chart Editor to display different types of distribution curves on a histogram.

http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.help%2Fidh_webhelp_distribution_palette.htm

HTH.


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Jon K Peck

Re: Categorizing a power law distribution of user participation in an online community (non spss question)

In reply to this post by John F Hall

Do you have predictors? What are you going to do with this distribution?

If you just want to fit a power-law type of distribution, try a Q-Q plot with, say, Pareto as the distribution: Analyze > Descriptive Statistics > Q-Q Plots, or PPLOT ... /TYPE = Q-Q.

If you want to segment interactively, try Transform > Visual Binning or, especially if you have a target, Transform > Optimal Binning.
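To make the Q-Q idea concrete outside SPSS, here is a minimal Python sketch. It relies on the fact that if X is Pareto with shape alpha, then log(X) plotted against standard exponential quantiles falls on a straight line with slope 1/alpha. The function names, the plotting position (i - 0.5)/n, and the synthetic `sample` are assumptions made for illustration; real post counts are discrete, so expect only an approximate line.

```python
import math

def pareto_qq(values):
    """Return (theoretical, observed) quantile pairs on a log scale."""
    xs = sorted(v for v in values if v > 0)
    n = len(xs)
    pairs = []
    for i, x in enumerate(xs, start=1):
        p = (i - 0.5) / n                       # plotting position in (0, 1)
        pairs.append((-math.log(1.0 - p), math.log(x)))
    return pairs

def fit_line(pairs):
    """Least-squares slope, intercept, and correlation of the pairs."""
    n = len(pairs)
    mx = sum(t for t, _ in pairs) / n
    my = sum(o for _, o in pairs) / n
    sxx = sum((t - mx) ** 2 for t, _ in pairs)
    syy = sum((o - my) ** 2 for _, o in pairs)
    sxy = sum((t - mx) * (o - my) for t, o in pairs)
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)

# Synthetic data: exact Pareto quantiles with shape 1.5 and minimum 1.
n = 200
sample = [(1.0 - (i - 0.5) / n) ** (-1.0 / 1.5) for i in range(1, n + 1)]
slope, intercept, r = fit_line(pareto_qq(sample))
print(f"slope={slope:.4f} (about 1/alpha), r={r:.4f}")
```

Plotting the pairs and looking for curvature is more informative than the correlation alone, which can stay high even when the tail bends away from the line.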

HTH

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621


Art Kendall

Re: Categorizing a power law distribution of user participation in an online community (non spss question)

In reply to this post by Bruce Weaver

You can also work with visual binning.

You can recode your variable into a new variable. Try to avoid using (... = SYSMIS) whenever possible: when you, the user, decide that a value should be missing, a user-missing value records that decision better than a system-missing one.

Art Kendall
Social Research Consultants

On 5/19/2012 10:24 AM, Bruce Weaver wrote:

You can also use the Chart Editor to display different types of distribution
curves on a histogram.

http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.help%2Fidh_webhelp_distribution_palette.htm

HTH.




Rich Ulrich

Re: Categorizing a power law distribution of user participation in an online community (non spss question)

In reply to this post by whatsinaname

As Jon asks, what is your purpose? "Teasing apart
differences" says only a little.

When I was watching over various aspects of computer
usage on mainframes in the 1980s, it was useful to me to list
some raw data -- ID, and relevant related information --
the top 10 or 20 users in a category, in descending order.
My purpose was related to "managing a scarce resource."

The Pareto-curve descriptions were useful, saying (for instance)
90% of the consumption was by 10% of the people. Or "by 3 people".
It is popular to use reciprocal fractions, like 90/10 or 80/20, and
it is also popular to use rounded-off cut-offs, like "the top 1%" and
"the top 10%", when those fractions account for large amounts
of the resource in question.
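A description like "the top 10% account for X% of posts" is easy to compute directly. The following Python sketch shows one way, with an invented `posts` list and a hypothetical helper name; it assumes at least one post in total.

```python
def top_share(post_counts, top_fraction):
    """Share of all posts written by the most active `top_fraction` of users."""
    ranked = sorted(post_counts, reverse=True)
    k = max(1, round(top_fraction * len(ranked)))   # at least one user
    return sum(ranked[:k]) / sum(ranked)

# Hypothetical post counts for ten users (total of 1000 posts).
posts = [500, 200, 100, 50, 40, 30, 20, 20, 20, 20]
for fraction in (0.10, 0.20):
    share = top_share(posts, fraction)
    print(f"top {fraction:.0%} of users wrote {share:.0%} of posts")
```

Listing the `ranked[:k]` users themselves, as described above, is often as informative as the summary percentage.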

However, you are referring to e-mails/posts, so that is not a
limited resource.

It does make sense to lump together the top 2 users if they are
similar in profile, or the same user under two names, but that
is while still thinking of using up a resource. Is that sort of
reduction useful for your data summary?

Whatever the subject, the more the aggregated folks differ, the
less sense it makes to "lump" some fraction of them together. But
what "differences" or what characteristics are going to be relevant
to you? That takes us back to the question, what is your purpose?

--
Rich Ulrich

> Date: Sat, 19 May 2012 04:41:58 -0700

> From: [hidden email]
> Subject: Categorizing a power law distribution of user participation in an online community (non spss question)
> To: [hidden email]

whatsinaname

Re: Categorizing a power law distribution of user participation in an online community (non spss question)

Thanks for all the helpful replies.... much appreciated!

I am trying to create a model where 'degree of participation in online community' is one of many predictors of an outcome such as performance. I have a measure of performance; I just need a better understanding of how to model participation.

Rich Ulrich

Re: Categorizing a power law distribution of user participation in an online community (non spss question)

Taking what is "meaningful" - I would say that participating
in an average of one "thread" per month is a fairly high
level of participation. The number of threads can be more
salient than the number of posts, especially if folks do create
new Subject: lines as needed, stick to the topic, and don't
often break one topic into multiple threads.

Creating a new thread can be different from Replying.

This assumes that your data and software can readily
define a thread.

For people with the same average, regular participation
is a different commitment from sporadic. But that might not
be easy to disentangle.
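As a rough illustration of the two ideas above (threads rather than posts, and regular versus sporadic participation), here is a Python sketch. It assumes each post is available as a (user, thread_id, month) record, which may not match your data, and every name in it is hypothetical.

```python
from collections import defaultdict

def participation_profile(events, months_observed):
    """Summarise posts given as (user, thread_id, month) tuples."""
    threads = defaultdict(set)   # user -> distinct threads joined
    active = defaultdict(set)    # user -> months with at least one post
    for user, thread_id, month in events:
        threads[user].add(thread_id)
        active[user].add(month)
    return {
        user: {
            "threads_per_month": len(threads[user]) / months_observed,
            "active_share": len(active[user]) / months_observed,
        }
        for user in threads
    }

# Made-up records: both users join four threads over a four-month window.
events = [
    ("ann", "t1", "2012-01"), ("ann", "t1", "2012-01"),
    ("ann", "t2", "2012-02"), ("ann", "t3", "2012-03"),
    ("ann", "t4", "2012-04"),
    ("bob", "t1", "2012-01"), ("bob", "t5", "2012-01"),
    ("bob", "t6", "2012-01"), ("bob", "t7", "2012-01"),
]
profile = participation_profile(events, months_observed=4)
print(profile["ann"])   # same average as bob, active in every month
print(profile["bob"])   # same average as ann, active in one month only
```

Here both users average one thread per month, but the share of active months separates the regular participant from the sporadic one.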

--
Rich Ulrich

> Date: Sat, 19 May 2012 12:37:31 -0700

> From: [hidden email]
> Subject: Re: Categorizing a power law distribution of user participation in an online community (non spss question)
> To: [hidden email]

Maguin, Eugene

Re: Categorizing a power law distribution of user participation in an online community (non spss question)

In reply to this post by whatsinaname

I know you asked about categorizing, but I wonder whether an alternative, possibly useful, variable would be the log of the posting frequency.
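A small Python sketch of what the log does to counts like these. Adding 1 before taking the log is an assumption made here so that users with zero posts stay defined; it is not part of the suggestion above, and the `posts` values are invented.

```python
import math

def log_posts(n):
    """Natural log of (posts + 1), so that zero posts maps to 0."""
    return math.log1p(n)

# Hypothetical post counts spanning three orders of magnitude.
posts = [0, 1, 9, 99, 999]
for n in posts:
    print(f"{n:>4} posts -> {log_posts(n):.3f}")
```

On this scale each tenfold increase in (posts + 1) adds the same amount, about 2.3, so the few very heavy posters no longer dominate the range of the predictor.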

Gene Maguin

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of whatsinaname
Sent: Saturday, May 19, 2012 3:38 PM
To: [hidden email]
Subject: Re: Categorizing a power law distribution of user participation in an online community (non spss question)


whatsinaname

Re: Categorizing a power law distribution of user participation in an online community (non spss question)

Gene, can you tell me more about why you think log of posting frequency would be a good variable?

Bruce Weaver

Re: Categorizing a power law distribution of user participation in an online community (non spss question)

Administrator

What Gene is getting at, I suspect, is that it's usually better to analyze continuous variables as continuous rather than carving them into categories. There are lots of articles that address this issue, including this very readable one by Dave Streiner:

http://ww1.cpa-apc.org/publications/archives/cjp/2002/april/researchMethodsDichotomizingData.asp

HTH.


Maguin, Eugene

Re: Categorizing a power law distribution of user participation in an online community (non spss question)

In reply to this post by whatsinaname

As Bruce said, that is the reason. As part of that reason, with a DV like log frequency, you can look at the linearity of the relationship between the predictors and changes in the DV.

Gene Maguin

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of whatsinaname
Sent: Sunday, May 20, 2012 3:23 PM
To: [hidden email]
Subject: Re: Categorizing a power law distribution of user participation in an online community (non spss question)
