big datasets and careless responding

Maguin, Eugene

My question is for those of you who analyze, or have analyzed, big online or paper-based health and behavior datasets. Respondents do all sorts of things; the results of some are easily dealt with, others not so much. Scenario: a drug and alcohol (D&A) use section and a social/behavioral consequences-of-use section. The D&A items are framed as recency or frequency of use, and the consequences items are framed the same way. Consider, for instance, a 100K-respondent dataset that is clean in conventional terms. I'm interested in looking at sets of items against sets of other items, e.g., D&A against consequences. My first question is about detecting and quantifying response patterns; my second is about the criteria you use and the decisions you make about those patterns. As a specific pattern, suppose 3% say 'past month' to all D&A items and 5% say 'past year' to all consequences items. I know there is a literature on 'careless responding', but its methods don't seem scalable.
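
To make the "detecting and quantifying" part concrete, here is a minimal Python sketch of one way to count respondents who give the same answer to every item in a set, and to tabulate one set against another. The variable names (`da1`, `c1`, ...), the response labels, and the toy rows are assumptions for illustration only. It makes a single pass over the data, so it should extend to a 100K-row file without trouble.

```python
from collections import Counter

def straightline_value(row, items):
    """Return the repeated answer if every item in `items` has the same
    non-missing value for this respondent; otherwise return None."""
    first = row.get(items[0])
    if first is None:
        return None
    return first if all(row.get(item) == first for item in items) else None

def pattern_rates(rows, item_sets):
    """Single pass over the data. For each named item set, tally which
    answer was repeated across the whole set; also tally the joint
    pattern across sets so one set can be looked at against another."""
    per_set = {name: Counter() for name in item_sets}
    joint = Counter()
    n = 0
    for row in rows:
        n += 1
        key = []
        for name, items in item_sets.items():
            value = straightline_value(row, items)
            if value is not None:
                per_set[name][value] += 1
            key.append(value)
        joint[tuple(key)] += 1
    return per_set, joint, n

# Toy data; variable names and response labels are hypothetical.
rows = [
    {"da1": "past month", "da2": "past month", "c1": "past year", "c2": "past year"},
    {"da1": "never", "da2": "past month", "c1": "never", "c2": "past year"},
    {"da1": "past month", "da2": "past month", "c1": "never", "c2": "never"},
    {"da1": "never", "da2": "past year", "c1": "past year", "c2": "past year"},
]
item_sets = {"da": ["da1", "da2"], "cons": ["c1", "c2"]}

per_set, joint, n = pattern_rates(rows, item_sets)
for name, counts in per_set.items():
    for answer, count in counts.items():
        print(f"{name}: {count / n:.1%} answered '{answer}' to every item")
```

The `joint` counter is the cross-tabulation of patterns across the two sets. Counting is the easy half; what to do with the flagged cases is still a judgment call, since answering 'never' to everything may be perfectly genuine.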

 

Thanks, Gene Maguin

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Re: big datasets and careless responding

Andy W
Surveys I have worked on previously (variants of Monitoring the Future) included a "false drug" question, asking about a drug that does not exist, to spot folks who said they took everything all the time.

They would also eliminate folks based on the impossibility of some drug habits. The drug questions asked about frequency of use: once a month, once a week, every day, etc. Every day is feasible for smoking cigarettes, not so much for heroin. So they had rules to eliminate surveys that stated too high a frequency for too many drugs. Ad hoc, but it scales to large datasets.

Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Re: big datasets and careless responding

Bruce Weaver
Administrator
In reply to this post by Maguin, Eugene
Hi Gene.  I've not yet used it myself, but I wonder if DETECTANOMALY (GUI: Data > Identify Unusual Cases) would be useful.  

http://www.ibm.com/support/knowledgecenter/SSLVMB_24.0.0/spss/data_validation/syn_detectanomaly.html

Cheers,
Bruce

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: big datasets and careless responding

Jon Peck
I was going to suggest giving that a try.  It does clustering and then looks for outliers.  It should scale well to large datasets.
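
For readers who want a feel for the "cluster, then look for outliers" idea, here is a small Python sketch. It is not DETECTANOMALY's algorithm: it swaps in plain k-means with caller-supplied starting centroids, and scores each case by its distance to its own cluster centroid relative to the cluster's average distance. The numeric coding of the items and the toy points are assumptions.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points]

def kmeans(points, centroids, n_iter=20):
    """Plain k-means from given starting centroids (so runs are repeatable)."""
    for _ in range(n_iter):
        labels = assign(points, centroids)
        new = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            # Keep the old centroid if a cluster ends up empty.
            new.append([sum(col) / len(members) for col in zip(*members)]
                       if members else centroids[k])
        if new == centroids:
            break
        centroids = new
    return assign(points, centroids), centroids

def anomaly_scores(points, labels, centroids):
    """Distance to own centroid divided by the cluster's mean distance."""
    d = [dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    totals = [0.0] * len(centroids)
    counts = [0] * len(centroids)
    for di, lab in zip(d, labels):
        totals[lab] += di
        counts[lab] += 1
    scores = []
    for di, lab in zip(d, labels):
        avg = totals[lab] / counts[lab]
        scores.append(di / avg if avg > 0 else 0.0)
    return scores

# Toy data: two tight groups plus one case that sits apart from both.
points = [[0, 0], [0, 1], [1, 0], [1, 1], [4, 4],
          [10, 10], [10, 11], [11, 10], [11, 11]]
labels, centroids = kmeans(points, [[0, 0], [10, 10]])
scores = anomaly_scores(points, labels, centroids)
most_unusual = scores.index(max(scores))
print(points[most_unusual], round(scores[most_unusual], 2))
```

A score well above 1 means a case is far from its cluster compared with its neighbours. Where to put the cutoff is, again, a decision rather than something the method hands you.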

On Wed, Feb 1, 2017 at 12:30 PM, Bruce Weaver <[hidden email]> wrote:
Hi Gene.  I've not yet used it myself, but I wonder if DETECTANOMALY (GUI:
*Data > Identify Unusual Cases*) would be useful.

--
Jon K Peck
[hidden email]


Re: big datasets and careless responding

David Marso
Administrator
In reply to this post by Andy W
Given the current political climate, the war on 'drugs', and the stigma regarding smoking, drinking, bonging, etc., I suspect most people would UNDERestimate the frequency with which they indulge.

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: big datasets and careless responding

Andy W
I wouldn't bet money on that being true, but I do not know for sure.

For context, Monitoring the Future is a paper-based survey of teenagers. They were more concerned about kids answering yes to all the drug questions than about people being too embarrassed to report use.

You would think under-reporting would be more of a problem for in-person interviews, but I am not familiar with any major surveys that worry about it much (for example, by using randomized response, https://en.wikipedia.org/wiki/Randomized_response).
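
For anyone unfamiliar with randomized response, here is a minimal Python sketch of the estimator for the simplest version, Warner's design, which is an assumption on my part about which variant to illustrate. Each respondent privately answers the sensitive statement with probability p and its negation otherwise, so the expected share of "yes" answers is p*pi + (1 - p)*(1 - pi), which can be solved for the prevalence pi. The counts in the example are made up.

```python
def warner_estimate(n_yes, n, p):
    """Estimate prevalence under Warner's randomized response design.

    n_yes: number of 'yes' answers observed
    n:     number of respondents
    p:     probability the device selects the sensitive statement
    Returns (estimated prevalence, standard error).
    """
    if p == 0.5:
        raise ValueError("p = 0.5 makes the answers uninformative")
    lam = n_yes / n  # observed proportion of 'yes'
    pi_hat = (lam - (1 - p)) / (2 * p - 1)
    se = (lam * (1 - lam) / n) ** 0.5 / abs(2 * p - 1)
    return pi_hat, se

# Made-up numbers: 1,000 respondents, 380 'yes' answers, p = 0.7.
pi_hat, se = warner_estimate(380, 1000, 0.7)
print(f"estimated prevalence {pi_hat:.3f} (SE {se:.3f})")
```

The price of the privacy protection is a larger standard error than a direct question would give, which is one plausible reason the approach is not used more widely.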

At this point, most adults are de facto outliers simply by answering a survey.

Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/