My question is for those of you who analyze, or have analyzed, big online or paper-based health and behavior datasets. Respondents do all sorts of things; the results of some of those things can be dealt with easily, but others cannot.
Scenario: a drug and alcohol (D&A) use section and a social/behavioral use-consequences section. The D&A items are framed as recency or frequency; the consequences items are framed the same way. Consider, for instance, a 100K-respondent dataset that is clean in conventional
terms. I’m interested in looking at sets of items against sets of other items, e.g., D&A against consequences. My first question is about detecting and quantifying response patterns, and my second is about the criteria you use and the decisions you make about those
patterns. As a specific pattern, consider that 3% say ‘past month’ to all D&A items and 5% say ‘past year’ to all consequences items. I know there is a literature on ‘careless responding’, but it doesn’t seem scalable.
Thanks, Gene Maguin |
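As a rough illustration of the "detecting and quantifying" part of the question, the sketch below counts respondents who give one identical answer to every item in a set and reports the proportion for each answer. It is only a minimal sketch: the item names, response labels, and toy rows are hypothetical, and it says nothing about what to do with the flagged cases.

```python
from collections import Counter

def straightline_value(responses):
    """Return the shared answer if all non-missing items match, else None."""
    answered = [r for r in responses if r is not None]
    if len(answered) < 2:
        # Too few answered items to call it a pattern.
        return None
    first = answered[0]
    return first if all(r == first for r in answered) else None

def pattern_rates(rows, items):
    """Proportion of all respondents giving one identical answer to every item."""
    n = len(rows)
    if n == 0:
        return {}
    counts = Counter()
    for row in rows:
        value = straightline_value([row.get(item) for item in items])
        if value is not None:
            counts[value] += 1
    return {value: count / n for value, count in counts.items()}

# Hypothetical item set and toy data (None marks a skipped item).
DA_ITEMS = ["alcohol", "cannabis", "cocaine"]
rows = [
    {"alcohol": "past month", "cannabis": "past month", "cocaine": "past month"},
    {"alcohol": "past month", "cannabis": "never", "cocaine": "never"},
    {"alcohol": "never", "cannabis": "never", "cocaine": "never"},
    {"alcohol": "past year", "cannabis": None, "cocaine": "never"},
]

rates = pattern_rates(rows, DA_ITEMS)
print(rates)
```

Running the same function once on the D&A items and once on the consequences items would give the kind of percentages quoted in the question; a single pass over the rows should be manageable at 100K respondents.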
Surveys I have worked on previously (variants of Monitoring the Future) have included a "false drug" question to spot respondents who said they took everything all the time.
They would also eliminate respondents whose reported drug habits were implausible. The drug questions asked for frequency of use: once a month, once a week, every day, etc. Every day is feasible for smoking cigarettes, not so much for heroin. So they had rules to eliminate surveys that stated too high a frequency for too many drugs. It is ad hoc, but it scales to large datasets. |
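The two screening rules described above could be sketched roughly as follows. This is not the actual Monitoring the Future procedure: the item names, the response labels, the list of drugs treated as implausible for daily use, and the cutoff are all hypothetical placeholders.

```python
# Hypothetical names and cutoffs, chosen only for illustration.
FAKE_DRUG = "fake_drug"                    # the "false drug" item
DAILY_IMPLAUSIBLE = {"heroin", "cocaine", "lsd"}
MAX_DAILY_IMPLAUSIBLE = 1                  # more than this many is flagged

def screen(row):
    """Return the list of reasons a respondent is flagged (empty if none)."""
    reasons = []
    # Rule 1: any reported use of a drug that does not exist.
    answer = row.get(FAKE_DRUG)
    if answer not in (None, "never"):
        reasons.append("endorsed false drug")
    # Rule 2: daily use claimed for too many drugs where that is implausible.
    n_daily = sum(1 for drug in DAILY_IMPLAUSIBLE if row.get(drug) == "every day")
    if n_daily > MAX_DAILY_IMPLAUSIBLE:
        reasons.append(f"daily use reported for {n_daily} drugs")
    return reasons

respondents = [
    {"id": 1, "cigarettes": "every day", "heroin": "never",
     "cocaine": "never", "lsd": "never", "fake_drug": "never"},
    {"id": 2, "cigarettes": "never", "heroin": "never",
     "cocaine": "never", "lsd": "never", "fake_drug": "once a week"},
    {"id": 3, "cigarettes": "every day", "heroin": "every day",
     "cocaine": "every day", "lsd": "never", "fake_drug": "never"},
]

# Map respondent id to flag reasons, keeping only flagged respondents.
flagged = {}
for r in respondents:
    reasons = screen(r)
    if reasons:
        flagged[r["id"]] = reasons
print(flagged)
```

Keeping the reasons alongside each flag, rather than dropping cases outright, makes it easy to report how many respondents each rule removes before settling on a cutoff.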
In reply to this post by Maguin, Eugene
Hi Gene. I've not yet used it myself, but I wonder if DETECTANOMALY (GUI: Data > Identify Unusual Cases) would be useful.
http://www.ibm.com/support/knowledgecenter/SSLVMB_24.0.0/spss/data_validation/syn_detectanomaly.html Cheers, Bruce
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ |
I was going to suggest giving that a try. It does clustering and then looks for outliers. It should scale well to large datasets. (In reply to Bruce Weaver's post of Wed, Feb 1, 2017, 12:30 PM.) |
In reply to this post by Andy W
Given the current political climate and the war on 'drugs'/stigma regarding smoking-drinking-bonging etc., I suspect most people would provide UNDERestimates of the frequency with which they indulge. |
I wouldn't bet money on that being true, but I do not know for sure.
For context, Monitoring the Future is for teenagers and is a paper-based survey. They were more concerned about kids answering yes to all the drug questions than about people being embarrassed. You would think embarrassment would be more of a problem for in-person interviews, but I am not familiar with any major surveys worrying about it much (for example, by using randomized response, https://en.wikipedia.org/wiki/Randomized_response). Simply by answering a survey at this point, most adults are de facto outliers. |