Dark figure

drfg2008
Does anyone have a literature reference on calculating unknown cases (the “dark figure”), or know how to solve the following problem (if it can be solved at all), or have any idea?

The following example:
Two real estate platforms list houses that are for sale (here: homes that were sold). These two real estate platforms are independent of each other and cover a very large share of the homes sold in a country per year. However, it is not known exactly what percentage of the total market each platform covers.

It is known:
A) The number of homes sold per year that were offered on platform X ONLY.

B) The number of homes sold per year that were offered on platform Y ONLY.

C) The number of homes sold per year that were offered on both platforms, X and Y.

Question: How many properties were offered in a specific year (including the properties that were on neither platform X nor Y)?
thx
Dr. Frank Gaeth

Re: Dark figure

Andy W
See capture-recapture, https://en.wikipedia.org/wiki/Mark_and_recapture

If you really know the platforms are independent, the formula is simply:

Estimated Total sold homes per year = (A*B)/C

See the Wikipedia page for more references.
Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Re: Dark figure

Rich Ulrich
Andy: I think you blew it.

See the Wikipedia page, where A and B are /marginal totals/ and not the
cell frequencies as defined in the Question.  The formula works using
the marginal totals.

The example on the Wikipedia page has cell frequencies implicitly
defined as (10, 5), (10, 5).  For the formula to work, the 2x2 table,
labeled appropriately for the formula, is

            -      +    total
     -     10      5
     +     10    5=C    15=A
         -------------
  total    20   10=B

Then 15*10/5 gives 30, the total for all the cells.

Whether you start from the cells or the marginal totals, you can fill in
three of the cells.  Then the simple estimate for the other cell is the
number that is proportional, so that the correlation is zero:  Independence.
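A minimal sketch of that fill-in step, assuming the three observed cells are known (Frank's A, B, C as "X only", "Y only", "both"). Independence in a 2x2 table means the cross-product ratio is 1, so the unseen cell is the product of the two "only" cells divided by the overlap. The function name is hypothetical; the numbers are the cells of the table above.

```python
def fill_missing_cell(x_only: float, y_only: float, both: float) -> float:
    """Estimate the unobserved cell (on neither list) assuming independence.

    Independence means both * neither == x_only * y_only,
    so neither = x_only * y_only / both.
    """
    if both <= 0:
        raise ValueError("need at least one case observed on both lists")
    return x_only * y_only / both


# Cells from the table above: "+" row / "-" column = 10 (one list only),
# "-" row / "+" column = 5 (other list only), overlap C = 5.
x_only, y_only, both = 10, 5, 5
neither = fill_missing_cell(x_only, y_only, both)
total = x_only + y_only + both + neither
print(neither, total)  # 10.0 30.0
```

The total of 30 matches the 15*10/5 computed from the marginal totals.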

The page also shows estimators that are less biased, and has links to
similar problems.
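As a rough illustration of one such less-biased estimator, the sketch below compares the simple marginal-totals formula with the Chapman estimator, commonly written as (n1+1)(n2+1)/(m+1) - 1. Here n1 and n2 are the two list totals (A and B above) and m is the overlap (C); the function names are hypothetical.

```python
def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Simple estimate from marginal totals: n1 * n2 / m (needs m > 0)."""
    if m <= 0:
        raise ValueError("overlap must be positive")
    return n1 * n2 / m


def chapman(n1: int, n2: int, m: int) -> float:
    """Chapman's bias-corrected variant; defined even when m == 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Marginal totals from the table above: A = 15, B = 10, overlap C = 5.
print(lincoln_petersen(15, 10, 5))   # 30.0
print(round(chapman(15, 10, 5), 2))  # 28.33
```

Chapman's version shrinks the estimate slightly and avoids dividing by zero when the lists do not overlap.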

--
Rich Ulrich


> Date: Wed, 25 May 2016 07:43:10 -0700
> From: [hidden email]
> Subject: Re: Dark figure
> To: [hidden email]
>
> See capture-recapture, https://en.wikipedia.org/wiki/Mark_and_recapture
>
> If you really know the platforms are independent, the formula is simply:
>
> Estimated Total sold homes per year = (A*B)/C
>
> see the wikipedia page for more references.
>
>

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Re: Dark figure

Andy W
Yep, you're right, good catch. So here it would be

N = [(A+C)*(B+C)]/C
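Expanding the corrected formula, with A, B, C as the cell counts from the Question, shows that it equals the observed homes plus the independence estimate of the unseen cell:

```latex
\hat{N} = \frac{(A+C)(B+C)}{C}
        = \frac{AB + AC + BC + C^2}{C}
        = \underbrace{A + B + C}_{\text{on X or Y}}
        + \underbrace{\frac{AB}{C}}_{\text{on neither}}
```

With the cells from Rich's table (A = 10, B = 5, C = 5): 15*10/5 = 30 = 10 + 5 + 5 + 10. So (A*B)/C applied to the cell counts estimates only the homes on neither platform, not the total.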

Re: Dark figure

Mike
In reply to this post by Andy W
Just a couple of suggestions to look at:

(1) See:
Chao, A., Tsay, P. K., Lin, S. H., Shau, W. Y., &
Chao, D. Y. (2001). The applications of capture-recapture
models to epidemiological data. Statistics in Medicine,
20(20), 3123-3157.

This is available at:
https://www.researchgate.net/profile/Anne_Chao/publication/263455563_Population_size_estimation_for_capture-recapture_models_with_applications_to_epidemiological_data/links/55a221fe08ae1c0e046418d5.pdf

(2) See:
Amstrup, S. C., McDonald, T. L., & Manly, B. F. (Eds.). (2010).
Handbook of Capture-Recapture Analysis. Princeton University Press.

Portions of this can be seen in preview mode on Google books:
https://books.google.com/books?hl=en&lr=&id=hOJxGNERUKgC&oi=fnd&pg=PP2&ots=-3VFWM9c6F&sig=-TJXIgnQ-yXA7rvJFEBypGKR-5I#v=onepage&q&f=false

-Mike Palij
New York University

Re: Dark figure

Andy W
Yeah, there is quite a bit of work on this problem across a few fields. (Statistical Science just posted an issue on the topic.)

An introduction I have saved, which is less mathy than typical journal articles, is here:

http://granta.com/violence-in-blue/

They also talk about what happens when the samples are correlated.

Re: Dark figure

Mike
In reply to this post by Andy W (Wednesday, May 25, 2016 12:20 PM)
Nice article.  It lays out the basics clearly, as well as the problem of
dependence and the need for three or more "lists" to measure that
dependence.  With two lists, as in the situation that Frank provided,
the correlation cannot be directly estimated, but one can see the effect
of different degrees of correlation on the estimates.
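A minimal sketch of such a sensitivity check, assuming (this is not from the thread) that dependence between the two lists is expressed as an odds ratio theta = (both * neither) / (x_only * y_only). Then the unseen cell is theta * x_only * y_only / both, and theta = 1 gives back the independence estimate. The theta values below are hypothetical, chosen only to show the direction of the effect.

```python
def total_under_dependence(x_only: float, y_only: float, both: float,
                           odds_ratio: float = 1.0) -> float:
    """Total estimate for an assumed list-dependence odds ratio.

    odds_ratio = (both * neither) / (x_only * y_only); 1.0 is independence.
    Values > 1 (being on X makes being on Y more likely) inflate the overlap,
    so more cases are presumed missing than under independence.
    """
    if both <= 0 or odds_ratio <= 0:
        raise ValueError("both and odds_ratio must be positive")
    neither = odds_ratio * x_only * y_only / both
    return x_only + y_only + both + neither


# Hypothetical grid of odds ratios, using Rich's cell counts (10, 5, 5).
for theta in (0.5, 1.0, 2.0, 4.0):
    print(theta, total_under_dependence(10, 5, 5, theta))
# totals: 25.0, 30.0, 40.0, 60.0
```

With only two lists the data cannot tell these scenarios apart; the grid just shows how strongly the estimate depends on the assumption.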

And then there is the problem of the "uncatchables", that is, relevant
instances that, for one reason or another, cannot be counted (in the
article above, areas that refuse to provide the FBI with their number
of police-based homicides are an example).  This makes the
numbers more suspect, but hopefully the dark arts of statistics can
help to remedy the situation somewhat. ;-)