Weighting for stratified, single-stage cluster sampling with unequal cluster sizes

MaraS

Guest
#1
I’m analyzing previously collected survey data that was gathered in the following manner:

Data collection was stratified by urban and rural location, with proportionate allocation of primary sampling units (PSUs; in this case, villages) based on somewhat outdated census data. Based on the two locations' relative populations in the census, the 30 PSUs were allocated as 17 in the urban location and 13 in the rural location. The specified number of PSUs was then selected by simple random sampling within each location (17 out of a possible 30 villages in the urban location and 13 out of a possible 31 villages in the rural location). All eligible respondents in each selected PSU were interviewed. PSUs were NOT identical in size. In total, 594 urban and 868 rural respondents were interviewed, for a total sample size of 1462.
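
A minimal sketch of the base weights this design implies, assuming the lists of 30 urban and 31 rural villages were complete sampling frames and that villages were drawn by simple random sampling without replacement. Because every eligible person in a selected village was interviewed, a person's chance of selection equals their village's chance, n_h / N_h, so the base weight is N_h / n_h. Under these assumptions village size does not enter the weight, and the census figures matter only through the chosen n_h. The names `design`, `base_weight`, and `est_population` are hypothetical, chosen for illustration.

```python
from fractions import Fraction

# Design as described in the post (assumed: SRS of villages without
# replacement within each stratum, all eligible people in a selected
# village interviewed).
design = {
    # stratum: (villages in frame N_h, villages sampled n_h, people interviewed)
    "urban": (30, 17, 594),
    "rural": (31, 13, 868),
}


def base_weight(villages_in_frame: int, villages_sampled: int) -> Fraction:
    """Inverse of a respondent's inclusion probability, N_h / n_h.

    Each village (and so each person in it) is selected with probability
    n_h / N_h; village size does not matter because whole villages are taken.
    """
    return Fraction(villages_in_frame, villages_sampled)


weights = {h: base_weight(N, n) for h, (N, n, _) in design.items()}

# Weighted respondent counts estimate the number of eligible people per stratum.
est_population = {h: weights[h] * people for h, (_, _, people) in design.items()}

for h in design:
    print(f"{h}: weight = {float(weights[h]):.4f}, "
          f"estimated eligible population = {float(est_population[h]):.1f}")
```

Run as is, this should print a weight of about 1.7647 for urban and 2.3846 for rural respondents, with estimated eligible populations of roughly 1048.2 and 2069.8. Those weighted counts can serve as a rough check against the census; adjusting the weights to match census totals (post-stratification) would be a separate, optional step.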

My question is: how should I weight the data so that it is representative of the population as a whole? I keep getting hung up on how to account for (1) the theoretically proportionate allocation of PSUs based on the census data (is it a problem that the numbers of people interviewed don’t match this allocation at all??) and (2) the fact that the PSUs had different population sizes.
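
A sketch of how such base weights might be used for a population estimate, here the ratio (Hájek) form of a weighted mean, sum(w * y) / sum(w). The respondent records and 0/1 outcomes below are made up, and the weights assume N_h / n_h for the villages described in the post; `weighted_mean` is a hypothetical helper.

```python
# Hypothetical respondent records: (stratum, village_id, outcome).
# The outcome values are invented purely for illustration.
respondents = [
    ("urban", "U01", 1), ("urban", "U01", 0), ("urban", "U02", 1),
    ("rural", "R01", 0), ("rural", "R01", 0), ("rural", "R02", 1),
    ("rural", "R02", 1),
]

# Assumed base weights N_h / n_h: villages in frame / villages sampled.
weights = {"urban": 30 / 17, "rural": 31 / 13}


def weighted_mean(records, stratum_weights):
    """Ratio (Hajek) estimator of a population mean: sum(w * y) / sum(w)."""
    num = sum(stratum_weights[h] * y for h, _, y in records)
    den = sum(stratum_weights[h] for h, _, _ in records)
    return num / den


print(round(weighted_mean(respondents, weights), 4))
```

Point estimates like this only need the weights; standard errors also need the clustering and stratification. Design-based tools such as the R `survey` package's `svydesign()` (with its `ids`, `strata`, `fpc`, and `weights` arguments) are built for that.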

Any guidance you can provide would be greatly appreciated!

ammar

Guest
#2
Hi there,
I am new here and would like some help with a problem. I have 6 tables, each with two columns (time vs. distance). The final distance value in every table was set to 23.5, but the time required to reach that value differed: for example, table 1 needs 15 min to reach 23.5, while table 2 needs 20 min, and so on. How can I calculate the standard deviation and error bars for these tables?


Regards

A. W.

MaraS

Guest
#3
Hi ammar,

I'm not sure I understand the format of your data, but I think I can help you regardless. The standard deviation is rarely calculated by hand, but if your data set is small, Khan Academy breaks down the step-by-step process quite well here: https://www.khanacademy.org/math/probability/data-distributions-a1/summarizing-spread-distributions/a/calculating-standard-deviation-step-by-step
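
The step-by-step process can be sketched in Python as follows. The function name `std_dev_step_by_step` is hypothetical, and apart from 15 and 20 the times are made-up placeholders for the six tables.

```python
import math


def std_dev_step_by_step(values, sample=True):
    """Standard deviation computed the long way.

    sample=True divides by n - 1 (sample SD, like Excel's STDEV.S);
    sample=False divides by n (population SD, like Excel's STDEV.P).
    """
    n = len(values)
    mean = sum(values) / n                      # step 1: the mean
    sq_dev = [(x - mean) ** 2 for x in values]  # step 2: squared deviations
    divisor = n - 1 if sample else n
    variance = sum(sq_dev) / divisor            # step 3: average squared deviation
    return math.sqrt(variance)                  # step 4: square root


# Placeholder times (min) to reach 23.5; only 15 and 20 come from the post.
times = [15, 20, 18, 22, 17, 19]
print(round(std_dev_step_by_step(times), 3))
print(round(std_dev_step_by_step(times, sample=False), 3))
```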

If you have the data in Excel, =STDEV() (or its newer equivalent, =STDEV.S()) will calculate the sample standard deviation for you; use =STDEV.P() for the population standard deviation.
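
If the data are in Python rather than Excel, the standard library's `statistics` module offers the same two variants: `statistics.stdev` divides by n - 1 like =STDEV.S() (and the older =STDEV()), and `statistics.pstdev` divides by n like =STDEV.P(). The times are placeholders except for 15 and 20.

```python
import statistics

times = [15, 20, 18, 22, 17, 19]  # placeholders; only 15 and 20 are from the post

sample_sd = statistics.stdev(times)       # n - 1 divisor, like =STDEV.S()
population_sd = statistics.pstdev(times)  # n divisor, like =STDEV.P()
print(f"sample SD = {sample_sd:.3f}, population SD = {population_sd:.3f}")
```

For these placeholder values this prints `sample SD = 2.429, population SD = 2.217`.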

Because you are calculating the standard deviation and not the standard error, I'm going to assume that you are working with the full population of your data, rather than a sample. In that case, the error bars simply extend one standard deviation above and below your mean. So, for example, if the mean of your data is 12 and the standard deviation is 2, your error bars would go from 12 - 2 = 10 to 12 + 2 = 14.
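
A small sketch of error-bar limits computed as the mean plus or minus one standard deviation, using the population SD to match the full-population assumption. `error_bar_limits` is a hypothetical helper, and the times are placeholders except for 15 and 20.

```python
import statistics


def error_bar_limits(values, use_population_sd=True):
    """Return (lower, upper) = mean -/+ one standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values) if use_population_sd else statistics.stdev(values)
    return mean - sd, mean + sd


# Worked example: mean 12 and SD 2 give bars from 10 to 14.
mean, sd = 12, 2
print(mean - sd, mean + sd)  # 10 14

times = [15, 20, 18, 22, 17, 19]  # placeholders; only 15 and 20 are from the post
low, high = error_bar_limits(times)
print(f"{low:.2f} to {high:.2f}")
```

To draw the bars, matplotlib's `plt.errorbar(x, y, yerr=sd)` takes the SD as the half-length of each bar.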

Hope this helps! It may also help to post your question as its own thread if you want more answers; no one seems to be responding to my question in this thread!

Best,
Mara
