Uncle Sam likes to stop by every 10 years to ask a few questions for a little thing we call the Census. You’ve probably heard of it.
It’s required under U.S. constitutional law, and its results are used to determine congressional seats, electoral votes and how government funding gets handed out (except for higher education, of course).
The details are listed under Title 13 of the United States Code, and under that very same Title is the infamous Section 9.
This little bit of legal protection makes sure the government doesn’t require you to answer how much you make, then tell all your friends — and anyone with access to the Internet.
Normally, the methods the government uses to keep findings confidential have little effect on research. Occasionally, things get messy.
In this case, a study by three researchers at the National Bureau of Economic Research found some serious problems with Uncle Sam’s math.
The method the researchers used was actually ridiculously simple. They took data from a Public Use Microdata Sample, or PUMS, available from ipums.org and compared it with other data released by the government. In theory, the two sources should yield similar population counts.
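For the curious, here is a rough Python sketch of that kind of comparison. It is not the researchers’ actual code: the function names are mine, the numbers are invented for illustration, and it assumes each sampled person carries a weight saying how many real people he or she stands for.

```python
# Hypothetical sketch: compare population counts estimated from a weighted
# microdata sample against published totals. All numbers are made up.

def estimate_counts(records):
    """Sum person weights by age group to estimate population counts."""
    totals = {}
    for age_group, weight in records:
        totals[age_group] = totals.get(age_group, 0) + weight
    return totals


def percent_difference(estimated, published):
    """Percent gap between the estimate and the published count, per group."""
    return {
        group: 100.0 * (estimated[group] - published[group]) / published[group]
        for group in published
    }


# Each record is (age group, person weight): one sampled person "stands for"
# that many people in the full population.
sample = [("25-64", 100), ("25-64", 100), ("65+", 100), ("65+", 70)]
published = {"25-64": 200, "65+": 200}

estimated = estimate_counts(sample)
gaps = percent_difference(estimated, published)
```

If the two sources agree, every gap sits near zero. A group whose gap is far from zero is the kind of red flag the study describes.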
I’ll bet you can already see where this is going.
Most of the values were close, but once you hit the 65-and-older population, things get wacky. The two population counts differed by as much as 15 percent in some cases.
This means information used in debates about things like Social Security and health care has serious flaws that need to be addressed.
Yes, we have to virtually start from the beginning. Our data, what many of these decisions are based on, is just plain wrong.
If you’re curious how our government could commit such a faux pas, how something like this could happen, let me explain.
To protect us little guys, the powers that be take all our information and play around with it a bit, not totally unlike how kids squish Play-Doh.
Essentially, they use four simple methods.
First, they hide some information. This method is useful in situations where there are small sample sizes. It’s as if we lived in a village with only 100 citizens, and only two of us were older than 70. It would be terribly easy to use public Census data to find out their income.
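Here is a minimal sketch of that village in Python. The cutoff of five people is my own assumption for illustration, not an actual Census Bureau rule, and the incomes are made up.

```python
# Hypothetical sketch of small-cell suppression. The threshold of 5 is an
# assumption for illustration, not an actual Census Bureau rule.

def suppress_small_cells(table, min_count=5):
    """Publish average income per group, withholding groups that are too small."""
    published = {}
    for group, incomes in table.items():
        if len(incomes) < min_count:
            published[group] = None  # too few people: withhold the figure
        else:
            published[group] = sum(incomes) / len(incomes)
    return published


# A village of 100 people, only two of them older than 70 (made-up incomes).
village = {
    "under 70": [30000] * 98,
    "over 70": [18000, 52000],
}
released = suppress_small_cells(village)
```

The average for the 98 younger villagers is published, while the figure for the two oldest residents is simply left blank.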
Second, they add noise. They take the numbers and professionally fudge ’em a little. Obviously, they (try to) keep the values close to the originals, and we see how that’s working out.
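A bare-bones version of the fudging might look like this in Python. The 5 percent cap and the uniform random nudge are my assumptions, not the government’s actual recipe.

```python
import random

# Hypothetical sketch of noise addition. The 5 percent cap and the uniform
# distribution are assumptions for illustration only.

def add_noise(values, max_fraction=0.05, seed=None):
    """Nudge each value up or down by a random amount within max_fraction."""
    rng = random.Random(seed)
    return [v * (1 + rng.uniform(-max_fraction, max_fraction)) for v in values]


incomes = [25000, 40000, 61000]  # made-up incomes
noisy = add_noise(incomes, seed=1)
```

Each fudged value stays within 5 percent of the original, so no single record is exact but the big picture should, in theory, survive.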
Third and fourth are data swapping and synthetic data. Data swapping is just what it sounds like. If you and I have similar information, they take some liberty and swap our information around. One would hope that, after many, many swaps, the statistical information will be similar while still protecting our confidentiality.
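To make swapping concrete, here is a small Python sketch. It assumes "similar" means "in the same age group" and that the thing being swapped is the town you live in. Both choices, along with the field names and the swap rate, are mine for illustration.

```python
import random

# Hypothetical sketch of data swapping. Matching on age group and swapping
# the town are assumptions for illustration only.

def swap_towns(records, swap_rate=0.5, seed=None):
    """Swap 'town' between randomly paired records in the same age group."""
    rng = random.Random(seed)
    swapped = [dict(r) for r in records]  # copy so the originals stay intact

    # Group record positions by age group.
    by_age = {}
    for i, r in enumerate(swapped):
        by_age.setdefault(r["age_group"], []).append(i)

    for indices in by_age.values():
        rng.shuffle(indices)
        # Walk the shuffled list two at a time; an odd one out is left alone.
        for a, b in zip(indices[::2], indices[1::2]):
            if rng.random() < swap_rate:
                swapped[a]["town"], swapped[b]["town"] = (
                    swapped[b]["town"],
                    swapped[a]["town"],
                )
    return swapped


people = [
    {"age_group": "65+", "town": "A", "income": 20000},
    {"age_group": "65+", "town": "B", "income": 35000},
    {"age_group": "25-64", "town": "A", "income": 50000},
    {"age_group": "25-64", "town": "B", "income": 42000},
]
result = swap_towns(people, swap_rate=1.0, seed=0)
```

The head count per town and per age group comes out the same as before, but you can no longer be sure which income belongs to which town.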
With synthetic data, the government literally makes up data.
That’s right — they make it up.
I won’t go into this too much, but imagine if you and I were about to set sail across the Atlantic with a “synthetic” map. How would you feel if I gave your airline pilot a “synthetic” route to the Bahamas?
How can we justify guiding missiles with real data but guiding government policy, and our health, with “synthetic” data?
So, it’s a catch-22. If we don’t scramble the information enough, we break the law and reveal personal information to the world. If we scramble it too much, the data becomes useless — and downright misleading.
I’m not sure what the solution is. We’re in a legal bind now where it would be impossible for anyone to get access to the unfiltered information, the same information we need to make policy decisions.
After all the swaps, scrambles and made-up numbers, the government has succeeded — we’re totally safe from the Census, because, especially at certain ages, it doesn’t even reflect reality.
Devin Graham is a 21-year-old business management senior from Prairieville. Follow him on Twitter @TDR_dgraham.
–
Contact Devin Graham at dgraham@lsureveille.com
The Bottom Line: Census skews important research, time to start over
November 16, 2010