Suppose you went around writing down every number you saw: dollar amounts from ATM receipts, random statistics about annual crop yields from the almanac, page numbers of the book you're reading. The context doesn't matter, as long as you gather as many numbers from as many different random sources as possible. You may be asking yourself, "Why on Earth would anyone want to do such a thing?" Good question. You're smart! The answer has to do with Benford's Law. So buckle up.
So now you've got this list of hundreds, maybe thousands, of numbers. Now imagine each number is on its own index card inside of a grab bag. So pick a card, any card. Wouldn't you think that your odds of finding a number starting with 1 would be the same as finding a number starting with 9? Or 3? Or 7? (See Figure 1) After all, you gathered as many different numbers from as many different locations as possible, so they should all be evenly distributed, right? Wrong!!! Here comes Benford's Law, bitch!
Benford's Law Explained
Benford's law says that the odds of getting 1 as the first digit of a number are much higher than the odds of getting any other digit: roughly 30% for 1, sliding down to under 5% for 9. (See Figure 2) And the reason why is anything but obvious! Creeeepy. But the coolest thing is that the broader the sampling of numbers, the more closely they conform to Benford's law. For example, if you only examined the numbers in a New York City phone book, it wouldn't fit with Benford's law because your data would favor 2s and 7s (because of the popular area codes 212 and 718). But mix a phone book's numbers with an almanac's numbers with an encyclopedia's numbers and you'll most likely start seeing a "Benfordian" distribution. Didn't I tell you this shit'd freak you out?
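For the curious: the usual way to write Benford's law is that a first digit d shows up with probability log10(1 + 1/d). Here's a minimal Python sketch of that formula. The function names are made up for illustration and aren't from any library.

```python
import math


def benford_probability(digit: int) -> float:
    """Expected share of numbers whose first digit is `digit` (1-9)."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


def benford_table() -> dict[int, float]:
    """Expected first-digit shares for every digit from 1 to 9."""
    return {d: benford_probability(d) for d in range(1, 10)}


if __name__ == "__main__":
    for digit, share in benford_table().items():
        print(f"{digit}: {share:.1%}")
```

The nine shares add up to 1, with 1 taking the biggest slice and 9 the smallest.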
But the most important part of Benford's law (and partly why it's so fascinating) is that it shows up in numbers observed and gathered from the real world. So if you were to generate a list of numbers uniformly at random with a computer, or by simply making them up, their first digits would most likely be spread evenly from 1 to 9 and NOT in accordance with Benford's law. (See Figure 1 again). For this reason, Benford's law is used by the IRS to spot defrauders who make up phony numbers: if the numbers don't follow Benford's law, that's a red flag that they may not have come from real transactions.
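You can check the "computer-generated numbers come out even" claim yourself. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, and it draws integers uniformly from 1 to 999,999 with Python's standard `random` module, then tallies the first digits.

```python
import random
from collections import Counter


def first_digit(n: int) -> int:
    """Return the leading digit of a nonzero integer."""
    return int(str(abs(n))[0])


def first_digit_shares(numbers: list[int]) -> dict[int, float]:
    """Fraction of the (nonzero) numbers that start with each digit 1-9."""
    counts = Counter(first_digit(n) for n in numbers if n != 0)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


def uniform_sample(size: int, upper: int = 999_999, seed: int = 0) -> list[int]:
    """Assumed setup: integers drawn uniformly from 1..upper, seeded for repeatability."""
    rng = random.Random(seed)
    return [rng.randint(1, upper) for _ in range(size)]


if __name__ == "__main__":
    shares = first_digit_shares(uniform_sample(50_000))
    for digit, share in shares.items():
        print(f"{digit}: {share:.1%}")
```

Each digit should land near one ninth (about 11%). The upper bound matters here: 1 to 999,999 contains the same count of numbers for every first digit, while a range like 1 to 200,000 would tilt the tally toward 1s for reasons that have nothing to do with Benford.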
My Experiment
Fascinated by all this, I decided to test it for myself. Rather than spend years gathering numbers from all over the world, I decided to turn to Google - arguably the broadest source of data in existence. Seeing how many results Google finds for a number is a decent proxy for how many times that number appears in the real world. With a little help from macscripter.net I was able to write a program that Googled a list of numbers almost instantly. So I generated a few lists of random numbers* and fed them into Google. When I looked at the results, sure enough, the numbers that started with 1 showed up the most frequently, followed by 2, then 3 and so on, in a near-perfect example of Benford's law. Holy fuck!!! (Compare Figures 2 and 4)
As expected, the results from 1,000 Google searches fit Benford's law much more closely than the results from 100 searches. (Compare Figures 3 and 4) I would have tried 10,000 searches, but Google's "acceptable use policy" forbids any automated searching, so I didn't want to risk it that much. But at least I got to break the law, which made me feel a little less geeky about doing a math experiment. The cursing helps too.
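The "more searches, better fit" effect can be played with offline, no Google required. This is a rough simulation, not the experiment described above: as a stand-in for real-world data it assumes numbers spread evenly on a log scale between 1 and 1,000,000 (a standard way to get Benford-like first digits), and every name in it is hypothetical.

```python
import math
import random
from collections import Counter

# Expected first-digit shares under Benford's law: log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def log_uniform_sample(size: int, seed: int) -> list[int]:
    """Assumed stand-in data: integers spread evenly on a log scale over 1..10^6."""
    rng = random.Random(seed)
    return [int(10 ** rng.uniform(0, 6)) for _ in range(size)]


def benford_gap(numbers: list[int]) -> float:
    """Mean absolute difference between observed and Benford first-digit shares."""
    counts = Counter(int(str(n)[0]) for n in numbers)
    total = len(numbers)
    return sum(abs(counts[d] / total - BENFORD[d]) for d in range(1, 10)) / 9


def average_gap(size: int, trials: int = 30) -> float:
    """Average the gap over several seeded samples to smooth out luck."""
    gaps = [benford_gap(log_uniform_sample(size, seed)) for seed in range(trials)]
    return sum(gaps) / trials


if __name__ == "__main__":
    for size in (100, 1_000, 10_000):
        print(f"{size:>6} numbers: average gap {average_gap(size):.4f}")
```

The gap should shrink as the sample grows, which is the same pattern as going from 100 searches to 1,000.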