Benford’s Law: First Digit Analysis and Its Applications to the World

Griffin McCauley
25 min read · Nov 25, 2020

Setting the Stage

You probably remember learning as a child about the special patterns that occur in nature. These might include symmetry, spirals, tessellations, and fractals, amongst others. One of the most famous of these patterns, however, is the Golden Ratio.

The Golden Ratio in Nature

The beautiful imagery evoked by this ratio is eye-catching and very visually appealing, as you can see above, but what if I told you there was an even simpler and more elegant law to which nearly all random collections of data in nature adhere? And what if I told you that random datasets weren't quite so random after all? In fact, what if I told you that most numbers in our universe start with low-valued digits? This would be quite surprising, I suspect. It seems intuitive that each of the nine digits a number can start with in our base 10 system should be equally likely, but this is not the case. Benford's Law is a beautiful mathematical law that, put simply, states that, in many seemingly random, naturally occurring datasets, numbers with low-valued first digits appear more frequently than numbers with high-valued first digits. In other words, there are more numbers in such datasets whose first digit is a one than a two, more whose first digit is a two than a three, and so on. The probability of the first digit being each of the nine possible starting digits in the base 10 system is, in fact, logarithmically distributed, as depicted in the graphic below. The supplemental table lists the specific probability for each possible first digit: while the digit 1 appears as the first digit approximately 30.1% of the time, the digit 9 appears as the first digit only approximately 4.6% of the time.

Chart of First Digit Probabilities
Table of First Digit Probabilities

This may seem surprising, even shocking, as it was to me the first time I heard about it, but I assure you it is true, and, in fact, you can check it for yourself. This pattern occurs in a plethora of naturally occurring datasets, from the lengths of rivers around the world, to molecular weights, to the populations of U.S. cities, and even to the distances from Earth to other galaxies. These are just a few examples of where you can see this Law, but one collection of data you probably have easy access to without much effort is your followers on social media. In particular, the numbers of followers of your followers will conform to Benford's Law as well. While you may initially think this Law is quite niche and perhaps interesting but not particularly useful, I hope to make clear that it does in fact have a tremendous number of applications spanning a vast array of fields and disciplines. It truly does act as a force that connects us all and underlies so much of the natural world we interact with every day, whether we know it or not. First, however, let us unpack what the Law is really saying more formally and try to grapple with how it can be mathematically sound.

The History & Formal Definition of Benford’s Law

While the central idea of Benford's Law is quite simple and easy to grasp, the mathematical underpinnings are not as trivial and are rigorous enough to support such a universally documented phenomenon. It all started in 1881 when astronomer Simon Newcomb made a peculiar observation while riffling through the pages of a well-used book of logarithm tables. (For those of us not born in the early 1900s, a book of logarithm tables is just a book containing a series of tables used to look up the values of the logarithm function for different numbers. Essentially, they were analog calculators.) What he noticed was that the pages near the beginning of the book were substantially more worn than those towards the end, indicating that people were looking up the logs of numbers with low-valued starting digits far more often than those with high-valued starting digits. This led him to wonder whether this was just an anomaly or whether it represented a deeper underlying pattern of numbers physically manifesting itself. After delving deeper into this idea, he developed and published the principle that, in any random set of natural data, one should expect to find more numbers starting with 1 than with any other digit. Over the next half century, however, this proposed law slipped into obscurity until 1938, when it resurfaced following the identical discovery by the eponymous physicist Frank Benford. Rather than simply publishing this vague observation and moving on, Benford decided to collect as much data as he could across many natural domains and to document the first digits of all of the entries he collected. Some of the results of this analysis are shown in the table below.

First Digits of Data from a Diverse Array of Fields

What he found was that, rather than being uniformly distributed across the nine possible starting digits of the base 10 system, smaller valued starting digits were indeed far more frequent. In fact, the starting digit frequencies fit the logarithmic curve shown in the first diagram and table above, and the explanation is that the values themselves are spread uniformly on a logarithmic scale. (We are assuming log base 10 throughout unless stated otherwise.) Since the logarithm function is monotonic, the probability of x lying between a and b is equal to the probability of log x lying between log a and log b, and, since that probability is uniform from log(1) = 0 to log(10) = 1, the distribution of the logs has a density of 1 across its domain. Thus, the probability of a number starting with the digit d is just log(d+1)-log(d) = log((d+1)/d) = log(1+(1/d)).
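
If you would like to reproduce the probability table above for yourself, here is a minimal sketch in Python (my own illustration, not something from the original article) that computes each value directly from the formula log(1+(1/d)):

```python
# A minimal sketch reproducing the first-digit probability table above
# directly from the formula P(d) = log10(1 + 1/d).
import math

for d in range(1, 10):
    print(f"digit {d}: {math.log10(1 + 1 / d):.1%}")
# digit 1: 30.1%, digit 2: 17.6%, digit 3: 12.5%, ..., digit 9: 4.6%
```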

In order to think about this Law in a slightly different but more precise way, let us briefly consider some properties of the logarithm function. When we take the log of, say, 1234, we are really looking at the number expressed in scientific notation as 1.234*10³, and we quickly find that log(1.234*10³) = log(1.234)+log(10³) = 3+log(1.234). The integer part of the logarithm only encodes the order of magnitude (10s, 100s, 1000s, etc.); the first digit is determined entirely by the fractional part of the logarithm, which is also known as the mantissa. Thus, the reason we can always write the probability of a starting digit being d as log(d+1)-log(d) is that, at any order of magnitude, all that matters is the mantissa, and what Benford's Law asserts is that the mantissas of numbers found in natural datasets are uniformly distributed on 0 to 1.
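
To see the mantissa idea in action, here is a tiny Python sketch of mine showing that numbers of very different magnitudes but the same leading digits share the same mantissa, and that the first digit can be recovered from the mantissa alone:

```python
# The first digit depends only on the mantissa (fractional part) of the
# base-10 logarithm, not on the order of magnitude.
import math

for x in (1.234, 12.34, 1234, 1.234e7):
    mantissa = math.log10(x) % 1          # strip off the integer part
    print(x, round(mantissa, 5), int(10 ** mantissa))
# Every row shows the same mantissa (~0.09132) and the same first digit, 1.
```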

When I first heard about this Law, I immediately wondered the same thing as Benford and was curious as to whether this property was in fact a natural phenomenon inherent in the universe or whether it was simply a product of how we choose to represent numbers in our base 10 system. It turns out that the latter is not the case, and the Law holds regardless of the base we use. Generalizing the base 10 formula, in an arbitrary base b the probability of a number starting with the digit d is still just the difference between the log of that digit plus one and the log of that digit, the only difference being that the log is now taken in the new base: log_b(d+1) - log_b(d) = log_b(1+(1/d)).
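
As a quick sanity check of this generalization, here is a small Python sketch of the base-b formula (the function name benford_probs is just my own placeholder):

```python
# The same law in an arbitrary base b: P_b(d) = log_b(1 + 1/d) for
# d = 1, ..., b-1. The probabilities sum to 1 in every base.
import math

def benford_probs(base=10):
    return {d: math.log(1 + 1 / d, base) for d in range(1, base)}

print(benford_probs(8))                     # first-digit probabilities in base 8
print(sum(benford_probs(8).values()))       # 1.0 (up to floating point error)
```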

In order to assume this uniformity over the logs, which is what allows us to state the probability of the first digit being a certain value, one thing which must be true is that the values in our dataset span many orders of magnitude of the base in which we are working. This is because, if the distribution over the logs were narrow and sharply peaked, the relative probability of one digit compared to the others would be warped and would no longer equal the distance between the log evaluations at the different starting digits, since the density would not be constant across those values. This idea is captured in the diagrams below, where the red area represents the probability of a starting digit of 1 and the blue area represents the probability of a starting digit of 8.

Broad Expanse over Logs
Narrow Expanse over Logs
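
To illustrate the narrow failure mode concretely, here is a rough Python simulation of mine: data clustered around a single order of magnitude (roughly normal around 50,000) produces first digits nowhere near the Benford proportions.

```python
# A rough simulation: values clustered around a single order of magnitude
# (roughly normal around 50,000) do NOT follow Benford's Law, because the
# distribution over their logs is narrow and sharply peaked.
import math
import random
from collections import Counter

random.seed(0)
narrow = [random.gauss(50_000, 5_000) for _ in range(100_000)]
counts = Counter(int(f"{abs(x):e}"[0]) for x in narrow)

for d in range(1, 10):
    observed = counts.get(d, 0) / len(narrow)
    benford = math.log10(1 + 1 / d)
    print(d, f"observed={observed:.3f}", f"benford={benford:.3f}")
# Digits 4, 5, and 6 dominate; the logarithmic pattern is nowhere to be seen.
```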

In order to explain this essentially uniform property, we can consider what must be true of the underlying data and observations (which can be treated as random variables for all practical purposes). One explanation is that each observation can be thought of as the product of many different random variables. If we have independent random variables X1,…,Xn and take their product, we can take the log of this product and rewrite log(X1*…*Xn) as log(X1)+…+log(Xn), which is now simply a sum of random variables. Applying the Central Limit Theorem, familiar from the context of sums of random variables converging to the normal distribution, shows that this sum of logs converges to a normal distribution, which means the product itself converges to a log-normal distribution. As the number of random variables contributing to the final product increases, the standard deviation of this sum of logs keeps growing. (Note that the standard deviation is strictly increasing here because the variance of a sum of independent random variables is the sum of their variances, and, unlike in the familiar case of the sample mean, there is no factor in the denominator to counteract this growth and cause the variance to shrink.) Once the distribution of the logs is sufficiently spread out, it can be shown that taking this sum of logs modulo 1 yields a density that is approximately uniform on the interval 0 to 1, which is exactly the uniformity of the mantissas we need.

Another way to see how the uniformity arises is to conceptualize the data we observe in nature as a set of random variables drawn from a random set of distributions. This mixing of distributions smooths out any one distribution's specific characteristics and, in aggregate, produces the even, uniform spread over the logs that causes Benford's Law to arise. One thing to note at this point is that, since our entire argument rests on this notion of layered randomness, we cannot apply Benford's to data which we know follows a particular narrow distribution, such as the heights or IQs of a population, which should be roughly normal and all of a similar order of magnitude.
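
To make the product-of-random-variables argument concrete, here is a rough simulation sketch: each observation is the product of twenty independent uniform factors, and the resulting first digits land very close to the Benford proportions. (The choice of twenty factors and a Uniform(0.5, 10) distribution is arbitrary on my part; many other choices behave similarly.)

```python
# A rough simulation of the product-of-random-variables argument: each
# observation is the product of 20 independent Uniform(0.5, 10) factors
# (an arbitrary choice), and the first digits end up close to Benford.
import math
import random
from collections import Counter

random.seed(1)
samples = []
for _ in range(50_000):
    product = 1.0
    for _ in range(20):
        product *= random.uniform(0.5, 10)
    samples.append(product)

counts = Counter(int(f"{s:e}"[0]) for s in samples)
for d in range(1, 10):
    observed = counts.get(d, 0) / len(samples)
    benford = math.log10(1 + 1 / d)
    print(d, f"observed={observed:.3f}", f"benford={benford:.3f}")
```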

This uniformity also leads to other desirable properties which we would certainly expect to hold for this Law. One such property is that the starting digit probabilities are scale invariant. In other words, if we convert the data from one unit of measurement to another, this does not change the probability of the starting digit taking a specific value. This can easily be seen in the example of converting from feet to yards: any measurement in feet whose leading portion lies between 3 and 6 (3 to 6 feet, 30 to 60 feet, and so on) maps to a measurement in yards whose first digit is 1, and the probability of the new starting digit being 1 should just be log(2) (since log(1) = 0), which is exactly the sum of the probabilities of the starting digit in feet being 3, 4, or 5, since log(4)-log(3)+log(5)-log(4)+log(6)-log(5) = log(6)-log(3) = log(6/3) = log(2). This ability to sum up the starting digit probabilities in the original units lets us see that scale invariance will always hold for Benford's Law.
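
Here is a quick simulation sketch of mine illustrating the scale invariance property: a synthetic Benford-conforming dataset of "feet" measurements is divided by 3 to convert to "yards", and the first-digit proportions come out essentially identical.

```python
# A quick check of scale invariance: synthetic Benford-conforming "feet"
# measurements are converted to yards (divide by 3), and the first-digit
# proportions barely move.
import math
import random
from collections import Counter

def first_digit(x):
    return int(f"{x:e}"[0])      # first significant digit via scientific notation

random.seed(2)
feet = [10 ** random.uniform(0, 6) for _ in range(100_000)]   # mantissas uniform
yards = [x / 3 for x in feet]

for name, data in (("feet ", feet), ("yards", yards)):
    counts = Counter(first_digit(x) for x in data)
    props = [round(counts.get(d, 0) / len(data), 3) for d in range(1, 10)]
    print(name, props)
```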

One final and very satisfying extension of Benford’s Law is to establish the probability of the nth digit of a number being a specific value. The way we can do this is to first formulate what the probability is of the first digits of a number being the string “abc…” where a, b, c, and so forth are arbitrary digits. This probability will just be log(abc… + 1)-log(abc…). Since we are only interested in the probability of the nth digit being a specific value d, we can just take the sum of this difference of logs over all of the possible combinations of digits coming before that entry. This formula can symbolically be expressed as:

Probability of the nth digit being d

In practice, what you will find is that, once you get to roughly the fourth digit, the probability distribution of that digit is essentially uniform, with each of the digits 0 through 9 appearing about 1/10 of the time. This is because, as the strings of leading digits we must sum over grow longer, the difference between the log of each string and the log of that string plus 1 becomes marginal, and the probability mass for the digit of interest is spread almost evenly across all possible values. While first digit analysis is the most common and most widely implemented technique, second digit analysis is not much more complicated and still has value, so there are some situations where this form of Benford's Law will actually be more useful in practice. As you can see in the table below, however, even by the time you are looking at the second digit, you have lost much of the ability to discriminate between the digits based upon their relative frequencies.

Probabilities of the First and Second Digits
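
If you want to compute these higher-digit probabilities for yourself, here is a short Python sketch of the summation described above (the helper nth_digit_prob is just my own naming); it reproduces the first- and second-digit probabilities in the table and shows the later positions flattening toward 1/10:

```python
# The probability that the n-th digit equals d, obtained by summing
# log10(1 + 1/(10k + d)) over every possible block k of leading digits.
import math

def nth_digit_prob(n, d):
    if n == 1:
        return math.log10(1 + 1 / d) if d != 0 else 0.0
    return sum(math.log10(1 + 1 / (10 * k + d))
               for k in range(10 ** (n - 2), 10 ** (n - 1)))

for n in (1, 2, 3, 4):
    row = [round(nth_digit_prob(n, d), 4) for d in range(10)]
    print(f"digit position {n}: {row}")
# By the fourth position every value 0-9 sits very close to 0.10.
```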

At this point, we have thoroughly established that seemingly random collections of numbers found in nature do indeed exhibit this sort of predictable behavior, and, once this was understood, people began to apply this Law to determine whether datasets they came across were “natural” or not.

A Multitude of Applications

Once it was fully established that datasets arising from natural systems should follow Benford's Law, with the probability distribution of their first digits mirroring a logarithmic curve, it was only a matter of time before people started trying to harness this knowledge to determine whether data they were being presented with should be considered authentic or manipulated. Knowing the natural probability distribution of first digits, and thus the expected proportion of times each digit should appear, turns this into a perfect scenario for a Chi-Squared Test of Goodness of Fit. As you may remember from a previous course in statistics, the chi-squared test assesses how well a given set of data fits a set of category probabilities specified under the null hypothesis. More explicitly, when we have observed frequencies for each category and expected frequencies for the corresponding categories, we can form the chi-squared test statistic, which is the sum, over all categories, of the squared deviation of the observed frequency from the expected frequency divided by the expected frequency (shown symbolically below). Once we have this test statistic, we can find its corresponding p-value by calculating the area under a chi-squared distribution with k-1 degrees of freedom (where k is the number of mutually exclusive categories) that lies at or beyond our test statistic.

Chi-Squared Test Statistic

Thus, the problem of determining whether a dataset adheres to or violates Benford's Law comes down to collecting the first digits of your dataset, comparing the proportions of each digit to those expected under Benford's Law using a chi-squared test, and then either rejecting or failing to reject that it follows Benford's based on the p-value. In order to appropriately apply a chi-squared test across all nine possible starting digits, however, the sample size you have access to needs to be quite large in order to get reliable results. If you do not have this luxury, there are other tests and statistics you can use as well; namely, the Kolmogorov-Smirnov and Kuiper tests are appropriate for small sample sizes. There have also been test statistics developed specifically for first digit analysis using Benford's Law, and these are described and shown below:

First Digit Analysis Specific Test Statistics

(I will not go into how to implement these different hypothesis tests here since they are far more niche and complicated than the standard Pearson's Chi-Squared Test, which we will be able to use in most instances, but, in case you are attempting first digit analysis and are either having issues with your sample size or would like to try more specialized tests on your data, I wanted to provide the reference above so you can identify what to try.)
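
For the common case where your sample is large enough for the standard chi-squared test, a sketch of the whole procedure in Python might look like the following (the helper benford_chi2 and the synthetic sample are my own illustrations, and I am assuming scipy is available):

```python
# A sketch of the standard first-digit chi-squared test in Python, assuming
# scipy is available. benford_chi2() takes any collection of numbers,
# extracts first digits, and compares their counts to Benford expectations.
import math
import random
from collections import Counter
from scipy.stats import chisquare

def benford_chi2(values):
    digits = [int(f"{abs(v):e}"[0]) for v in values if v != 0]
    counts = Counter(digits)
    observed = [counts.get(d, 0) for d in range(1, 10)]
    expected = [math.log10(1 + 1 / d) * len(digits) for d in range(1, 10)]
    return chisquare(observed, f_exp=expected)       # (statistic, p-value)

# Quick check on a synthetic Benford-conforming sample (mantissas uniform).
random.seed(3)
sample = [10 ** random.uniform(0, 5) for _ in range(5000)]
print(benford_chi2(sample))
```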

One of the first ways in which this methodology for testing whether data was genuine or tampered with was put to use was in determining whether individuals or organizations were manipulating the values on their tax returns. By using the process described above, it was not difficult to construct algorithms that could parse data rapidly and report whether the numbers seemed natural or not. While this tactic may seem a little unorthodox, relying as it does on a high-level property of the figures rather than on the actual values being reported, evidence based on first digit analysis has been accepted by courts in many cases of tax fraud and has been used to identify instances where people were cooking the books in order to scam others in one way or another.

Another interesting way in which Benford's is being applied to determine the authenticity of a source of information is by analyzing the number of followers of followers on social media. As mentioned at the top of the piece, one intriguing manifestation of the Law is that, within each of our social media accounts, we should expect the first digits of the numbers of followers of our followers to be distributed according to Benford's Law. You may be surprised by this, as I was, and might not see what this has to do with data being organic or not, but it turns out that, by applying first digit analysis to these numbers, moderators can determine which accounts are being run by real people and which are bots or fake accounts that do not belong to actual people. This application struck me as particularly fascinating since, in most of the other cases, the datasets were collected from the natural world rather than directly shaped by people the way a social media account is. Once we consider, however, how each of our follower counts and our followers' follower counts are compositions of essentially random and independent personal decisions, which can be viewed as random variables, it makes sense how Benford's would arise here. Even more wonderful is the fact that, because of this, we now have the ability to weed out, or at least flag, suspected bots through the use of this very simple technique and try to stop the spread of fake news and the manipulation of our content by undesirable entities.

Finally, as referenced before, but worth articulating again, what first digit analysis based on Benford's Law allows us to do is determine the validity of the data we are receiving and tell whether we think it is representative of a natural phenomenon. In this vein, whenever we are tasked with working with large sets of vital data, it is an important step to try to verify that the data is reliable, or at least not suspicious. A very specific example of this is at the Barcelona Institute of Earth Sciences. Staff members there pore over geological data and constantly track environmental conditions in order to gauge volcanic activity and make sure the volcanoes being monitored are stable and unlikely to erupt or become active anytime soon. This job and these numbers are extremely important because volcanic eruptions aren't just local events but have impacts on a global scale. Ash from a relatively small Icelandic eruption in 2010 grounded flights across Europe and beyond due to all the additional particulates in the air, and the global economy suffered losses of roughly $5 billion as a result. Clearly, consequences of this magnitude are very serious, and it is therefore of the utmost importance that the data being used to make predictions about the state of the volcanoes is accurate, and the scientists who analyze this data do indeed use Benford's to verify the reliability and naturalness of their numbers.

The fact that Benford’s has such a diverse array of applications across so many scientific and technological fields is truly remarkable, and, as will be explained shortly, the distribution presented in the Law can even have implications for monumental societal moments such as political elections.

Implementation of First Digit Analysis in Excel

One of the reasons Benford's Law is so appealing is how straightforward and easy it is to use. With very little statistical knowledge or formal coding experience, one can perform first digit analysis to test the null hypothesis that a set of data is authentic and can be considered natural. Here, I will run through a brief illustrative example using Excel. In order to make this as interactive as possible, I will walk through each step and let you follow along on your own in parallel. The first thing to do is follow this link (https://www.census.gov/data/datasets/time-series/demo/popest/2010s-total-cities-and-towns.html#tables) and download the Annual Estimates of the Resident Population for Incorporated Places data for your favorite U.S. state. For the purposes of this example, I have chosen the data for my home state of Georgia.

Once you have your file of choice downloaded, open it in Excel and create a second sheet in the same file. As you can see, the downloaded file contains actual census data from 2010 along with estimates for the following years. In order to avoid any human influence or unnatural constraints imposed by the projection models, we will perform first digit analysis on the data for 2010 rather than any of the other years. Copy and paste the columns containing the city name and 2010 census numbers into the second sheet. Now that we have the numbers of interest isolated, we can begin to extract the first digits and perform our analysis. Fortunately for us, the LEFT() function in Excel allows us to do this easily. Just plug in the range of cells containing the data as the first input and the number of leading characters we want, in our case 1, as the second input, so the function will look like LEFT([data entries],1). You will now have a new column containing just the first digits.

Truncated Table Showing City Populations and Their First Digits

With all of the first digits extracted, we can now find the digit counts and the proportion that each represents. The COUNTIF() function lets us quickly determine how many of each starting digit there are by plugging in the range of cells of interest as the first input and the digit value we want the count of as the second input. We can then divide each count by the total number of entries we were looking over to get the proportion of numbers that start with each digit.

Digit Counts and Their Empirical Proportions

We now have the empirical proportions and can plot them alongside the probabilities predicted by Benford's Law to get a visual sense of how closely our dataset conforms to expectations, and we can also use a Chi-Squared Test of Goodness of Fit to determine, statistically speaking, how close our results are to the anticipated proportions. In order to do this, we will need to make a column of the proportions stated in Benford's Law. This is easy since the probability of the first digit being d is equal to log(1+(1/d)), so we can just populate a column with the proportions for the nine possible starting digits. The CHITEST() function will then give us the p-value of the chi-squared test, telling us whether there is a statistically significant difference between the expected frequencies and the observed ones. Just plug the range of observed values in as the first input and the range of expected values in as the second input, and you will get the corresponding p-value as the output.

First Digit Analysis in Excel

In the case of the Georgia data, we can see that our p-value from the test is approximately 1, which means there is no indication at all that our null hypothesis that the data conforms to Benford's Law is incorrect. Remember that a p-value is the probability of getting a test statistic at or more extreme than the one observed under the assumption that the null hypothesis is true, so, in general, it would take a p-value of less than 0.05 to reject the null hypothesis. In this case, with a p-value of approximately 1, we are all but guaranteed to get a result at least as extreme as what we observed if Benford's Law does apply, so we can safely fail to reject the null and not be worried, on the basis of first digit analysis alone, about our data having been tampered with or being inauthentic.
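
If you prefer code to a spreadsheet, the same pipeline can be sketched in a few lines of Python. The filename and column name below are placeholders that you would swap for whatever file you actually downloaded, and I am assuming pandas and scipy are installed:

```python
# A rough Python equivalent of the Excel walkthrough above. The filename
# "sub-est2019_13.csv" and the column name "CENSUS2010POP" are hypothetical
# placeholders; adjust them to match the file you downloaded.
import math
import pandas as pd
from scipy.stats import chisquare

df = pd.read_csv("sub-est2019_13.csv")
pops = pd.to_numeric(df["CENSUS2010POP"], errors="coerce").dropna()
pops = pops[pops > 0].astype(int)

first_digits = pops.astype(str).str[0].astype(int)      # analogous to LEFT(..., 1)
observed = first_digits.value_counts().reindex(range(1, 10), fill_value=0)
expected = [math.log10(1 + 1 / d) * len(first_digits) for d in range(1, 10)]

print(observed / len(first_digits))                      # empirical proportions
stat, p_value = chisquare(observed, f_exp=expected)      # analogous to CHITEST()
print("chi-squared p-value:", round(p_value, 4))
```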

As you can clearly tell by now, this process of first digit analysis to check whether a given dataset conforms to Benford's is extremely simple to implement and does not require any rigorous understanding of the underlying math that makes the Law apply in the first place. This is great in the sense that many people can use this property of natural datasets to check the plausibility of their numbers, but the accessibility also opens the door for misinformed people to apply these techniques inappropriately and claim results that aren't valid at all. Unfortunately, this can have serious consequences and lead to the spread of misinformation and confusion, an example of which you will now see.

The 2020 U.S. Presidential Election

Just over three weeks ago, our entire country collectively held its breath and waited in a state of tension and anxiety to find out who our next president would be. In the wake of the eventual results, many folks flocked to this seemingly obscure mathematical law and began trying to apply it in order to justify their claims of rampant voter fraud. In particular, a few misinformed individuals began attempting to prove fraud by analyzing precinct data from a couple of crucial areas in key battleground states. What these people found and tried to publicize was that, when performing first digit analysis on the vote counts for Biden versus Trump, Trump's vote counts appeared to follow Benford's Law while Biden's did not, thus, in their view, confirming that Biden's victory was invalid. Unfortunately for their claims, trying to apply Benford's to election data is grossly inappropriate. While the Law is accessible enough that anyone with Excel can try to use it, it is just technical enough that, without understanding how and why it works, one can trip up and declare results that simply are not corroborated by the Law.

This mini case study centered on the election provides a nice illustration of some of the more detailed aspects of Benford's and why it fails to work in cases such as this. To be specific, the two primary reasons that first digit analysis is invalid in this election scenario are that the data does not span multiple orders of magnitude and that the two vote count distributions we are trying to analyze are not independent, and I will discuss both in detail. The first problem with trying to use Benford's here is that the individuals in question were attempting to use precinct data to prove their assertions. Unfortunately for them, one of the foundational assumptions of Benford's is that the data spreads roughly uniformly across many orders of magnitude, so that the probability of each starting digit can be computed as the difference between the log of the following digit and the log of the digit in question. The whole point of precincts, however, is to standardize the size of voter blocks and create many smaller vote counting locations with volumes that are much easier to handle. This property of precincts causes them to perform terribly under Benford's and robs the analysis of any power to say how authentic the data is. In this instance, where all of the vote count magnitudes are roughly the same, we find ourselves in the situation outlined much earlier in which the distribution over the logs is narrow and sharply peaked, and it is therefore inappropriate to read the area of each region as the probability given by the logarithmic formula Benford's Law assumes.

The second crucial flaw in the methodology adopted by these misinformed statisticians was that they implicitly assumed independence between the vote counts for the two candidates. Given the nature of this election, however, where only two major candidates received the vast majority of the vote, there is severe dependence between the votes for one candidate and those for the other, since the votes for one will be approximately equal to the total number of votes cast in that precinct minus the votes cast for the other. This leads to a situation where it is not possible for both candidates' votes to adhere to Benford's. To provide a clear example of why this is the case, consider a precinct consisting of 1000 votes split between the two candidates. The candidate who receives more votes necessarily has a vote count whose first digit is 5 or greater (short of a literal sweep of all 1000 votes), while the candidate who receives fewer votes will, unless the race is a blowout, have a vote count whose first digit is 4 or lower. This suggests that the losing candidate's vote numbers will tend to look closer to Benford's than the winning candidate's counts. Since one person is clearly going to receive more votes than the other without there being any fraud or election manipulation at all, trying to use Benford's on two candidates' vote counts is meaningless. These points starkly contradict the arguments of, and "proof" provided by, critics claiming voter fraud on the basis of first digit analysis.
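
To see both problems play out, here is a rough, fraud-free simulation of mine: precincts of similar size, two candidates splitting nearly every vote, and neither set of counts coming anywhere close to the Benford proportions.

```python
# A rough, fraud-free simulation: precincts of similar size, two candidates
# splitting essentially every vote. Neither candidate's counts follow Benford.
import math
import random
from collections import Counter

random.seed(4)
cand_a, cand_b = [], []
for _ in range(5_000):                       # 5,000 hypothetical precincts
    total = random.randint(800, 1600)        # precincts deliberately similar in size
    share = min(max(random.gauss(0.52, 0.08), 0.0), 1.0)
    a = round(total * share)
    cand_a.append(a)
    cand_b.append(total - a)                 # dependence: B's count = total - A's

def first_digit_props(counts):
    c = Counter(int(f"{v:e}"[0]) for v in counts if v > 0)
    return [round(c.get(d, 0) / len(counts), 3) for d in range(1, 10)]

print("Benford :", [round(math.log10(1 + 1 / d), 3) for d in range(1, 10)])
print("cand A  :", first_digit_props(cand_a))
print("cand B  :", first_digit_props(cand_b))
```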

In general, most statisticians say that digit analysis fails to hold any statistical significance in the context of elections, but, while all agree that first digit analysis will fail, some have demonstrated how second digit analysis can still provide some insights under certain circumstances and contingent upon the results of other diagnostics. In the case of this most recent presidential election, a few experts in the statistics community who have spent their careers developing techniques to appropriately apply second digit analysis to elections have gone through the data and concluded that, on the basis of their methodologies, there is no evidence of any sort of voter fraud. If you would like to learn more about this specific analysis which was performed and find out how the results were reached, I recommend checking out this quite concise document put out by political statistician Walter Mebane: http://www-personal.umich.edu/~wmebane/inapB.pdf

Concluding Remarks

By uncovering this hidden pattern within datasets from a diverse range of natural domains, we have unlocked the potential to gauge the authenticity of collections of numbers as they are documented and reported. The technique of digit analysis provides a robust and informative starting point for many investigations of this nature, but it is important to bear in mind where the applications of this Law are appropriate and where they will produce meaningless results. Because the statement of Benford's Law is so simple, this mathematical property is accessible to a large number of people, which is fantastic, but, because it seems so straightforward, it is easy to lose track of the technical details that make it viable and for some to get carried away and apply Benford's incorrectly. The main questions to keep in mind are: "Does my data span multiple orders of magnitude?", "Is my data free of human-imposed constraints, bounds, or formulations?", and "Does my data come from a process that looks like either the composition of random events in a multiplicative manner or the sampling and amalgamation of random variables drawn from a set of random distributions?". These checks all help ensure that the data spreads uniformly across the logs, so that it is appropriate to expect the probability of a number starting with a specific digit d to equal log(d+1)-log(d): that difference measures the region over which d is the first digit, and, since the density is 1 over the range from log(1) to log(10), the probability is simply the difference between the logs.

In some situations it may be helpful to consider digits other than the first, but, as discussed above, at each successive position we lose some of the ability to differentiate between the probabilities of each digit, and, by about the fourth digit, all ten possibilities are approximately equally likely. It is exciting that so many people have been drawn to this Law in the past few weeks and months, but we must refrain from accepting every seemingly statistically proven result when those conducting the analyses are not trained in the field and do not fully understand the Law upon which they are trying to base their claims and findings.

As we move forward, I firmly believe that Benford's will prove fruitful in even more domains of data analysis and validation and will continue to be a powerful tool for weeding out and flagging suspicious datasets. As more people become familiar with this Law, understanding the statistical implications of the results it produces becomes vitally important, and hopefully this article has helped to illuminate some of the more darkly shrouded areas of this Law in your mind. The nature of numbers which Newcomb and Benford uncovered is both elegant and profound in a beautifully simple kind of way. I hope you come away from this with a greater appreciation of how connected we all are with the natural world and how this property acts as a guide to how all numbers in the universe are intertwined and related. A great cosmos of numbers described by a simple logarithmic curve - something about this is even more striking than the visual appeal of the Golden Ratio, and its statistical power makes Benford's Law all the more profound.

If you are interested in pursuing the topic of Benford’s Law further and hearing about more very cool applications of it, I highly recommend both the fourth episode of the Netflix series Connected titled “Digits” and the piece put out by Radiolab titled “Breaking Benford’s.” Jad, Robert, Latif, and Soren do great work and are truly remarkable story tellers who are able to articulate extremely important topics in elegant ways and weave them expertly into highly engaging narratives.

Additional reading/listening material:


Griffin McCauley

M.S. in Data Science @ University of Virginia | Sc.B in Applied Mathematics @ Brown University