Benford's Law and the Denver City Checkbook

By Aaron O. Ellis

Thursday, January 30, 2014

The City and County of Denver Checkbook is open to the public, minus some confidential transactions and payment information.

The version I download on January 28th, 2014 contained 941,686 transactions between the dates of January 5th, 2009 and January 23rd, 2014. They totaled over $6.5 billion.

Total Transactions by Quarter

Transactions can also be summed by the day of the week that they occurred:

Day Total
Monday $849,048,448
Tuesday $1,125,252,608
Wednesday $1,445,859,328
Thursday $1,705,470,848
Friday $1,412,129,280
Saturday $1,011,808
Sunday $0

Total Transactions by Day of the Week

While the majority of the transactions are under $1,000, the checkbook also includes payments for general operations and contractual services. Some of the largest include:

Payee Department Date Amount
Denver Health Medical Center Environmental Health 2011-02-11 $15,505,120
Denver Health Medical Center Environmental Health 2009-02-19 $13,304,044
United Airlines DIA 2009-12-18 $12,500,000
Metro Wastewater Reclamation DIA 2012-12-18 $12,399,824
Manager of Finance Dept of Finance 2011-06-29 $11,436,182
Manager of Finance Wastewater Manage 2011-03-14 $11,252,650

Since the transaction amounts vary across several orders of magnitude, this data should abide to Benford’s law. If we take the first digit of each amount, we should find that the digit 1 occurs much more frequently than 9. Performing this count on the checkbook:

First Digit Count
1 286,341
2 168,059
3 106,017
4 85,579
5 70,787
6 63,240
7 61,767
8 46,758
9 37,612

Expressed as a percentage of the total, along with the expected percentages:

First Digit Expected Observed
1 30.1% 30.9%
2 17.6% 18.2%
3 12.5% 11.5%
4 9.7% 9.2%
5 7.9% 7.6%
6 6.7% 6.8%
7 5.8% 6.7%
8 5.1% 5.1%
9 4.6% 4.1%

Benford's Law

These observed counts are completely within the range of the expected percentages and the checkbook is a great real life example of Benford’s law.

Queries

With a copy of the checkbook.csv you can produce these results yourself. The following schema was used to load the checkbook into a PostGres database:

CREATE TABLE "checkbook" (
    "id" varchar PRIMARY KEY,
    "payee" varchar,
    "city" varchar,
    "state" varchar,
    "program_area" varchar,
    "department" varchar,
    "payment_id" varchar,
    "payment_date" timestamp,
    "year" int,
    "amount" real,
    "purchase_order" varchar,
    "funding_source" varchar,
    "project" varchar,
    "project_description" varchar,
    "expense_category" varchar,
    "expense_subcategory" varchar
);

The only change made to the data was the removal of a double quote in the Project Description "'Maintenance" RFID/Tech (temp)" that exists on multiple lines.

The PostGres COPY command was then used to quickly load the data:

COPY "checkbook" FROM 'checkbook.csv' WITH CSV HEADER;

Since the checkbook includes negative numbers, the sum was calculated using an absolute value:

SELECT to_char(SUM(@"amount"),'9,999,999,999D99') FROM "checkbook";

The largest transactions were found with:

SELECT
    "payee",
    "department",
    to_char("payment_date", 'Mon DD, YYYY') AS "date",
    to_char("amount", '999,999,999')
FROM "checkbook"
ORDER BY "amount" DESC
LIMIT 6

The transactions by quarter:

SELECT
    "year",
    date_part('quarter', "payment_date") AS "quarter",
    to_char(SUM(@"amount"),'9,999,999,999D99')
FROM "checkbook"
GROUP BY 1, 2
ORDER BY 1, 2

And by day of the week:

SELECT
    date_part('isodow', "payment_date") AS "day",
    to_char(SUM(@"amount"),'9,999,999,999D99')
FROM "checkbook"
GROUP BY 1
ORDER BY 1

To count the leading digits:

SELECT
    left("amount"::text, 1) AS "digit",
    count("id")
FROM "checkbook"
WHERE "amount" >= 1.0
GROUP BY 1

And the percentages:

SELECT
    left("amount"::text, 1) AS "digit",
    to_char(count("id")::real / (SELECT count("id")FROM "checkbook" WHERE "amount" > 1.0) * 100, '99D99') AS "%"
FROM "checkbook"
WHERE "amount" >= 1.0
GROUP BY 1