The City and County of Denver Checkbook is open to the public, minus some confidential transactions and payment information.
The version I download on January 28th, 2014 contained 941,686 transactions between the dates of January 5th, 2009 and January 23rd, 2014. They totaled over $6.5 billion.
Transactions can also be summed by the day of the week that they occurred:
Day | Total |
---|---|
Monday | $849,048,448 |
Tuesday | $1,125,252,608 |
Wednesday | $1,445,859,328 |
Thursday | $1,705,470,848 |
Friday | $1,412,129,280 |
Saturday | $1,011,808 |
Sunday | $0 |
While the majority of the transactions are under $1,000, the checkbook also includes payments for general operations and contractual services. Some of the largest include:
Payee | Department | Date | Amount |
---|---|---|---|
Denver Health Medical Center | Environmental Health | 2011-02-11 | $15,505,120 |
Denver Health Medical Center | Environmental Health | 2009-02-19 | $13,304,044 |
United Airlines | DIA | 2009-12-18 | $12,500,000 |
Metro Wastewater Reclamation | DIA | 2012-12-18 | $12,399,824 |
Manager of Finance | Dept of Finance | 2011-06-29 | $11,436,182 |
Manager of Finance | Wastewater Manage | 2011-03-14 | $11,252,650 |
Since the transaction amounts vary across several orders of magnitude, this data should abide to Benford’s law. If we take the first digit of each amount, we should find that the digit 1
occurs much more frequently than 9
. Performing this count on the checkbook:
First Digit | Count |
---|---|
1 | 286,341 |
2 | 168,059 |
3 | 106,017 |
4 | 85,579 |
5 | 70,787 |
6 | 63,240 |
7 | 61,767 |
8 | 46,758 |
9 | 37,612 |
Expressed as a percentage of the total, along with the expected percentages:
First Digit | Expected | Observed |
---|---|---|
1 | 30.1% | 30.9% |
2 | 17.6% | 18.2% |
3 | 12.5% | 11.5% |
4 | 9.7% | 9.2% |
5 | 7.9% | 7.6% |
6 | 6.7% | 6.8% |
7 | 5.8% | 6.7% |
8 | 5.1% | 5.1% |
9 | 4.6% | 4.1% |
These observed counts are completely within the range of the expected percentages and the checkbook is a great real life example of Benford’s law.
Queries
With a copy of the checkbook.csv
you can produce these results yourself. The following schema was used to load the checkbook into a PostGres database:
CREATE TABLE "checkbook" (
"id" varchar PRIMARY KEY,
"payee" varchar,
"city" varchar,
"state" varchar,
"program_area" varchar,
"department" varchar,
"payment_id" varchar,
"payment_date" timestamp,
"year" int,
"amount" real,
"purchase_order" varchar,
"funding_source" varchar,
"project" varchar,
"project_description" varchar,
"expense_category" varchar,
"expense_subcategory" varchar
);
The only change made to the data was the removal of a double quote in the Project Description "'Maintenance" RFID/Tech (temp)"
that exists on multiple lines.
The PostGres COPY
command was then used to quickly load the data:
COPY "checkbook" FROM 'checkbook.csv' WITH CSV HEADER;
Since the checkbook includes negative numbers, the sum was calculated using an absolute value:
SELECT to_char(SUM(@"amount"),'9,999,999,999D99') FROM "checkbook";
The largest transactions were found with:
SELECT
"payee",
"department",
to_char("payment_date", 'Mon DD, YYYY') AS "date",
to_char("amount", '999,999,999')
FROM "checkbook"
ORDER BY "amount" DESC
LIMIT 6
The transactions by quarter:
SELECT
"year",
date_part('quarter', "payment_date") AS "quarter",
to_char(SUM(@"amount"),'9,999,999,999D99')
FROM "checkbook"
GROUP BY 1, 2
ORDER BY 1, 2
And by day of the week:
SELECT
date_part('isodow', "payment_date") AS "day",
to_char(SUM(@"amount"),'9,999,999,999D99')
FROM "checkbook"
GROUP BY 1
ORDER BY 1
To count the leading digits:
SELECT
left("amount"::text, 1) AS "digit",
count("id")
FROM "checkbook"
WHERE "amount" >= 1.0
GROUP BY 1
And the percentages:
SELECT
left("amount"::text, 1) AS "digit",
to_char(count("id")::real / (SELECT count("id")FROM "checkbook" WHERE "amount" > 1.0) * 100, '99D99') AS "%"
FROM "checkbook"
WHERE "amount" >= 1.0
GROUP BY 1