Analytics

Debugging public data - county codes in Texas

“If you sort a list of Texas counties by county name, the counties will not be in order by county number.” (Texas Center for Healthcare Statistics)

Every county in the United States has a FIPS code. These standard codes are useful for mapping and for joining additional datasets. Texas state agencies do not bother with these FIPS codes. Instead, each state agency publishes data using its own county codes, and one agency’s county codes may not match another’s. I recently spent a few hours extremely confused by this because I applied the formula published by the Texas Department of State Health Services to data published by Texas Health and Human Services, which produced incorrect data. ...
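One way to guard against this kind of mismatch is to join through an explicit crosswalk table instead of a formula, and to check for codes that fail to match. The sketch below is illustrative only: the `agency_a_code` and `agency_b_code` columns and all of their values are hypothetical placeholders, not any agency's real numbering. Only the three FIPS codes are real.

```python
import pandas as pd

# Hypothetical crosswalk: one row per county, carrying the FIPS code and
# each agency's own county number. The agency codes are made up.
crosswalk = pd.DataFrame({
    "county_name": ["Anderson", "Andrews", "Angelina"],
    "fips": ["48001", "48003", "48005"],
    "agency_a_code": [1, 2, 3],
    "agency_b_code": [101, 102, 103],
})

# Hypothetical data published by "agency B", keyed by its own codes.
agency_b_data = pd.DataFrame({
    "agency_b_code": [101, 103, 999],
    "value": [10, 30, 99],
})

# Join on the code system the data was actually published with.
# validate= raises if the crosswalk has duplicate keys; indicator= adds
# a _merge column that flags rows with no crosswalk match.
merged = agency_b_data.merge(
    crosswalk[["agency_b_code", "fips", "county_name"]],
    on="agency_b_code",
    how="left",
    validate="many_to_one",
    indicator=True,
)

unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched[["agency_b_code", "value"]])
```

Any row that lands in `unmatched` is a sign that the data and the crosswalk are using different code systems, which is cheaper to discover here than after the numbers reach a map.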

Lessons learned on dashboard design for early-stage apps

What follows are a few takeaways from a recent app launch, where I had some successes and some things I had to fix. I also maintain dashboards for a wide set of clients on a different project that will be overhauled soon, and I would like to apply these lessons during the redesign. The three guiding business questions for the initial design are: How are we getting users? What are our users finding value in when using the application? ...

Joint & conditional probabilities with pd.crosstab

pd.crosstab is one of those built-in functions in the Pandas API that I routinely forget about. I instinctively reached for df.groupby('x')['y'].count().unstack(), but when I wanted to normalize the values, it took more and more steps to get where I wanted. This was a nice, straightforward overview of the pd.crosstab function. To document it for myself, the code below creates a sample correlated DataFrame with boolean columns ActiveUsers and CompletedProfile.

```python
import pandas as pd
import numpy as np

# following code from GitHub Copilot
np.random.seed(0)
n = 1000   # Number of samples
p = 0.7    # Probability of True in the first column
rho = 0.8  # Probability that the second column matches the first (drives the correlation)

col1 = np.random.choice([True, False], size=n, p=[p, 1-p])
col2 = np.where(col1,
                np.random.choice([True, False], size=n, p=[rho, 1-rho]),
                np.random.choice([True, False], size=n, p=[1-rho, rho]))

df = pd.DataFrame({'ActiveUsers': col1, 'CompletedProfile': col2})
```

|   | ActiveUsers | CompletedProfile |
|---|-------------|------------------|
| 0 | True        | True             |
| 1 | False       | False            |
| 2 | True        | False            |
| 3 | ...         | ...              |

Sample of the constructed DataFrame

...
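As a minimal sketch of where the `normalize` argument of pd.crosstab leads, the example below computes joint and conditional probabilities on a tiny hand-built table. The ten rows are made up for illustration and are encoded as 0/1 integers to keep the label lookups simple; they are not the simulated data above.

```python
import pandas as pd

# Hand-built toy data (hypothetical): 10 users, encoded as 0/1.
toy = pd.DataFrame({
    "ActiveUsers":      [1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
    "CompletedProfile": [1, 1, 1, 1, 0, 0, 1, 0, 0, 0],
})

# Joint probabilities: every cell divided by the grand total.
joint = pd.crosstab(toy["ActiveUsers"], toy["CompletedProfile"], normalize="all")

# Conditional on the row variable: each row sums to 1,
# so cells read as P(CompletedProfile | ActiveUsers).
given_active = pd.crosstab(toy["ActiveUsers"], toy["CompletedProfile"], normalize="index")

# Conditional on the column variable: each column sums to 1,
# so cells read as P(ActiveUsers | CompletedProfile).
given_profile = pd.crosstab(toy["ActiveUsers"], toy["CompletedProfile"], normalize="columns")

print(joint)
print(given_active)
print(given_profile)
```

Here `joint.loc[1, 1]` is the share of all users who are both active and have a completed profile, while `given_active.loc[1, 1]` restricts the denominator to active users only.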

Identifying outbound IP addresses of an Azure Function

Here is Microsoft’s documentation page on the topic. The only reason I am documenting this is that the Azure Resource Explorer failed to load consistently across devices for me. Some tutorials I found, like this one, seem to be outdated, and the outbound IP addresses weren’t listed in any Properties menu of the Function App in the Azure Portal. The Azure CLI method was what worked for me:

```
az functionapp show --resource-group <GROUP_NAME> --name <APP_NAME> --query outboundIpAddresses --output tsv
az functionapp show --resource-group <GROUP_NAME> --name <APP_NAME> --query possibleOutboundIpAddresses --output tsv
```

I wanted this because I was hoping to whitelist the IP addresses in the database firewall so that I could schedule some SQL scripts to run daily. The catch, copied from the Microsoft documentation linked above: “because of autoscaling behaviors, the outbound IP can change at any time when running on a Consumption plan or in a Premium plan.” While at least a few people say to whitelist these IPs anyway, the recommended course of action is to set up a virtual network within Azure. ...
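If you do go the whitelisting route, the two CLI results need to be combined into one list. The sketch below assumes each command prints a single comma-separated string of addresses, which is how I understand these two properties to be returned. The helper name `parse_ip_list` is my own, and the addresses are placeholders from the reserved documentation ranges, not real Azure IPs.

```python
import ipaddress


def parse_ip_list(raw):
    """Split a comma-separated address string and validate each entry."""
    ips = set()
    for part in raw.split(","):
        part = part.strip()
        if part:
            # ip_address() raises ValueError on anything that is not an IP.
            ips.add(str(ipaddress.ip_address(part)))
    return ips


# Placeholder strings standing in for the output of the two az commands.
current = parse_ip_list("203.0.113.10,203.0.113.11")
possible = parse_ip_list("203.0.113.10, 203.0.113.11, 198.51.100.7")

# Union of both sets, sorted numerically for a stable firewall rule list.
allow_list = sorted(current | possible, key=ipaddress.ip_address)

# Addresses the app is not using now but could switch to later.
only_possible = sorted(possible - current, key=ipaddress.ip_address)

print(allow_list)
print(only_possible)
```

Even with the full union whitelisted, the caveat quoted above still applies on Consumption and Premium plans.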

SNAP State Performance Indicators Datasets

I started to post SNAP performance indicators from the USDA in machine-readable format, aggregating and cleaning the data from the many PDF tables. This repository includes quick access to cleaned, FIPS-linked, long-format CSV tables of yearly state-reported indicators for Application Processing Timeliness, Program Error Rates, and Case and Procedural Error Rates (coming soon). The USDA descriptions are as follows: Application Processing Timeliness “measures the timeliness of states’ processing of initial SNAP applications. The Food and Nutrition Act of 2008 entitles all eligible households to SNAP benefits within 30 days of application, or within 7 days, if they are eligible for expedited service.” ...

Friday Links

- A series of YouTube videos on how “practical effects” in movies are highlighted by directors, actors, and viewers, but are actually building blocks for extremely realistic computer-generated visual effects. I didn’t even realize much of what he showed was technically possible. It makes sense that performing a stunt in real life can give VFX artists valuable information about how it should look when it is computer-generated, but the fact that the stunt itself is often replaced with CGI is astounding; see, for instance, the Top Gun: Maverick replacement jets. ...