pd.crosstab is one of those built-in functions in the Pandas API that I forget about routinely. I instinctively reached for df.groupby('x')['y'].count().unstack(), but when I wanted to normalize the values, it takes more and more steps to get where I wanted.
This was a nice straightforward overview of the pd.crosstab function. To document for myself, below, create a sample correlated DataFrame with integer columns ActiveUsers and CompletedProfile.
import pandas as pd import numpy as np # following code from Github Copilot np.random.seed(0) n = 1000 # Number of samples p = 0.7 # Probability of True in the first column rho = 0.8 # Correlation col1 = np.random.choice([True, False], size=n, p=[p, 1-p]) col2 = np.where(col1, np.random.choice([True, False], size=n, p=[rho, 1-rho]), np.random.choice([True, False], size=n, p=[1-rho, rho])) df = pd.DataFrame({'ActiveUsers': col1, 'CompletedProfile': col2}) ActiveUsersCompletedProfile0TrueTrue1FalseFalse2TrueFalse3...... Sample of the constructed DataFrame
...