Would it be possible to look at a much larger number of samples during ANALYZE, then look at the variation in those to generate a reasonable number of pg_statistic "samples" to represent our estimate of the actual distribution? That would give more data points for tables where the planner might benefit from them, and fewer where it wouldn't.
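To make the idea a bit more concrete, here is a rough Python sketch of one way it could work. This is not how ANALYZE is implemented. The function name, the `max_entries` cap and the `min_ratio` cutoff are all made up for illustration. The rule assumed here is simply "keep a value only if it shows up noticeably more often in the big sample than an average value does", so a uniform column stores nothing and a skewed one stores a few entries.

```python
from collections import Counter


def choose_stat_entries(sample, max_entries=100, min_ratio=1.25):
    """Return (value, frequency) pairs worth storing, most common first.

    Hypothetical rule: a value is kept only if its sample frequency is at
    least min_ratio times the average frequency per distinct value.
    """
    if not sample:
        return []
    counts = Counter(sample)
    n = len(sample)
    avg_freq = 1.0 / len(counts)  # frequency if every value were equally common
    kept = []
    for value, count in counts.most_common(max_entries):
        freq = count / n
        if freq < min_ratio * avg_freq:
            break  # everything after this is rarer still
        kept.append((value, freq))
    return kept


# Skewed column: two heavy values plus 200 values seen once each.
skewed = [1] * 500 + [2] * 300 + list(range(3, 203))
# Uniform column: every value seen exactly once.
uniform = list(range(1000))
```

With this rule the skewed sample keeps only the two heavy values and the uniform sample keeps none, which is the "more where it helps, fewer where it doesn't" behaviour described above.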
Maybe it would be possible to record somewhere the percentage of occurrence of the most common value (in the OP's case, about 3%). Then a quick decision could be made to use the index without even looking at the value, if we know the most common one is below the index use threshold...
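Roughly what I have in mind, as a Python sketch. The real planner compares estimated costs rather than a single cutoff, so treat the fixed `index_threshold` as a simplification. The function name, the dict of most-common-value frequencies, and the 10% figure in the example calls are assumptions for illustration only.

```python
def choose_plan(value, mcv_freqs, default_freq, index_threshold):
    """Pick a plan for "column = value" under a simplified threshold model.

    mcv_freqs:       hypothetical dict of most common values -> frequency
    default_freq:    assumed frequency for values not in mcv_freqs
    index_threshold: assumed selectivity below which an index scan wins
    """
    # Frequency of the single most common value (what would be recorded).
    top_freq = max(mcv_freqs.values(), default=default_freq)

    # Shortcut: if even the most common value is selective enough, any
    # value is, so the decision does not depend on the value at all.
    if top_freq < index_threshold:
        return "index scan"

    # Otherwise fall back to looking at the specific value.
    freq = mcv_freqs.get(value, default_freq)
    return "index scan" if freq < index_threshold else "seq scan"


# OP-like case: most common value is about 3%, threshold assumed at 10%.
plan_op = choose_plan("anything", {"a": 0.03}, 0.001, 0.10)
# Skewed case: the value has to be inspected.
plan_hot = choose_plan("a", {"a": 0.40, "b": 0.02}, 0.001, 0.10)
plan_cold = choose_plan("b", {"a": 0.40, "b": 0.02}, 0.001, 0.10)
```

The point of the shortcut branch is that it would also work when the value is not known at plan time, since it never looks at `value`.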