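ration and exploitation. Epsilon-greedy approach: at each step, draw a random number and compare it with the threshold epsilon; with probability epsilon pick a random, possibly non-optimal option (explore), otherwise pick the current best one (exploit).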
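A minimal sketch of the epsilon-greedy choice described above. The function name `epsilon_greedy_pick`, the `estimates` list (current average reward per option), and the optional `rng` argument are assumptions made for illustration, not something from these notes.

```python
import random

def epsilon_greedy_pick(estimates, epsilon, rng=random):
    """Return the index of the option to try next.

    estimates: current average reward per option (hypothetical input).
    epsilon:   exploration threshold in [0, 1].
    """
    # Explore: with probability epsilon, pick any option uniformly at random.
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))
    # Exploit: otherwise pick the option with the best estimate so far.
    return max(range(len(estimates)), key=lambda i: estimates[i])

if __name__ == "__main__":
    # With epsilon = 0 the choice is purely greedy.
    print(epsilon_greedy_pick([0.1, 0.9, 0.5], epsilon=0.0))  # prints 1
```

A larger epsilon explores more; epsilon = 0 never explores and epsilon = 1 ignores the estimates entirely.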
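Thompson sampling / UCB: from historical data we can build a distribution for each button color; Thompson sampling draws one sample from each distribution and picks the color with the largest sample. (UCB does not sample; it picks the color with the highest upper confidence bound.)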
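A rough sketch of the Thompson sampling step for the button-color example. It assumes each color's reward is click / no click and models the click rate with a Beta distribution under a uniform Beta(1, 1) prior; the function name `thompson_pick` and the `clicks` / `misses` inputs are hypothetical.

```python
import random

def thompson_pick(clicks, misses, rng=random):
    """Return the index of the button color to show next.

    clicks[i], misses[i]: historical click / no-click counts for color i
    (hypothetical inputs). Assumes a Beta(1, 1) prior on each click rate.
    """
    # Draw one sample from each color's posterior Beta distribution.
    samples = [rng.betavariate(c + 1, m + 1) for c, m in zip(clicks, misses)]
    # Pick the color whose sampled click rate is largest.
    return max(range(len(samples)), key=lambda i: samples[i])

if __name__ == "__main__":
    # Three colors with different click histories.
    choice = thompson_pick(clicks=[12, 30, 5], misses=[88, 70, 15])
    print("show color index:", choice)
```

Colors with little data have wide distributions, so they still get picked occasionally; that is how the sampling balances exploring against exploiting.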
multi-armed bandit problem
2022/10/19 offer listed as "research phd" for the Bing Ads team