#TIL pymc.Metropolis always assumes a symmetric proposal distribution (defaulting to a normal), and therefore does not implement the Hastings correction. For a custom asymmetric proposal distribution, PyMC's out-of-the-box Metropolis algorithm will give incorrect samples. We need to write a custom step method.
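To illustrate the correction itself, here's a minimal NumPy sketch (plain Python, not a PyMC step method) of Metropolis-Hastings with an asymmetric proposal. The exponential target and log-normal proposal are just illustrative choices of mine:

```python
import numpy as np

# Target density (unnormalized): Exponential(1). Illustrative choice.
def target(x):
    return np.exp(-x) if x > 0 else 0.0

rng = np.random.default_rng(0)
sigma = 0.5          # proposal scale (assumed value)
x = 1.0              # initial state
samples = []

for _ in range(50_000):
    # Asymmetric multiplicative log-normal proposal: x' = x * exp(sigma * z)
    x_new = x * np.exp(sigma * rng.standard_normal())
    # Hastings correction: q(x|x') / q(x'|x) = x_new / x for this proposal,
    # because the log-normal kernel is symmetric in (ln x, ln x') apart
    # from the 1/x' Jacobian factor. Without this factor the sampler is biased.
    accept_ratio = (target(x_new) / target(x)) * (x_new / x)
    if rng.random() < accept_ratio:
        x = x_new
    samples.append(x)

print(np.mean(samples))  # should be near 1, the mean of Exponential(1)
```

Dropping the `(x_new / x)` factor is exactly the plain-Metropolis mistake: the chain then targets the wrong distribution.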
Posts by Ridho Akbar
Mathematically deriving the Metropolis-Hastings algorithm from a Markov chain's detailed balance equation makes me feel like I'm back in school. It's thermodynamics.
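A compact version of that derivation, going from detailed balance to the acceptance rule:

```latex
% Detailed balance for target \pi and transition kernel P:
\pi(x)\,P(x \to x') = \pi(x')\,P(x' \to x)
% Split P into proposal q and acceptance probability A:
\pi(x)\,q(x'|x)\,A(x \to x') = \pi(x')\,q(x|x')\,A(x' \to x)
% The Metropolis-Hastings choice satisfies this equation:
A(x \to x') = \min\!\left(1,\; \frac{\pi(x')\,q(x|x')}{\pi(x)\,q(x'|x)}\right)
% For a symmetric proposal, q(x|x') = q(x'|x), recovering plain Metropolis:
A(x \to x') = \min\!\left(1,\; \frac{\pi(x')}{\pi(x)}\right)
```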
Making my repository, my journal for studying MCMC, public. Theory, equations, and interpretations, as well as code and animations/visualizations, can be found there.
github.com/ridhoma/mont...
There is another, stronger condition that is the foundation of the Metropolis-Hastings algorithm, called Detailed Balance, which will be my next topic.
All code is available at
github.com/ridhoma/mont...
This is a very fundamental concept for developing MCMC sampling algorithms. Because we want the sampling process to converge to our target distribution, we must make sure the sampling process:
1. Satisfies the Markov property / is a Markov chain
2. Is irreducible
3. Is aperiodic
Moreover, a Markov chain must be irreducible and aperiodic to guarantee reaching a Stationary Distribution. Irreducible: every state can be reached from any other state. Aperiodic: the chain does not cycle through states in a fixed period.
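A quick numerical check of the guarantee, with a made-up 3-state transition matrix: two very different starting distributions end up at the same stationary distribution.

```python
import numpy as np

# A small irreducible, aperiodic chain; transition probabilities are made up.
# Rows sum to 1: Q[i, j] = P(move from state i to state j).
Q = np.array([
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
])

# Two very different starting distributions.
p1 = np.array([1.0, 0.0, 0.0])
p2 = np.array([0.0, 0.0, 1.0])

# Repeatedly apply the transition matrix (power iteration).
for _ in range(100):
    p1 = p1 @ Q
    p2 = p2 @ Q

print(p1)  # both converge to the same stationary distribution,
print(p2)  # which also satisfies pi = pi @ Q
```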
These interesting properties only apply if the random walk follows a Markov process, that is, the probability of transitioning to the next node/state depends only on the current state. Such a graph we may call a Markov chain.
No matter which node we start at, with a long enough simulation, the distribution of visit frequencies will converge to the same values, called the Stationary Distribution. This also applies to larger graphs.
The simplest illustration of a Markov chain is a random walk on a graph. Take edge Qij as the probability of moving from node i to node j, where Qii represents the probability of staying at node i. Simulate the random walk and, at each time step, count how many times each node has been visited.
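A minimal sketch of that experiment (the transition probabilities here are made up): count visits during a long random walk, and the visit frequencies approach the stationary distribution.

```python
import numpy as np

rng = np.random.default_rng(42)

# Transition matrix with made-up probabilities; Q[i, i] = prob of staying put.
Q = np.array([
    [0.4, 0.4, 0.2],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
])

n_steps = 200_000
visits = np.zeros(3)
state = 0
for _ in range(n_steps):
    state = rng.choice(3, p=Q[state])   # next state depends only on the current one
    visits[state] += 1

freq = visits / n_steps
print(freq)  # visit frequencies approximate the stationary distribution
```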
Moreover, it took only 3 minutes to run this. Soo.. I'm satisfied.
In the scheme of learning MCMC, this exercise is a good familiarization with the concept of rejection sampling, a simple example of Markov-chain-based state proposals, and a good introduction to the Metropolis-Hastings algorithm.
Finally, I compared the final solution with a greedy algorithm, and the result is quite good: it gives a 12% shorter distance than greedy. According to ChatGPT, greedy usually gives a sub-optimal solution that is 10%-20% off the global optimum. So our solution might be close.
During the first 1000 iterations, the temperature is kept high to maximize exploration of the search space. Then it is gradually lowered to start the optimization. Finally, the temperature is kept low to fine-tune the final solution.
It took me quite a number of trials to fine-tune To and the annealing function until I was satisfied with the outcome. I ended up with this annealing schedule.
This cooling-down process is called annealing in materials science, hence the algorithm is called Simulated Annealing. The success of finding the optimal solution depends on the initial temperature (To) and the annealing schedule function, T(t) = f(To, t).
We repeat the process, and after each iteration we slowly reduce the temperature (slowly cool down the system). When the temperature reaches ~0, we stop, and whatever solution we have is our final solution.
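A minimal sketch of this loop for TSP. The specifics here are my own illustrative choices (random cities, segment-reversal perturbation, geometric cooling), not the original code:

```python
import numpy as np

rng = np.random.default_rng(1)

# Random city coordinates (illustrative; the original run used N=100).
n = 30
cities = rng.random((n, 2))

def energy(route):
    """Total length of the closed tour; the "energy" in SA terms."""
    return np.sum(np.linalg.norm(cities[route] - cities[np.roll(route, 1)], axis=1))

route = rng.permutation(n)
E = energy(route)
T0, alpha = 1.0, 0.999   # assumed initial temperature and cooling rate

T = T0
for step in range(20_000):
    # Propose: reverse a random segment (a common 2-opt-style perturbation).
    i, j = sorted(rng.integers(0, n, size=2))
    new_route = route.copy()
    new_route[i:j + 1] = new_route[i:j + 1][::-1]
    E_new = energy(new_route)
    # Accept if better, or with probability exp(-dE/T) if worse.
    if E_new < E or rng.random() < np.exp(-(E_new - E) / T):
        route, E = new_route, E_new
    T *= alpha           # geometric annealing schedule: T(t) = T0 * alpha^t

print(E)  # final tour length, far shorter than a random tour
```

At high T the exponential is close to 1 and almost every worse route is accepted; as T shrinks the loop degrades gracefully into greedy 2-opt local search.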
We can control how likely we are to accept a worse solution using the temperature parameter T. The higher the temperature, the more likely we are to accept a route even though it is worse than the current one. This encourages exploration of the search space and helps escape local minima.
Here "energy" is just a fancy term for the "total travel distance" of the TSP route. We accept the proposal if it gives lower energy than the current solution (the distance improves). Otherwise, we may still accept it, but with a probability that drops exponentially as the energy difference increases.
The algorithm basically starts with a random initial state/solution. Then we change the solution with some perturbation function; we call this the proposed next solution. We either accept or reject the proposal depending on the energy of the proposed state compared to the current state.
Long time no update.. Quite busy lately at work. But here it is: applying Markov-Chain Monte Carlo to a large Traveling Salesman Problem (N=100).
It's quite fascinating that a very simple algorithm relying on random search can be used to approximate the solution of an NP-hard optimization problem.
Studying MCMC from scratch; so far everything is basically physics.
Haha, understood, master.
Bayesian Analysis generally consists of 2 essentials:
1. Assumption/priors building
2. Numerical method to get the posterior
I find this course from CMU a good starting point to learn #2 at a fundamental level. I will use it as my guide:
gi1242.codeberg.page/cmu-math-cs-...
I'd also like to make my GitHub repo public (with all the code, visualizations, equations, and comments), but probably later. Need to tidy it up first; I'd be embarrassed if it's messy.
However, this random behavior is not bad at all. Using a technique called Simulated Annealing, we can take advantage of it and use Monte Carlo to efficiently solve optimization problems, e.g. the Traveling Salesman Problem. Something I will post next time.
However, Monte Carlo is not the best method to estimate pi because it converges very slowly, at a rate of 1/√n. Not only that, but as more random points are added, we might randomly ruin a previously better estimate and end up with a worse one.
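A quick sketch of that 1/√n behavior: multiplying the sample count by 100 only buys roughly one extra digit of accuracy.

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_pi(n):
    pts = rng.random((n, 2))                 # points in the unit square
    inside = (pts ** 2).sum(axis=1) <= 1.0   # inside the quarter circle
    return 4 * inside.mean()

for n in [100, 10_000, 1_000_000]:
    est = estimate_pi(n)
    print(n, est, abs(est - np.pi))
# The error shrinks roughly like 1/sqrt(n): 100x more samples
# typically gives only ~10x more accuracy.
```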
In the circle-in-a-square model, the reality to reconstruct is a "circle in a square". But with only 1,000 points, it barely looks like a circle. With 10,000, we can now see some circle. Finally, with 100,000, the circle looks clear. Notice that as the circle becomes clearer, our pi estimate becomes more accurate.
The Law of Large Numbers is the theoretical foundation of the Monte Carlo method. This law states that as the number of random samples increases, the average of the samples gets closer to reality (math term: the true value / expected value). The closer to reality, the better our estimation accuracy.
In the case of estimating pi above, we throw a bunch of points randomly into a 1x1 square, accepting them (blue) if they fall inside the quadrant (quarter circle) and otherwise rejecting them (gray). The number of accepted points is proportional to the area of the circle, allowing us to estimate pi.
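A minimal sketch of this accept/reject counting:

```python
import random

random.seed(0)
n = 100_000
accepted = 0
for _ in range(n):
    x, y = random.random(), random.random()   # a point in the 1x1 square
    if x * x + y * y <= 1.0:                  # falls inside the quarter circle
        accepted += 1                         # a "blue" point

# Area of the quarter circle is pi/4 and area of the square is 1, so:
pi_estimate = 4 * accepted / n
print(pi_estimate)  # close to 3.14159...
```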
Monte Carlo essentially tries to estimate certain variables/parameters by generating a large number of random samples and accepting/rejecting each based on some criterion. The proportion of accepted samples reveals the information needed to estimate the parameters.
First post on BlueSky. This seems like the perfect place to share my learning process with MCMC methods for Bayesian Analysis.
I'll start with the very basics, the first chapter of any book: stochastically estimating pi using the circle-in-a-square model.