pyotc.otc_backend.policy_iteration.dense package¶
Submodules¶
pyotc.otc_backend.policy_iteration.dense.approx_tce module¶
- pyotc.otc_backend.policy_iteration.dense.approx_tce.approx_tce(P, c, L, T)[source]¶
Approximates the Transition Coupling Evaluation (TCE) vectors g and h using a truncation-based approximation of the exact TCE method.
- Parameters:
P (np.ndarray) – Transition matrix of shape (dx*dy, dx*dy).
c (np.ndarray) – Cost vector of shape (dx*dy,) or (dx*dy, 1).
L (int) – Maximum number of iterations for computing the cost vector g.
T (int) – Maximum number of iterations for computing the bias vector h.
- Returns:
Approximated average cost (gain) vector of shape (dx*dy,). h (np.ndarray): Approximated bias vector of shape (dx*dy,).
- Return type:
g (np.ndarray)
pyotc.otc_backend.policy_iteration.dense.entropic module¶
Entropic Optimal Transition Coupling (OTC) solvers.
Implements variants of the OTC algorithm using entropic regularization. Includes both a custom Sinkhorn implementation and one based on the POT library.
References
Section 5, “Optimal Transport for Stationary Markov Chains via Policy Iteration” (https://www.jmlr.org/papers/volume23/21-0519/21-0519.pdf)
- - logsinkhorn
A self-implemented log-scaled Sinkhorn solver.
- - ot_sinkhorn
Sinkhorn solver from POT library.
- - ot_logsinkhorn
Sinkhorn solver from POT library in log scale.
- - ot_greenkhorn
Sinkhorn solver of greedy version from POT library.
- pyotc.otc_backend.policy_iteration.dense.entropic.entropic_otc(Px, Py, c, L=100, T=100, xi=0.1, method='logsinkhorn', sink_iter=100, reg_num=None, get_sd=False, silent=True)[source]¶
Solves the Entropic Optimal Transition Coupling (OTC) problem between two Markov chains using approximate policy iteration and entropic regularization.
This method alternates between approximate coupling evaluation and entropic coupling improvement (via Sinkhorn iterations), until convergence.
- Parameters:
Px (np.ndarray) – Transition matrix of the source Markov chain of shape (dx, dx).
Py (np.ndarray) – Transition matrix of the target Markov chain of shape (dy, dy).
c (np.ndarray) – Cost function of shape (dx, dy).
L (int) – Number of iterations for computing the cost vector g in approx_tce.
T (int) – Number of iterations for computing the bias vector h in approx_tce.
xi (float) – Scaling factor for entropic cost adjustment in entropic_tci.
method (str) – Method for the Sinkhorn algorithm. Must choose from [‘logsinkhorn’, ‘ot_sinkhorn’, ‘ot_logsinkhorn’, ‘ot_greenkhorn’]. Default is ‘logsinkhorn’. See ‘Methods’ above for details.
sink_iter (int) – Number of iterations for ‘logsinkhorn’ method. Maximum number of Sinkhorn iterations for other methods from POT library. Used in the entropic TCI step.
reg_num (float) – Entropic regularization term, used only for methods from POT package.
get_sd (bool) – If True, compute best stationary distribution using linear programming.
silent (bool) – If False, print convergence info during iterations and running time
- Returns:
Expected transport cost under the optimal transition coupling. P (np.ndarray): Optimal transition coupling matrix of shape (dx*dy, dx*dy). stat_dist (Optional[np.ndarray]): Stationary distribution of the optimal transition coupling of shape (dx, dy),
or None if get_sd is False.
- Return type:
exp_cost (float)
pyotc.otc_backend.policy_iteration.dense.entropic_tci module¶
- pyotc.otc_backend.policy_iteration.dense.entropic_tci.entropic_tci(h, P0, Px, Py, xi, solver_fn)[source]¶
Performs entropic Transition Coupling Improvement (TCI) using log-domain Sinkhorn algorithm.
For each (i, j) state pair from the product space of two Markov chains, this function solves a local entropic optimal transport problem based on the bias vector h.
- Parameters:
h (np.ndarray) – Bias vector of shape (dx*dy,).
P0 (np.ndarray) – Previous transition coupling matrix of shape (dx*dy, dx*dy).
Px (np.ndarray) – Transition matrix of the source Markov chain of shape (dx, dx).
Py (np.ndarray) – Transition matrix of the target Markov chain of shape (dy, dy).
xi (float) – Scaling factor for entropic cost adjustment.
solver_fn (callable) – A function solves the optimization and provides a transport plan. Specified in ‘entropic_otc’.
- Returns:
Updated transition coupling matrix of shape (dx*dy, dx*dy).
- Return type:
np.ndarray
pyotc.otc_backend.policy_iteration.dense.exact module¶
- pyotc.otc_backend.policy_iteration.dense.exact.exact_otc(Px, Py, c, stat_dist='best')[source]¶
Computes the optimal transport coupling (OTC) between two stationary Markov chains represented by transition matrices Px and Py, as described in Algorithm 1 of the paper: “Optimal Transport for Stationary Markov Chains via Policy Iteration” (https://www.jmlr.org/papers/volume23/21-0519/21-0519.pdf).
The algorithm iteratively updates the transition coupling matrix until convergence by alternating between Transition Coupling Evaluation (TCE) and Transition Coupling Improvement (TCI) steps.
For a detailed discussion of the connection between the OTC problem and Markov Decision Processes (MDPs), see Section 4 of the paper. Additional background on policy iteration methods for solving average-cost MDP problems can be found in Chapters 8 and 9 of “Markov Decision Processes: Discrete Stochastic Dynamic Programming” by Martin L. Puterman.
- Parameters:
Px (np.ndarray) – Transition matrix of the source Markov chain of shape (dx, dx).
Py (np.ndarray) – Transition matrix of the target Markov chain of shape (dy, dy).
c (np.ndarray) – Cost function of shape (dx, dy).
stat_dist (str, optional) – Method to compute the stationary distribution. Options include ‘best’, ‘eigen’, ‘iterative’ and None. Defaults to ‘best’.
- Returns:
Expected transport cost under the optimal transition coupling. R (np.ndarray): Optimal transition coupling matrix of shape (dx*dy, dx*dy). stat_dist (np.ndarray): Stationary distribution of the optimal transition coupling of shape (dx, dy).
Returns (None, None, None) if the algorithm fails to converge.
- Return type:
exp_cost (float)
pyotc.otc_backend.policy_iteration.dense.exact_tce module¶
Original Transition Coupling Evaluation (TCE) methods from: https://www.jmlr.org/papers/volume23/21-0519/21-0519.pdf
- pyotc.otc_backend.policy_iteration.dense.exact_tce.exact_tce(R, c)[source]¶
Computes the exact Transition Coupling Evaluation (TCE) vectors g and h using the linear system described in Algorithm 1a of the paper “Optimal Transport for Stationary Markov Chains via Policy Iteration” (https://www.jmlr.org/papers/volume23/21-0519/21-0519.pdf).
The method solves a block linear system involving the transition matrix R and cost vector c. If the system is not full rank, a pseudo-inverse (pinv) is used as fallback.
- Parameters:
R (np.ndarray) – Transition matrix of shape (dx*dy, dx*dy).
c (np.ndarray) – Cost vector of shape (dx*dy, dx*dy).
- Returns:
Average cost (gain) vector of shape (dx*dy,). h (np.ndarray): Total extra cost (bias) vector of shape (dx*dy,).
- Return type:
g (np.ndarray)
Notes
If the matrix A is singular or ill-conditioned, the solution uses np.linalg.pinv, which may lead to numerical instability.
Make sure Pz is a proper stochastic matrix (rows sum to 1).
pyotc.otc_backend.policy_iteration.dense.exact_tci_lp module¶
Original Transition Coupling Improvements (TCI) methods from: https://jmlr.csail.mit.edu/papers/volume23/21-0519/21-0519.pdf
Use scipy.linprog (LP solver) library to solve optimal transport problem.
pyotc.otc_backend.policy_iteration.dense.exact_tci_pot module¶
Original Transition Coupling Improvements (TCI) method from: https://www.jmlr.org/papers/volume23/21-0519/21-0519.pdf
Use the python optimal transport (POT) library to solve optimal transport problem.
- pyotc.otc_backend.policy_iteration.dense.exact_tci_pot.exact_tci(g, h, R0, Px, Py)[source]¶
Performs the Transition Coupling Improvement (TCI) step in the OTC algorithm.
This function attempts to update the current coupling transition matrix R0 based on the evaluation vectors g and h obtained from the Transition Coupling Evaluation (TCE).
- Parameters:
g (np.ndarray) – Gain vector from TCE of shape (dx*dy,).
h (np.ndarray) – Bias vector from TCE of shape (dx*dy,).
R0 (np.ndarray) – Current transition coupling matrix of shape (dx*dy, dx*dy).
Px (np.ndarray) – Transition matrix of the source Markov chain of shape (dx, dx).
Py (np.ndarray) – Transition matrix of the target Markov chain of shape (dy, dy).
- Returns:
Improved transition coupling matrix of shape (dx*dy, dx*dy).
- Return type:
R (np.ndarray)
- pyotc.otc_backend.policy_iteration.dense.exact_tci_pot.setup_ot(f, Px, Py, R)[source]¶
This improvement step updates the transition coupling matrix R that minimizes the product Rf element-wise. In more detail, we may select a transition coupling R such that for each state pair (x, y), the corresponding row r = R((x, y), ·) minimizes rf over couplings r in Pi(Px(x, ·), Py(y, ·)). This is done by solving the optimal transport problem for each state pair (x, y) in the source and target Markov chains. The resulting transition coupling matrix R is updated accordingly.
This function uses the POT (Python Optimal Transport) library to solve the optimal transport problem for each (x, y) state pair and updates the transition coupling matrix.
- Parameters:
f (np.ndarray) – Cost function reshaped as of shape (dx*dy,).
Px (np.ndarray) – Transition matrix of the source Markov chain of shape (dx, dx).
Py (np.ndarray) – Transition matrix of the target Markov chain of shape (dy, dy).
R (np.ndarray) – Transition coupling matrix to update of shape (dx*dy, dx*dy).
- Returns:
Updated transition coupling matrix of shape (dx*dy, dx*dy).
- Return type:
R (np.ndarray)