In Bayesian quantile regression, the generalized asymmetric Laplace (GAL) distribution is commonly used as a flexible, robust error model. Stan, a probabilistic programming language, does not ship a built-in GAL distribution. However, user-defined functions let you extend Stan's built-in set of distributions and implement the GAL likelihood for quantile regression. This guide walks through the steps to code such a model, covering the role and contents of each block in the Stan program.
The data block declares the observed variables and any additional inputs the model needs. Typically, you supply the number of observations, the response variable, the predictor matrix, and the quantile level (τ). The quantile level is a real number in the interval (0, 1) that identifies which conditional quantile the regression targets; τ = 0.5, for example, corresponds to median regression.
In the parameters block, you declare the unknown parameters of the model: the regression coefficients, the location parameter (intercept), the scale parameter, and an asymmetry parameter. These parameters need explicit constraints: the scale parameter must be positive, and the asymmetry parameter must lie between 0 and 1.
The model block specifies the priors and the likelihood of the observed data given the parameters. Since GAL is not a built-in distribution in Stan, the likelihood calls a custom log density function (named with the _lpdf suffix) defined in the functions block. Each observation's contribution is computed with that function and added to the target log density alongside the priors.
The generated quantities block computes derived quantities for each posterior draw, such as a predicted response for every observation. These quantities support model checking and prediction after the model has been fitted.
Since the GAL distribution is not available by default, you need to define the log probability density and, if needed, a random number generator. The custom `gal_lpdf` function computes the log density of an observation given the location (μ), scale (σ), asymmetry parameter (κ), and quantile level (τ). Its structure is piecewise: the term added to the log density depends on whether the observation lies below or above the location parameter.
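For reference, the piecewise form evaluated by the `gal_lpdf` in this guide's listing can be written out as follows. It uses the fact that α + 1 = 1/κ when α = (1 − κ)/κ. The expression is transcribed from the code, not taken from an independent derivation of the GAL density.

```latex
\log f(y \mid \mu, \sigma, \kappa, \tau) =
\begin{cases}
\log \kappa + \log \tau - \log \sigma - \dfrac{|y - \mu|}{\sigma \kappa}, & y < \mu, \\[6pt]
\log(1 - \kappa) + \log(1 - \tau) - \log \sigma - \dfrac{|y - \mu|}{\sigma \kappa}, & y \ge \mu.
\end{cases}
```

Integrating the two branches over y gives a total mass of κ[κτ + (1 − κ)(1 − τ)], which is below one for κ and τ strictly inside (0, 1). This form is therefore best treated as a simplified, unnormalized sketch and checked against the GAL density you intend to fit.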
The `gal_rng` function, while not required for model fitting, is useful for generating draws from the GAL distribution for posterior predictive checks. It typically starts from a uniform random draw and transforms it to reproduce the asymmetric shape of the distribution.
Below is a detailed example of Stan code for generalized asymmetric Laplace quantile regression. The code includes a functions block that defines both `gal_lpdf` and `gal_rng`, followed by the data, parameters, model, and generated quantities blocks.

    // Custom functions for the GAL distribution
    functions {
      // Log density used as the GAL likelihood in this guide
      // (simplified two-piece exponential form)
      real gal_lpdf(real y, real mu, real sigma, real kappa, real tau) {
        // Transformed asymmetry parameter; note that alpha + 1 equals 1 / kappa
        real alpha = (1 - kappa) / kappa;
        // Absolute deviation from the location, scaled by sigma
        real dev = abs(y - mu) / sigma;
        // The log weight depends on which side of mu the observation falls
        if (y < mu) {
          return log(kappa) + log(tau) - log(sigma) - dev * (alpha + 1);
        } else {
          return log1p(-kappa) + log1p(-tau) - log(sigma) - dev * (alpha + 1);
        }
      }

      // Random number generator: applies a power transform to one uniform draw
      real gal_rng(real mu, real sigma, real kappa, real tau) {
        real u = uniform_rng(0, 1);
        real alpha = (1 - kappa) / kappa;
        if (u < tau) {
          return mu - sigma * pow(u / tau, 1.0 / (alpha + 1)) * (1 - kappa);
        } else {
          return mu + sigma * pow((1 - u) / (1 - tau), 1.0 / (alpha + 1)) * kappa;
        }
      }
    }

    data {
      int<lower=1> n;              // Number of observations
      int<lower=1> p;              // Number of predictors
      matrix[n, p] X;              // Predictor matrix
      vector[n] y;                 // Response variable
      real<lower=0, upper=1> tau;  // Quantile level (target quantile)
    }

    parameters {
      vector[p] beta;                // Regression coefficients
      real mu;                       // Intercept (location parameter)
      real<lower=0> sigma;           // Scale parameter
      real<lower=0, upper=1> kappa;  // Asymmetry parameter (controls skewness)
    }

    model {
      // Prior distributions for the parameters
      beta ~ normal(0, 1);
      mu ~ normal(0, 1);
      sigma ~ cauchy(0, 1);  // Half-Cauchy, because sigma is constrained to be positive
      kappa ~ uniform(0, 1);

      // Likelihood using the custom GAL log density
      for (i in 1:n) {
        // Linear predictor: deterministic component of the model
        real eta = X[i] * beta + mu;
        // Add this observation's log density to the target
        target += gal_lpdf(y[i] | eta, sigma, kappa, tau);
      }
    }

    generated quantities {
      vector[n] y_pred;  // Fitted value (linear predictor) for each observation
      for (i in 1:n) {
        // Point prediction only; call gal_rng here instead to draw predictive samples
        y_pred[i] = X[i] * beta + mu;
      }
    }

This Stan code begins with a functions block in which `gal_lpdf` is defined to compute the log density of an observation given the GAL parameters. The data block then declares the number of observations, the covariates, the response variable, and the quantile level. In the parameters block, the coefficients, the location parameter, the scale, and the asymmetry parameter are declared with appropriate constraints.
The model block sets prior distributions for the parameters and loops over the observations to accumulate the likelihood with `gal_lpdf`. Finally, the generated quantities block returns the linear predictor for each observation as a point prediction, which can later be used for model diagnostics. Full posterior predictive checks additionally require random draws, for example from `gal_rng`.
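The `gal_rng` in the listing maps a uniform draw through a power transform, so its output always lies between mu - sigma * (1 - kappa) and mu + sigma * kappa. The shape coded in `gal_lpdf`, by contrast, has exponential tails on both sides. The following is a minimal sketch of a generator that matches that piecewise-exponential shape once its two branches are normalized: it picks a side with the corresponding probability and then adds an exponentially distributed distance. The function name `gal_piecewise_rng`, the data name `n_sim`, and the standalone layout are illustrative assumptions, not part of the original program.

```stan
functions {
  // Hypothetical helper (not in the original listing): draws from the
  // normalized version of the two-piece exponential shape used in gal_lpdf.
  real gal_piecewise_rng(real mu, real sigma, real kappa, real tau) {
    // Share of the total mass that lies below mu after normalizing both branches
    real w_below = kappa * tau / (kappa * tau + (1 - kappa) * (1 - tau));
    // Both branches decay at rate 1 / (sigma * kappa)
    real dist = exponential_rng(1 / (sigma * kappa));
    if (bernoulli_rng(w_below) == 1) {
      return mu - dist;
    } else {
      return mu + dist;
    }
  }
}
data {
  int<lower=1> n_sim;            // Number of draws to simulate (assumed name)
  real mu;                       // Location
  real<lower=0> sigma;           // Scale, must be strictly positive
  real<lower=0, upper=1> kappa;  // Asymmetry, should be strictly inside (0, 1)
  real<lower=0, upper=1> tau;    // Quantile level, should be strictly inside (0, 1)
}
generated quantities {
  vector[n_sim] y_sim;
  for (i in 1:n_sim) {
    y_sim[i] = gal_piecewise_rng(mu, sigma, kappa, tau);
  }
}
```

Because this program has no parameters block, it would typically be run with Stan's fixed_param algorithm. In the regression program, the same function could be called inside generated quantities with `X[i] * beta + mu` as the location to obtain one predictive draw per observation.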
Block | Purpose | Details |
---|---|---|
functions | Custom Distribution Functions | Define gal_lpdf and gal_rng to compute the likelihood and generate samples from the GAL distribution |
data | Data Specification | Input data includes predictors, response variable, number of observations, and the target quantile level |
parameters | Parameter Declaration | Coefficients, intercept (location), scale, and asymmetry parameter are declared with necessary constraints |
model | Likelihood and Priors | Priors for the parameters are specified; the GAL likelihood is implemented using the custom lpdf function |
generated quantities | Posterior Predictions | Generates predicted values for posterior predictive checks |
The custom functions for the GAL distribution add computational cost. In particular, `gal_lpdf` is called once per observation every time the log density is evaluated, so its absolute-value and logarithm operations are repeated many times during sampling. When working with large datasets or complex models, profiling the Stan code and optimizing these custom functions may be necessary to ensure efficient sampling.
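One common way to reduce this overhead is to evaluate the density for all observations in a single call. The sketch below applies the same piecewise formula as `gal_lpdf`, but computes the linear predictor with one matrix-vector product and pulls the terms that do not depend on the observation out of the loop. The name `gal_vec_lpdf` is a hypothetical choice, and the code assumes a recent Stan release in which `abs` accepts vectors. Whether it is noticeably faster for a given dataset would need to be confirmed by profiling.

```stan
functions {
  // Hypothetical vectorized variant of gal_lpdf: same piecewise formula,
  // summed over all observations in one call.
  real gal_vec_lpdf(vector y, vector eta, real sigma, real kappa, real tau) {
    int n = num_elements(y);
    // Log weights for observations below / at-or-above the location
    real lw_below = log(kappa) + log(tau);
    real lw_above = log1p(-kappa) + log1p(-tau);
    // alpha + 1 equals 1 / kappa, so the decay term is |y - eta| / (sigma * kappa)
    real lp = -n * log(sigma) - sum(abs(y - eta)) / (sigma * kappa);
    for (i in 1:n) {
      if (y[i] < eta[i]) {
        lp += lw_below;
      } else {
        lp += lw_above;
      }
    }
    return lp;
  }
}
data {
  int<lower=1> n;              // Number of observations
  int<lower=1> p;              // Number of predictors
  matrix[n, p] X;              // Predictor matrix
  vector[n] y;                 // Response variable
  real<lower=0, upper=1> tau;  // Quantile level
}
parameters {
  vector[p] beta;
  real mu;
  real<lower=0> sigma;
  real<lower=0, upper=1> kappa;
}
model {
  beta ~ normal(0, 1);
  mu ~ normal(0, 1);
  sigma ~ cauchy(0, 1);
  kappa ~ uniform(0, 1);
  // One matrix-vector product and one density call for the whole sample
  target += gal_vec_lpdf(y | X * beta + mu, sigma, kappa, tau);
}
```

The priors and constraints are copied from the original listing so that the two versions define the same posterior and can be compared directly.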
Correct parameter constraints are crucial. The scale parameter (sigma) must remain strictly positive and the asymmetry parameter (kappa) must stay between 0 and 1; outside these ranges, terms such as log(sigma), log(kappa), and log1p(-kappa) are undefined. Failing to enforce these constraints can result in inefficient sampling or computational problems such as divergent transitions during Hamiltonian Monte Carlo.
After fitting the model, it is important to perform standard diagnostics such as checking for divergences, effective sample sizes, and R-hat values to confirm convergence. Using the generated quantities to simulate posterior predictions further aids in assessing the model's performance.
For those interested in exploring more advanced aspects of quantile regression and Bayesian modeling using Stan, consider the following resources. These references discuss related topics ranging from the theoretical underpinnings of the asymmetric Laplace distribution to practical issues encountered in implementing such models in Stan.