8. Safe and Strong Screening for Generalized LASSO
Safe [GVR11] and strong [TBF+12] screening rules may be used to filter out variables before fitting a LASSO regression. These rules are a form of feature selection: each uses the inner-product between a feature and the dependent variable to decide whether that feature can be discarded, thereby reducing the feature space (both rules are sketched in code after the definitions below). The safe screening rule discards the j-th variable when the following inequality holds.
\(|x_j^T y| < \lambda - ||x_j||_2 ||y||_2 \dfrac{\lambda_{\mathrm{max}} - \lambda}{\lambda_{\mathrm{max}}}\)
The strong screening rule is a heuristic rather than a guarantee, but it tends to remove more variables; it is defined as follows.
\(|x_j^T y| < 2\lambda - \lambda_{\mathrm{max}}\)
For these inequalities,
\(x_j\) is the vector of values for the j-th variable,
\(y\) is the vector of the dependent variable,
\(\lambda_{\mathrm{max}} = \max_i |x_i^T y|\) is the smallest regularization value at which all coefficients are zero, and
\(\lambda\) is the regularization parameter.
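To make the rules concrete, here is a minimal sketch of both inequalities as vectorized functions. The names safe_rule and strong_rule are my own, not from the referenced papers; a True entry in the returned array means the corresponding variable may be discarded.

import numpy as np

def safe_rule(X, y, lam, lam_max):
    # |x_j^T y| for every column of X
    ip = np.abs(X.T.dot(y))
    # l2 norm of each column and of y
    x_norms = np.linalg.norm(X, axis=0)
    y_norm = np.linalg.norm(y)
    # discard the j-th variable when the safe inequality holds
    return ip < lam - x_norms * y_norm * (lam_max - lam) / lam_max

def strong_rule(X, y, lam, lam_max):
    # |x_j^T y| for every column of X
    ip = np.abs(X.T.dot(y))
    # discard the j-th variable when the strong inequality holds
    return ip < 2 * lam - lam_max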
8.1. Data
Let’s simulate some data.
The variables \(X_1, X_2, X_3, X_4, X_5\) are sampled as follows.
\(X_1 \sim \mathcal{B}(1, 0.8)\)
\(X_2 \sim \mathcal{B}(1, 0.7)\)
\(X_3 \sim \mathcal{B}(1, 0.5)\)
\(X_4 \sim \mathcal{B}(1, 0.3)\)
\(X_5 \sim \mathcal{B}(1, 0.2)\)
The dependent variable \(y\) is sampled as follows.
\(z = 2.0 + 0.5 X_1 + 0.8 X_2\)
\(p = \dfrac{1.0}{1.0 + \exp(-z)}\)
\(y \sim \mathcal{B}(1, p)\)
There are 10,000 sampled data points.
[1]:
import numpy as np
import random
from scipy.stats import binom

random.seed(37)
np.random.seed(37)

N = 10_000

# sample the independent variables
x1 = np.random.binomial(1, 0.8, N)
x2 = np.random.binomial(1, 0.7, N)
x3 = np.random.binomial(1, 0.5, N)
x4 = np.random.binomial(1, 0.3, N)
x5 = np.random.binomial(1, 0.2, N)

# stack the variables column-wise into the design matrix
X = np.hstack((
    x1.reshape(-1, 1),
    x2.reshape(-1, 1),
    x3.reshape(-1, 1),
    x4.reshape(-1, 1),
    x5.reshape(-1, 1)
))

# sample the dependent variable from a logistic model of x1 and x2
z = 2.0 + 0.5 * x1 + 0.8 * x2
p = 1.0 / (1.0 + np.exp(-z))
y = binom.rvs(1, p)
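As a quick sanity check (my own addition, not part of the original analysis), the column means of X should be close to the sampling probabilities above.

# column means should be near (0.8, 0.7, 0.5, 0.3, 0.2)
print(X.mean(axis=0))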
8.2. Feature selection
Let’s compute what we need for the inequalities.
[2]:
x_ip = np.abs(X.T.dot(y)) # inner-product |x_i^T y|
lambda_max = np.max(x_ip) # maximum value of inner-products
lambda_v = lambda_max - (0.1 * lambda_max) # lambda is selected to be lambda_max - (0.1 lambda_max)
x_norm = np.array([np.linalg.norm(X[:,i], 2) for i in range(X.shape[1])]) # l2 norms of each variable/column
y_norm = np.linalg.norm(y, 2) # l2 norm of y
[3]:
lambda_v
[3]:
6833.7
[4]:
lambda_max
[4]:
7593
[5]:
x_ip
[5]:
array([7593, 6770, 4740, 2834, 1890], dtype=int64)
[6]:
x_norm
[6]:
array([89.35323161, 83.98214096, 70.79548008, 54.91812087, 44.754888 ])
[7]:
y_norm
[7]:
97.22139682189307
8.2.1. Safe screening rule
First, compute the right-hand side of the safe screening rule.
\(\lambda - ||x_j||_2 ||y||_2 \dfrac{\lambda_{\mathrm{max}} - \lambda}{\lambda_{\mathrm{max}}}\)
[8]:
lambda_v - x_norm * y_norm * ((lambda_max - lambda_v)/(lambda_max))
[8]:
array([5964.99540119, 6017.21389479, 6145.41645378, 6299.77835781,
6398.58672739])
Now apply the safe screening rule inequality.
\(|x_j^T y| < \lambda - ||x_j||_2 ||y||_2 \dfrac{\lambda_{\mathrm{max}} - \lambda}{\lambda_{\mathrm{max}}}\)
As you can see, \(X_1\) and \(X_2\) are kept, while \(X_3\), \(X_4\) and \(X_5\) are removed; a True value below means the corresponding variable is discarded.
[9]:
x_ip < lambda_v - x_norm * y_norm * ((lambda_max - lambda_v)/(lambda_max))
[9]:
array([False, False, True, True, True])
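As a usage note (my own sketch, not part of the original), the boolean array can be used directly to subset the design matrix to the surviving variables.

# keep only the columns the safe rule did not discard
discard = x_ip < lambda_v - x_norm * y_norm * ((lambda_max - lambda_v) / lambda_max)
X_kept = X[:, ~discard]
X_kept.shape  # (10000, 2): only X1 and X2 survive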
8.2.2. Strong screening rule
First, compute the right-hand side of the strong screening rule.
\(2\lambda - \lambda_{\mathrm{max}}\)
[10]:
2 * lambda_v - lambda_max
[10]:
6074.4
Now apply the strong screening rule inequality.
\(|x_j^T y| < 2\lambda - \lambda_{\mathrm{max}}\)
Again, as you can see, \(X_1\) and \(X_2\) are kept, while \(X_3\), \(X_4\) and \(X_5\) are removed by the inequality.
[11]:
x_ip < 2 * lambda_v - lambda_max
[11]:
array([False, False, True, True, True])
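To close the loop (a sketch of my own, not part of the original analysis), the screened result can be checked against an actual LASSO fit. scikit-learn's Lasso minimizes \(\frac{1}{2n}||y - Xw||_2^2 + \alpha ||w||_1\), so the \(\lambda\) above corresponds roughly to \(\alpha = \lambda / n\) when fitting without an intercept; variables discarded by the safe rule should come out with zero coefficients.

from sklearn.linear_model import Lasso

# alpha = lambda / n translates the lambda above to scikit-learn's scaling;
# fit without an intercept to match the screening derivation
model = Lasso(alpha=lambda_v / N, fit_intercept=False)
model.fit(X, y)
print(model.coef_)  # coefficients of discarded variables should be zero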