7. Safe and Strong Screening for Generalized LASSO

Safe [GVR11] and strong [TBF+12] screening rules may be used to filter out variables before fitting a LASSO regression. These rules are a form of feature selection: they compare the inner product between each feature and the response against a threshold to shrink the feature space. The safe screening rule discards the j-th variable when the following inequality holds.

$$|x_j^T y| < \lambda - ||x_j||_2 ||y||_2 \dfrac{\lambda_{\mathrm{max}} - \lambda}{\lambda_{\mathrm{max}}}$$

The strong screening rule tends to remove more variables; its threshold rearranges to $$\lambda - (\lambda_{\mathrm{max}} - \lambda)$$, and the rule discards the j-th variable when the following inequality holds.

$$|x_j^T y| < 2\lambda - \lambda_{\mathrm{max}}$$

For these inequalities,

• $$x_j$$ is the vector of values for the j-th variable,

• $$y$$ is the vector of the dependent variable,

• $$\lambda_{\mathrm{max}} = \max_j |x_j^T y|$$, and

• $$\lambda$$ is the regularization parameter.
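The two rules above can be wrapped as small helper functions; this is a sketch with function names of my own choosing, not part of the original notebook. Each returns a boolean mask that is `True` where the rule discards the j-th variable.

```python
import numpy as np

def safe_discard(X, y, lam):
    """SAFE rule: True where feature j may be discarded."""
    ip = np.abs(X.T @ y)                  # |x_j^T y| for each column
    lam_max = ip.max()                    # lambda_max = max_j |x_j^T y|
    x_norms = np.linalg.norm(X, axis=0)   # ||x_j||_2 per column
    y_norm = np.linalg.norm(y)            # ||y||_2
    return ip < lam - x_norms * y_norm * (lam_max - lam) / lam_max

def strong_discard(X, y, lam):
    """Strong rule: True where feature j is discarded."""
    ip = np.abs(X.T @ y)
    return ip < 2 * lam - ip.max()        # threshold 2*lambda - lambda_max
```

Both functions take the design matrix `X` (one column per feature), the response `y`, and a regularization value `lam` with $$0 < \lambda < \lambda_{\mathrm{max}}$$.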

7.1. Data

Let’s simulate some data.

The variables $$X_1, X_2, X_3, X_4, X_5$$ are sampled as follows.

• $$X_1 \sim \mathcal{B}(1, 0.8)$$

• $$X_2 \sim \mathcal{B}(1, 0.7)$$

• $$X_3 \sim \mathcal{B}(1, 0.5)$$

• $$X_4 \sim \mathcal{B}(1, 0.3)$$

• $$X_5 \sim \mathcal{B}(1, 0.2)$$

The dependent variable $$y$$ is sampled as follows.

• $$z = 2.0 + 0.5 X_1 + 0.8 X_2$$

• $$p = \dfrac{1.0}{1.0 + \exp(-z)}$$

• $$y \sim \mathcal{B}(1, p)$$

There are 10,000 sampled data points.

[1]:

import numpy as np
import random
from scipy.stats import binom

random.seed(37)
np.random.seed(37)

N = 10_000

x1 = np.random.binomial(1, 0.8, N)
x2 = np.random.binomial(1, 0.7, N)
x3 = np.random.binomial(1, 0.5, N)
x4 = np.random.binomial(1, 0.3, N)
x5 = np.random.binomial(1, 0.2, N)

X = np.hstack((
    x1.reshape(-1, 1),
    x2.reshape(-1, 1),
    x3.reshape(-1, 1),
    x4.reshape(-1, 1),
    x5.reshape(-1, 1)
))

z = 2.0 + 0.5 * x1 + 0.8 * x2
p = 1.0 / (1.0 + np.exp(-z))
y = binom.rvs(1, p)


7.2. Feature selection

Let’s compute what we need for the inequalities.

[2]:

x_ip = np.abs(X.T.dot(y))                    # inner products |x_j^T y|
lambda_max = np.max(x_ip)                    # maximum value of inner-products
lambda_v = lambda_max - (0.1 * lambda_max)   # lambda is selected to be lambda_max - (0.1 lambda_max)

x_norm = np.array([np.linalg.norm(X[:,i], 2) for i in range(X.shape[1])])  # l2 norms of each variable/column
y_norm = np.linalg.norm(y, 2)                                              # l2 norm of y

[3]:

lambda_v

[3]:

6833.7

[4]:

lambda_max

[4]:

7593

[5]:

x_ip

[5]:

array([7593, 6770, 4740, 2834, 1890])

[6]:

x_norm

[6]:

array([89.35323161, 83.98214096, 70.79548008, 54.91812087, 44.754888  ])

[7]:

y_norm

[7]:

97.22139682189307


7.2.1. Safe screening rule

$$\lambda - ||x_j||_2 ||y||_2 \dfrac{\lambda_{\mathrm{max}} - \lambda}{\lambda_{\mathrm{max}}}$$

[8]:

lambda_v - x_norm * y_norm * ((lambda_max - lambda_v)/(lambda_max))

[8]:

array([5964.99540119, 6017.21389479, 6145.41645378, 6299.77835781,
6398.58672739])


The safe screening rule inequality.

$$|x_j^T y| < \lambda - ||x_j||_2 ||y||_2 \dfrac{\lambda_{\mathrm{max}} - \lambda}{\lambda_{\mathrm{max}}}$$

As you can see, the inequality is False for $$X_1$$ and $$X_2$$, so they are kept, while it is True for $$X_3$$, $$X_4$$, and $$X_5$$, so they are removed.

[9]:

x_ip < lambda_v - x_norm * y_norm * ((lambda_max - lambda_v)/(lambda_max))

[9]:

array([False, False,  True,  True,  True])
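The boolean mask marks discarded variables, so it can be turned into a list of survivors. A small illustration using the mask above (the variable labels are my own):

```python
import numpy as np

# discard mask from the screening rule: True means "remove this variable"
discard = np.array([False, False, True, True, True])
names = ['x1', 'x2', 'x3', 'x4', 'x5']

# keep only the variables whose mask entry is False
kept = [n for n, d in zip(names, discard) if not d]
print(kept)  # ['x1', 'x2']
```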


7.2.2. Strong screening rule

$$2\lambda - \lambda_{\mathrm{max}}$$

[10]:

2 * lambda_v - lambda_max

[10]:

6074.4


The strong screening rule inequality.

$$|x_j^T y| < 2\lambda - \lambda_{\mathrm{max}}$$

Again, $$X_1$$ and $$X_2$$ are kept, while $$X_3$$, $$X_4$$, and $$X_5$$ are removed by the inequality.

[11]:

x_ip < 2 * lambda_v - lambda_max

[11]:

array([False, False,  True,  True,  True])
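As a closing sketch, we can count how many variables each rule removes as $$\lambda$$ varies. This re-simulates data from the same generating process but with numpy's newer `Generator` API, so the draws (and exact counts) will differ from the cell outputs above.

```python
import numpy as np

# re-simulate: five Bernoulli features with the notebook's parameters
rng = np.random.default_rng(37)
X = rng.binomial(1, [0.8, 0.7, 0.5, 0.3, 0.2], size=(10_000, 5)).astype(float)
z = 2.0 + 0.5 * X[:, 0] + 0.8 * X[:, 1]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-z))).astype(float)

# quantities shared by both rules
ip = np.abs(X.T @ y)                  # |x_j^T y|
lam_max = ip.max()                    # lambda_max
x_norms = np.linalg.norm(X, axis=0)   # ||x_j||_2
y_norm = np.linalg.norm(y)            # ||y||_2

# count removals for a few fractions of lambda_max
for frac in (0.95, 0.90, 0.80):
    lam = frac * lam_max
    n_safe = int((ip < lam - x_norms * y_norm * (lam_max - lam) / lam_max).sum())
    n_strong = int((ip < 2 * lam - lam_max).sum())
    print(f'lambda = {frac:.2f} * lambda_max: '
          f'safe removes {n_safe}, strong removes {n_strong}')
```

As $$\lambda$$ shrinks toward zero, both thresholds fall and fewer variables are screened out, which is consistent with LASSO keeping more variables at weaker regularization.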