The Do Loop

Robust principal component analysis in SAS

Recently, I was asked whether SAS can perform a principal component analysis (PCA) that is robust to the presence of outliers in the data. A PCA requires a data matrix, an estimate for the center of the data, and an estimate for the variance/covariance of the variables. Classically, these estimates are the mean and the…

Random segments and broken sticks

A classical problem in elementary probability asks for the expected lengths of line segments that result from randomly selecting k points along a segment of unit length. It is both fun and instructive to simulate such problems. This article uses simulation in the SAS/IML language to estimate solutions to the following problems: Randomly choose a…
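As a rough illustration of the kind of simulation described above, here is a minimal sketch in Python (not SAS/IML, and not the article's own program). It covers only the general setup from the first sentence: choose k uniform random points on the unit segment and average the lengths of the resulting k+1 pieces. The function name `simulate_segment_lengths` and its parameters are hypothetical, chosen for this example.

```python
import random

def simulate_segment_lengths(k, n_trials=20000, seed=1):
    """Estimate the mean length of each of the k+1 pieces (left to right)
    formed by k uniform random points on the unit segment."""
    rng = random.Random(seed)          # seeded generator for reproducibility
    totals = [0.0] * (k + 1)
    for _ in range(n_trials):
        cuts = sorted(rng.random() for _ in range(k))
        endpoints = [0.0] + cuts + [1.0]
        for i in range(k + 1):
            totals[i] += endpoints[i + 1] - endpoints[i]
    return [t / n_trials for t in totals]

# Two random points break the stick into three pieces.
means = simulate_segment_lengths(k=2)
print(means)
```

By symmetry, each of the k+1 pieces has expected length 1/(k+1), so the three estimates for k = 2 should each be close to 1/3.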

3 ways to visualize prediction regions for classification problems

An important problem in machine learning is the “classification problem.” In this supervised learning problem, you build a statistical model that predicts a set of categorical outcomes (responses) based on a set of input features (explanatory variables). You do this by training the model on data for which the outcomes are known. For example, researchers…

Bootstrap estimates in SAS/IML

The bootstrap method consists of the following steps: (1) Compute the statistic of interest for the original data. (2) Resample B times from the data to form B bootstrap samples. B is usually a large number, such as B = 5000. (3) Compute the statistic on each bootstrap sample. This creates the bootstrap distribution, which approximates the sampling…
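The three steps listed above can be sketched in a few lines. This is a minimal Python sketch rather than SAS/IML code, and it is not the article's implementation. The data values, the helper name `bootstrap_distribution`, and the choice of the sample mean as the statistic are all assumptions made for illustration.

```python
import random
import statistics

def bootstrap_distribution(data, stat, B=5000, seed=1):
    """Return the observed statistic and B bootstrap replicates of it."""
    rng = random.Random(seed)
    n = len(data)
    observed = stat(data)                       # step 1: statistic on the original data
    boot = []
    for _ in range(B):
        resample = rng.choices(data, k=n)       # step 2: resample with replacement
        boot.append(stat(resample))             # step 3: statistic on each resample
    return observed, boot

# Hypothetical data, for illustration only.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 4.4]
observed, boot = bootstrap_distribution(data, statistics.mean)

# The spread of the bootstrap distribution is one common summary of it.
boot_se = statistics.stdev(boot)
print(observed, boot_se)
```

The list `boot` is the bootstrap distribution of the statistic. Its standard deviation, computed here as `boot_se`, is one way to summarize it.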

Test for the equality of two proportions in SAS

A SAS customer asked how to use SAS to conduct a Z test for the equality of two proportions. He was directed to the SAS Usage Note “Testing the equality of two or more proportions from independent samples.” The note says to “specify the CHISQ option in the TABLES statement of PROC FREQ to compute…
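To see why a chi-square option is relevant to a Z test, it can help to compute both statistics by hand. The following is a minimal sketch in Python (not SAS, and not the computation from the article or the Usage Note) of the standard pooled two-sample Z test, alongside the Pearson chi-square statistic for the same 2 x 2 table. The counts and the function names are hypothetical.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled Z statistic and two-sided p-value for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return z, p_value

def pearson_chi_square(x1, n1, x2, n2):
    """Pearson chi-square statistic for the 2 x 2 table of events and non-events."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    row = [n1, n2]
    col = [x1 + x2, (n1 - x1) + (n2 - x2)]
    total = n1 + n2
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

# Hypothetical counts: 40 events out of 100 versus 30 events out of 100.
z, p_value = two_proportion_z_test(40, 100, 30, 100)
chi2 = pearson_chi_square(40, 100, 30, 100)
print(z, p_value, chi2)
```

For a 2 x 2 table, the square of the pooled Z statistic equals the Pearson chi-square statistic, so the two tests give the same two-sided p-value.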

Summary statistics and t tests in SAS

Students in introductory statistics courses often use summary statistics (such as sample size, mean, and standard deviation) to test hypotheses and to compute confidence intervals. Did you know that you can provide summary statistics (rather than raw data) to PROC TTEST in SAS and obtain hypothesis tests and confidence intervals? This article shows how to…
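For reference, this is the textbook calculation that summary statistics make possible. The sketch below is in Python, not SAS, and does not show how PROC TTEST is invoked. It computes the one-sample t statistic and its degrees of freedom from a sample size, mean, and standard deviation. The numbers and the function name `one_sample_t` are hypothetical. The p-value and confidence interval are omitted because they require quantiles of the t distribution, which the Python standard library does not provide.

```python
import math

def one_sample_t(n, mean, sd, mu0):
    """t statistic and degrees of freedom for H0: mu = mu0, from summary statistics."""
    se = sd / math.sqrt(n)        # standard error of the mean
    t = (mean - mu0) / se
    df = n - 1
    return t, df

# Hypothetical summary statistics: n = 25, mean = 52, sd = 5, tested against mu0 = 50.
t_stat, df = one_sample_t(n=25, mean=52.0, sd=5.0, mu0=50.0)
print(t_stat, df)
```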