SAS Learning Post

Standardizing phone numbers using SAS

SAS Certifiations, SAS Guide, SAS TutorialsHow many times have you entered a phone number on a web page, only to be told that you did not type it the “correct” form? I find that annoying. Don’t you?

In my latest book, Cody’s Data Cleaning Techniques, 3rd edition, I show how to convert a phone number in any form and convert it to a standard form. Because not everyone will buy and read this book (well, I could hope), I thought I would share this technique with you on this blog.

Let’s start out with a small data set (Numbers) that contains standard 10 digit US phone numbers. You can run the DATA step yourself, if you wish to “follow along.”

data Numbers;
input Phone $15.;
#(210) 567-9451

Yes, this looks like a mess. Look how easy it is to extract the digits from the number and, if you wish, convert the number into the common form, for example: (###)###-####.

data Convert;
set Numbers;
length Digits $ 10 Standard $ 14;
Digits = compress(Phone,,’kd’);
Standard = cats(‘(‘,substr(Digits,1,3),’)’,substr(Digits,4,3),

The key to this program is the COMPRESS function with the two modifiers ‘k’ (keep) and ‘d’ (digits) as the third argument to the function. SAS added a third argument to the older COMPRESS function starting with SAS 9. This argument allows you to specify modifiers such as ‘a’ (all upper- and lowercase letters – alpha), ‘d’ (all digits), and ‘k’ (keep the specified characters and remove everything else). The variable Digits is a character variable that contains only the digits in the variable Phone (see listing below). To create the variable Standard, you concatenate all the components necessary to create the required standard form. The CATS function concatenates all of its arguments after first stripping leading and trailing blanks. You could have used the standard || or !! to perform this operation but I like to show off some of the great SAS functions. By the way, be sure to specify the length of the variable Standard, because if you don’t, the default length for the result of any of the CATS function is 200.

Here is a listing of data set Convert:

SAS Certifiations, SAS Guide, SAS Tutorials

Keep the COMPRESS function in mind when you encounter problems similar to this one.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s