SAS Users

Collaborative distributed data management using SAS Viya

This article in the SAS Viya series will explore how to accomplish distributed data management using SAS Viya. In my next article, we will discuss how SAS programmers can collaborate with their open source colleagues to leverage SAS Viya for distributed data management.

SAS Certifications Guide, SAS Certifications Tutorials and Materials, SAS Certifications

Distributed Data Management

SAS Viya provides a robust, scalable, cloud ready distributed data management platform. This platform provides multiple techniques for data management that run distributive, i.e. using all cores on all compute nodes defined to the SAS Viya platform. The four techniques we will explore here are DATA Step, PROC DS2, PROC FEDSQL and PROC TRANSPOSE. With these four techniques SAS programmers and open source programmers can quickly apply complex business rules that stage data for downstream consumption, i.e., Analytics, visualizations, and reporting.

The rule for getting your code to run distributed is to ensure all source and target tables reside in the In-Memory component of SAS Viya i.e., Cloud Analytic Services (CAS).

Starting CAS

The following statement is an example of starting a new CAS session. In the coding examples that follow we will reference this session using the key word MYSESS. Also note, this CAS session is using one of the default CAS library, CASUSER.

SAS Certifications Guide, SAS Certifications Tutorials and Materials, SAS Certifications

Binding a LIBNAME to a CAS session

Now that we have started a CAS session we can bind a LIBNAME to that session using the following syntax:

SAS Certifications Guide, SAS Certifications Tutorials and Materials, SAS Certifications

Note: CASUSER is one of the default CAS libraries created when you start a CAS session. In the following coding examples we will utilize CASUSER for our source and target tables that reside in CAS.

To list all default and end-user CAS libraries, use the following statement:

SAS Certifications Guide, SAS Certifications Tutorials and Materials, SAS Certifications

DATA Step

The DATA Step is one of the basic building blocks of SAS programming. It is used to apply business rules to data that will be used for analysis, visualizations and reporting.

For DATA Step code to run distributed in CAS the rules are:

  1. All source and target tables must reside in CAS
  2. You must use a two level name i.e. casuser.table, when identifying your source and target tables

SAS Certifications Guide, SAS Certifications Tutorials and Materials, SAS Certifications

PROC DS2

DS2 is a SAS programming language that is appropriate for advanced data manipulation. DS2 is included with Base SAS and shares core features with the SAS DATA step. DS2 exceeds the DATA step by adding variable scoping, user-defined methods, ANSI SQL data types, and user-defined packages. The DS2 SET statement accepts embedded FedSQL syntax, and the runtime-generated queries can exchange data interactively between DS2 and any supported database. This allows SQL preprocessing of input tables, which effectively combines the power of the two languages.

For PROC DS2 code to run distributed in CAS the rules are:

  1. All business rules to apply to the source data are defined as a DS2 THREAD program
  2. The PROC DS2 DATA program must declare a THREAD program
  3. The source and target tables must reside in CAS
  4. Unlike DATA Step, with PROC DS2 you use the SESSREF= parameter to identify which CAS environment the source and target tables reside in

SAS Certifications Guide, SAS Certifications Tutorials and Materials, SAS Certifications

PROC FEDSQL

FedSQL provides a common SQL syntax across all data sources. FedSQL is a vendor-neutral SQL dialect that accesses data from various data sources without having to submit queries in the SQL dialect that is specific to the data source. In addition, a single FedSQL query can target data in several data sources and return a single result table.

For PROC FEDSQL to run in CAS the rules are:

All source and target tables must reside in CAS and you use a two level name to reference these tables
Like PROC DS2 we must identify the CAS session using the parameter SESSREF=

SAS Certifications Guide, SAS Certifications Tutorials and Materials, SAS Certifications

PROC TRANSPOSE

The TRANSPOSE procedure creates an output data set by restructuring the values in a SAS data set, transposing selected variables into observations. The TRANSPOSE procedure can often eliminate the need to write a lengthy DATA step to achieve the same result. Further, the output data set can be used in subsequent DATA or PROC steps for analysis, reporting or further data manipulation.

For PROC TRANSPOSE to run in CAS the rules are:

  1. All source and target tables must reside in CAS
    a. Like DATA Step you use a two-level name to reference these tables

SAS Certifications Guide, SAS Certifications Tutorials and Materials, SAS Certifications

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s