All Data Languages are the Same

Before you begin to think about data and data languages, it is important to recognize that all data languages are essentially the same. I know that some may differ with me on this. I also know that some may argue that some are better than others, this is not what I’m saying either. What I am saying is that they all “behave” or “function” the same. Read on to understand further.

The Components of the Language

Each of the languages have the same components, whether you are working with DAX, R, Python, SQL, or just about any other language. Each language has its own versions of the same componenets:

What

This tells the language what you want exactly. In some cases it’s a column name, calculation, scalar expression, or other returned value. This is how the language understands what to return to you and what you want to see. In the SQL world, this is the “SELECT” clause in the statement. While R (in the dplyr package) can use the SELECT() function, or the $ operator to define the returned columns. Python works similar to R but has its own nuances. DAX’s CALCULATE() function expects the “what” to be defined in the first half of the expression.

From/Where

This is how one defines where from to pull the information. In this portion of the statement/expression is usually tabular raw data. In SQL this is the “FROM” clause while DAX has this in the filter component of the CALCULATE() expression. In the R language, this is usually the starting point of the expression as your statement starts with the data source (data frame, tibble, or another form of tabular data). The hardest part about writing an expression in any language is defining this portion of the from/where because this has to be constructed either visually or mentally. In the from/where portions of your statement, one must think about the whole data model and what would be included. This could include joins/connections/relationships to be able to build out a more robust data set that contains ALL the information necessary. While other times, it is more important to pair down the data and reduce the data being examined.

Inclusions/Exclusions

This portion of the statment can get very nuanced. This is because the visualization of the data model and the types of data can either be defined here or in the “where/from” section. For this I will talk about the SQL world. Suppose you wanted to look at sports data and you wanted to examine team performance, but only teams that had players with specific attributes. In the SQL world something like this would be accomplished by defining unique list of teams that have one or more players which match the criteria. From there, you could join that list in an “INNER JOIN” which would exclude every other team, then all subsequent information would only be for the teams in question (there are multiple ways to accomplish this, but that’s the basics). In some cases, you may want to look at the line-level inclusions and exclusions in the “WHERE” clause. While DAX can be entered in the “filter” portion of the calculate statement (which as referenced above can be multi-purpose). In the R language, the inclusions and exclusions can be in the FILTER() function in the dplyr package or can be simply defined in the statement.

This topic could go on much longer (and future edits may come) but I wanted to share this to be able to help bridge the gap between languages and methodologies. If we as people put up these barriers to learn we cut ourselves off from great opportunities and we will lose the ability to do great things! So lets keep on learning and growing!

Leave a Reply

Your email address will not be published. Required fields are marked *