From the course: Time Series Modeling in Excel, R, and Power BI

Time series models

- [Instructor] Time series models are a type of data model that lets us analyze data points or observations over time. Line charts serve as one of the most effective data visualizations for illustrating how time series models work. On the horizontal X-axis, we have the date, time, or date dimension we're measuring our data over, in chronological order from left to right. We'll put the data values we're measuring over time on the vertical Y-axis. We'll see a single line if we're evaluating a single series of data points. Conversely, if we're evaluating more than one time series, we'll see other lines on the same chart. We're going to start with our monthly New York City weather data. We want to use the average monthly temperatures, which is a single field in the data called TAVG, along with the date dimension for each month in the date field. We don't need the other fields in our time series model, so let's just delete them to make our data cleaner and easier to work with. Let's also add another time series to our data for the freezing point at 32 degrees Fahrenheit by adding another column to our data table. Notice that our data table is already in chronological order. However, if it's not, we can sort it in chronological order so the X-axis of the line chart properly displays once we create the visual. To create a line chart, let's select all three columns, then we'll choose to insert a line chart from the top ribbon. This creates a line chart with these three fields. We expect to see the date by the month on the X-axis and the two lines for the average temperature and the freezing point as separate lines on the Y-axis. If we want to change the data that appears, we can select the chart, then right click, and we'll choose select data. In the pop-up window, we can move the fields around. For example, we know that we want to see the temperature and the freezing point in the legend series as individual lines. We can choose to edit the series to select their line names and their associated values. And the categorical, the horizontal axis labels, we want to put the dates filled, which we already see here. We want to make sure that it does include all the dates for the analysis, which we do see in the chart already. After we create the line chart, we can see how the average monthly temperature in New York City oscillates every year, so it's hotter in the summer and colder in the winter. We can then change the colors of the lines. For example, if we click on actual temperatures, we see that it highlights all the data points. Then we'll right click and choose format data series And we'll make our line a black color. We can also add text labels for the x and y axes from the formatting menu of our selected line chart. We'll choose access titles, we'll then double click into the X-axis title and name it year, and do the same for the Y axis to label it temperatures and degrees Fahrenheit. And we'll call this actually months, or we'll call this date. Say month and year. If we want to change the title, we can just click into the existing one Excel automatically added for us to rename it average monthly temperatures in New York City. Now let's start creating our time series model in R. In GitHub let's copy the raw CSV view of the same New York City monthly weather data. In RStudio, let's connect to this data by storing it in a variable in our R script called DF Monthly. The acronym DF is short for data frame. In this case, let's add monthly to the end of the variable name to make it clear we're referring to the monthly weather data. This will later help us distinguish it from other variables we create within our R code. We'll then need to assign something to this new variable with the less than, followed by the dash sign. We'll use the REIT.CSV R function to assign our new variable, the data frame output we get from reading the online GitHub file with a string in quotation marks. You can also use the same notation to connect to a CSV file stored on your own computer by swapping the URL within the quotations to the folder path followed by the CSV file name on your computer. We can see what this new table looks like after we run the coach, create a new variable using the view function. We can put this in our script, or if we want to keep our code less cluttered, we can also run it in the RStudio console below. We can also view our data frame by double clicking on it in the RStudio variable explorer window, in the top right. In R, we can create a chart for our data frame variable using the plot function Within the plot function, let's specify we're putting the date field from the DF monthly variable on the X-axis and the TAVG average temperature field on the Y-axis. We're specifying each of the field names from our data frame variable after the dollar sign. We'll also set the type of our plot to align by setting the type parameter equal to L. When we run this plot there, we run into problems with an error message in red text indicating we're referencing undefined values. Using the HUD function displays the first six rows of the data. In the console, we see an N/A in the TAVG field for the first month, which is why we see an error message when we plot our data. If we run the tail function on the same data frame to see the last six rows, we'll also see an N/A for the last month. Later in this chapter, we'll use R functions to account for these N/As in the same data so we can then run them in time series models.

Contents