The IRS publishes many interesting datasets derived from our filings. When we file a tax return (Form 1040), we need to fill in our address. Yep, the IRS uses that data to estimate yearly migration. Let’s assume we have the following data points for the 2014 and 2015 filings:

We can see that one person moved from Hawaii to Washington: +1 for Washington and -1 for Hawaii. Yep, the IRS has the tax filers’ addresses, and they did just that for millions of tax filers. The best thing about this is that they have published their findings going back to 1990 at this site.
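To make the +1/-1 bookkeeping concrete, here is a minimal base R sketch. The `filings` data frame, its column names, and the three filers are made up for illustration; they are not the IRS layout.

```r
# Hypothetical filings: one row per filer, with the State on each year's return.
filings <- data.frame(
  filer      = c("A", "B", "C"),
  state_2014 = c("Hawaii", "Washington", "Oregon"),
  state_2015 = c("Washington", "Washington", "Oregon"),
  stringsAsFactors = FALSE
)

# Keep only filers whose State changed between the two filings.
movers <- filings[filings$state_2014 != filings$state_2015, ]

# Count arrivals and departures per State over the same set of State levels,
# so the two tables line up element by element.
states  <- sort(unique(c(filings$state_2014, filings$state_2015)))
inflow  <- table(factor(movers$state_2015, levels = states))
outflow <- table(factor(movers$state_2014, levels = states))

# Net migration per State: arrivals minus departures.
net <- as.vector(inflow - outflow)
names(net) <- states
print(net)
```

Filers who stay put, like B and C here, drop out before counting, so only the Hawaii-to-Washington move contributes.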

So, we will take a look at the State of Washington migration data from 2011 to 2014. But before that, as usual, we need to load libraries and set up the theme.

But before we deal with the IRS data, I’d like to get the mapping done first. The latitude and longitude associated with each State are published here. I then put them in an Excel file and imported it into the environment.

Well, that doesn’t look good. The author of the fiftystater package moved Alaska and Hawaii to sit under California. Therefore, we need to change their coordinates.

That’s much better. We are done with the mapping. Next, the IRS published the data (link) in CSV format, separated into inflow and outflow files. With four years of data (2011 to 2014), we need to load eight files into R.

After running the for loop, there should be eight data frames in the environment.
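A loop of this kind can be sketched as follows. The file names (`inflow_2011.csv` and so on), the `State` and `Returns` columns, and the placeholder contents are assumptions so the sketch runs without the real downloads; the actual IRS files are named and laid out differently.

```r
# Assumption: files are named like "inflow_2011.csv"; the real IRS names differ.
data_dir <- tempfile("irs_")
dir.create(data_dir)

years      <- 2011:2014
directions <- c("inflow", "outflow")

# Write tiny placeholder files so the sketch is runnable on its own.
for (direction in directions) {
  for (year in years) {
    placeholder <- data.frame(State = c("California", "Oregon"),
                              Returns = c(10, 5))
    write.csv(placeholder,
              file.path(data_dir, paste0(direction, "_", year, ".csv")),
              row.names = FALSE)
  }
}

# Read each file and create one data frame per file, e.g. inflow_2011.
for (direction in directions) {
  for (year in years) {
    name <- paste0(direction, "_", year)
    path <- file.path(data_dir, paste0(name, ".csv"))
    assign(name, read.csv(path, stringsAsFactors = FALSE))
  }
}

# List the data frames the loop created.
loaded <- ls(pattern = "^(inflow|outflow)_[0-9]{4}$")
print(loaded)
```

Collecting the data frames in a named list instead of using `assign()` would also work and is often easier to iterate over later.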

Unfortunately, the IRS structured the data for viewing in Excel. Therefore, we need to wrangle the data.

Okay. The unwanted rows are gone. Still, the format is far from good. Since we have eight files which require the same wrangling tasks, I’ll write a function to perform the tasks.
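As a rough idea of what such a function might look like, here is a sketch. The raw layout (a title row, a header row, and a total row above the State rows), the function name `clean_migration`, and the output column names are all hypothetical; the real IRS files need different row and column handling.

```r
# Hypothetical raw layout: title row, header row, total row, then one row per State.
raw <- data.frame(
  V1 = c("Migration inflow, hypothetical title", "State", "Total",
         "California", "Oregon"),
  V2 = c("", "Returns", "15", "10", "5"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop the leading rows and return a tidy data frame
# tagged with the year and direction the file belongs to.
clean_migration <- function(df, year, direction, skip_rows = 3) {
  body <- df[-seq_len(skip_rows), , drop = FALSE]  # drop title, header, total
  data.frame(
    State     = body[[1]],
    Returns   = as.numeric(body[[2]]),
    Year      = year,
    Direction = direction,
    stringsAsFactors = FALSE
  )
}

inflow_2011_clean <- clean_migration(raw, year = 2011, direction = "Inflow")
print(inflow_2011_clean)
```

Passing `year` and `direction` as arguments lets the same function serve all eight files, and the cleaned results can later be stacked with `rbind()`.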

Next, we aggregate the data.
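One plausible form of this step, assuming the goal is a total per State across all years, uses base R's `aggregate()`. The `inflow_all` data frame and its values are made up for illustration.

```r
# Hypothetical cleaned inflow data stacked across two years.
inflow_all <- data.frame(
  State   = c("California", "Oregon", "California", "Oregon"),
  Year    = c(2011, 2011, 2012, 2012),
  Returns = c(10, 5, 12, 4),
  stringsAsFactors = FALSE
)

# Sum the number of returns per State over all years.
inflow_total <- aggregate(Returns ~ State, data = inflow_all, FUN = sum)
print(inflow_total)
```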

We are ready to plot the charts.

It seems like people move to Washington from California and Oregon the most. The grey circle in the lower right corner represents foreign migrants, for whom the IRS doesn’t specify the country of origin. So I just put them at the edge. What about outflow?

Hm, putting the arrows in looks quite fancy, but I think it makes the chart harder to read. Either way, we need a better visualization. I feel like the inflow and outflow numbers are, in fact, about the same. Let’s visualize again, but this time with net migration.

Initially, the IRS sorts the data by number of returns, so we cannot simply subtract Outflow from Inflow row by row. For example, if California has the highest inflow, it will be the first entry in the inflow data. But if Oregon has the highest outflow, it will be the first entry in the outflow data, and subtracting the first rows would mix two different States.

First, let’s sort them.

Although we don’t have any new State in the past ten years, I still want to make sure that there is no error. So, let’s do this.
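A minimal sketch of sorting followed by this kind of sanity check, using toy data with assumed column names `State`, `Year`, and `Returns`:

```r
# Toy inflow and outflow data, deliberately in different row orders.
inflow <- data.frame(
  State   = c("California", "Oregon", "Idaho", "California", "Oregon", "Idaho"),
  Year    = c(2011, 2011, 2011, 2012, 2012, 2012),
  Returns = c(100, 60, 40, 110, 65, 45),
  stringsAsFactors = FALSE
)
outflow <- data.frame(
  State   = c("Oregon", "Idaho", "California", "Idaho", "California", "Oregon"),
  Year    = c(2011, 2011, 2011, 2012, 2012, 2012),
  Returns = c(50, 55, 80, 50, 85, 52),
  stringsAsFactors = FALSE
)

# Sort both by Year, then State, so matching rows share the same position.
inflow_sorted  <- inflow[order(inflow$Year, inflow$State), ]
outflow_sorted <- outflow[order(outflow$Year, outflow$State), ]
rownames(inflow_sorted)  <- NULL
rownames(outflow_sorted) <- NULL

# Check that every row pair refers to the same State and Year.
matched <- inflow_sorted$State == outflow_sorted$State &
           inflow_sorted$Year  == outflow_sorted$Year
print(sum(matched))

# Stop early if the two datasets are not aligned.
stopifnot(nrow(inflow_sorted) == nrow(outflow_sorted), all(matched))
```

If a State appeared in only one of the two datasets, the row counts would differ or `matched` would contain `FALSE`, and `stopifnot()` would raise an error instead of letting a misaligned subtraction through.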

The State and Year of all 222 observations matched perfectly. So, we can simply subtract them in a new dataset.
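With aligned rows, the subtraction is a single column operation. The sketch below uses made-up numbers and assumed column names; only `Net_Migration` is a name taken from this post.

```r
# Toy data, already sorted so that rows line up by State and Year.
inflow <- data.frame(
  State   = c("California", "Idaho"),
  Year    = c(2011, 2011),
  Returns = c(100, 40),
  stringsAsFactors = FALSE
)
outflow <- data.frame(
  State   = c("California", "Idaho"),
  Year    = c(2011, 2011),
  Returns = c(80, 55),
  stringsAsFactors = FALSE
)

# Net migration: people arriving minus people leaving.
net <- data.frame(
  State         = inflow$State,
  Year          = inflow$Year,
  Inflow        = inflow$Returns,
  Outflow       = outflow$Returns,
  Net_Migration = inflow$Returns - outflow$Returns,
  stringsAsFactors = FALSE
)
print(net)
```

A positive `Net_Migration` means a surplus for Washington with that State, and a negative value means a deficit.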

Alright, let’s see the net migration.

It seems like Washington State has a migration surplus with almost every State except North Dakota, South Dakota, and Idaho. Also, there is a slight deficit in the foreign category.

Well, I am a bit surprised. Amazon, Microsoft, Costco, and Starbucks have headquarters or campuses in Washington, so I would expect the State to attract talent and run a net surplus. But South Dakota? Why would Washingtonians move to South Dakota? Is it the shale boom? We may get some more info if we take a look at a time series plot.

Oh, things have improved. In 2011, Washington State had a net deficit with all three States, but it gradually shrank over the years. Idaho’s is the most notable, as its Net_Migration increased steeply in 2013 and 2014. Let’s compare the actual inflow and outflow for greater detail.

All three States exhibit a downward trend in both Inflow and Outflow. But the downward trend is particularly strong in Idaho, where Outflow dropped below Inflow in 2013 and 2014.

TL;DR: From 2011 to 2014, the State of Washington attracted more Americans than it lost. However, after netting inflow against outflow over the whole period, the State of Washington has a net deficit with Idaho, North Dakota, and South Dakota.