Using the basics of Python Pandas to analyze SAT and ACT data and (hypothetically) propose a solution to the College Board. My very first take at a data science problem.

According to an article by Education Week, in 2016 more than 2 million students, roughly 64% of high school graduates, took the ACT, compared with 1.64 million students who took the SAT. That gives you an idea of the issue I will be talking about today: the SAT participation rate in the United States and how we can help improve it relative to the ACT.

SAT vs ACT participation histograms

The histograms above show the difference in participation rates between the two tests: the ACT has about 18 states with 100% participation, while the SAT has about 14 states where close to no one participates.
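As a rough sketch of how counts like these can be pulled out with Pandas, the snippet below bins participation rates into 20-point buckets (the tabular equivalent of a histogram) and counts the extremes. The state labels, column names, and numbers are all made up for illustration and are not the real figures; the 5% cutoff for "close to no one" is also just an assumption.

```python
import pandas as pd

# Hypothetical participation rates (percent) for six made-up states.
# These are placeholders for illustration, NOT the real SAT/ACT data.
rates = pd.DataFrame(
    {
        "state": ["A", "B", "C", "D", "E", "F"],
        "sat_participation": [2, 3, 5, 60, 96, 100],
        "act_participation": [100, 100, 100, 35, 20, 8],
    }
)

# Bin each test's participation into 20-point buckets.
bins = [0, 20, 40, 60, 80, 100]
sat_hist = (
    pd.cut(rates["sat_participation"], bins=bins, include_lowest=True)
    .value_counts()
    .sort_index()
)
act_hist = (
    pd.cut(rates["act_participation"], bins=bins, include_lowest=True)
    .value_counts()
    .sort_index()
)

# Count the extremes discussed in the text.
# The 5% threshold for "close to no one" is an assumed cutoff.
act_full = int((rates["act_participation"] == 100).sum())
sat_near_zero = int((rates["sat_participation"] <= 5).sum())

print(sat_hist)
print(act_hist)
print("States with 100% ACT participation:", act_full)
print("States with near-zero SAT participation:", sat_near_zero)
```

With the real data set loaded in place of the toy frame, the same two counts should line up with the tallest bars at either end of the histograms.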

To get a better picture of what I am talking about, I mapped it out below using Tableau. In the left map, dark orange is 100% SAT participation and dark blue is 0%. Right away we can see that the central United States is dominated by the ACT, while the West Coast and East Coast are mainly SAT.

Participation Map

After more research, I found out that some states required students to take a certain test. In the map on the right, blue states required the ACT, orange states required the SAT, and red states required one or the other. From this we can infer one reason why the SAT participation map looks the way it does: most of the states that required the ACT are in the central United States, while the states that required the SAT are on the East Coast.

Participation Scatter plots

Something interesting in my findings was that states with high SAT participation rates had low average total scores, while states with low participation rates had higher average total scores. The same could be said for the ACT. Another way to see this is through the scatter plots, where SAT scores are negatively correlated with the SAT participation rate, and likewise for ACT scores. Generally, students would rather just stick with one test. My take is that in states with low participation and high scores, only the prepared and ambitious students took the test.
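The negative relationship described above can be checked numerically with a Pearson correlation. This is a minimal sketch with invented numbers and hypothetical column names, not the actual data behind the scatter plots.

```python
import pandas as pd

# Hypothetical state-level values, NOT real data: participation in percent
# and average total SAT score.
scores = pd.DataFrame(
    {
        "sat_participation": [3, 5, 30, 60, 95, 100],
        "sat_total": [1250, 1230, 1150, 1080, 1010, 1000],
    }
)

# Pearson correlation between participation and average score.
# A value close to -1 means scores fall as participation rises.
r = scores["sat_participation"].corr(scores["sat_total"])
print(f"Correlation between participation and total score: {r:.2f}")
```

A strongly negative value is consistent with the self-selection explanation, but a correlation on its own cannot confirm why the pattern exists.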

Reading distribution for both tests

As an example, above is the distribution for one of the subjects in each test. There are three things we look for when describing a distribution: its shape, its spread, and its central tendency.

Starting off with shape, I would say that most of the distributions are roughly normal or slightly positively skewed, aside from the participation rates of both tests. If we look at the mean and the median, most of the means are above the median in each category. This supports the idea that the distributions lean towards a positive skew and tells us about their central tendencies. Also, none of the standard deviations seem too large, which tells me the spread is on the tighter side. Another thing to point out is that each distribution tends to peak at the mean, dip, and then pop up again as it moves towards the maximum value. In my opinion, this means a good chunk of the participants do especially well, or at least above average. Perhaps these are states with a stronger emphasis on education that better prepare their students in these categories.
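Here is a small sketch of how the three descriptors (shape, spread, central tendency) can be computed for one column in Pandas. The scores are invented and the series name is hypothetical. Note that "mean above median" is only a rule of thumb for positive skew, so the sample skewness is computed as well.

```python
import pandas as pd

# Hypothetical average reading scores for nine made-up states, NOT real data.
reading = pd.Series(
    [480, 490, 500, 505, 510, 520, 530, 600, 630], name="sat_reading"
)

# Central tendency (mean, median) and spread (standard deviation).
summary = reading.agg(["mean", "median", "std"])

# Shape: positive skewness means a longer tail towards high scores.
skewness = reading.skew()

# Rule-of-thumb check used in the text: is the mean above the median?
mean_above_median = summary["mean"] > summary["median"]

print(summary)
print("Skewness:", skewness)
print("Mean above median:", mean_above_median)
```

Running the same three lines over every score column (for example with `DataFrame.agg`) gives a quick table to compare against the plotted distributions.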


Lastly, what can we take from this? My research shows that SAT and ACT participation rates conflict with each other, and I would like to suggest a couple of ways to improve SAT participation rates.

First, the ACT has been known to work with a handful of states to provide free testing for students during school days. The SAT has been slow to do the same and should work towards this too. Establishing partnerships or contracts with more states to provide free tests and free education could make the SAT more popular. Also, the SAT was redesigned back in 2015, so one reason for a temporary decline in participation could be that students felt uneasy or unprepared for the new test. Perhaps it will just take time for students to feel confident about the test again, and we will need to look at the data in the coming years.

Finally, if the College Board really can’t raise participation rates in the states, the SAT has a very strong presence internationally, so it can use this as a strength and dominate more in that area. All in all, the data is great for showing how well the tests themselves are doing, but it doesn’t provide enough information to help improve SAT participation. There are many other factors we need to consider to make a better suggestion.

Data Science/Business/Project Management Enthusiast
