Chronic diseases such as heart disease, cancer, and diabetes are among the most common and costly health problems. According to the Centers for Disease Control and Prevention (CDC), “half of all American adults have at least one chronic condition and almost one of three have multiple chronic conditions”. Apart from the cost burden, heart disease and cancer are the two leading causes of death in the country. While diabetes ranks lower as a cause of death (#7 in 2011), its cost burden and effect on patients’ quality of life make it a top concern for physicians. For these reasons, in 2014 the CDC released a study on the national indicators of chronic diseases, and some of the findings are cause for concern.
Cancer: The average mortality for different cancer types over the years 2008-2012 shows that lung cancer is the most lethal form (Figure 1).
Unfortunately, there is no routine screening protocol for lung cancer, and the disease is not uncovered until the patient has symptoms. For many types of cancer, early detection can be a vital factor in determining whether the patient survives. This is the case for female breast cancer, which has the second highest mortality rate. Early detection of breast cancer is achieved through routine mammography screening. Unfortunately, the indicators show that the use of mammography among the most vulnerable group of women, ages 50-74, went down from 2012 to 2014. The reason for this is not clear, and further research is needed. It is possible that there has been a change in how insurance companies cover mammography screening.
Diabetes: The indicator for the prevalence of diabetes may also be following an unfavorable trend. The percent of adults diagnosed with diabetes went up in 2014 compared to the two previous years (Figure 2).
However, there is a caveat here. The year 2014 was also the first year of national open enrollment for health insurance through the Affordable Care Act. Hence, while the increase in diagnoses that year may indicate that more people are getting diabetes, at least some of the rise may be due to the fact that more people were insured and were getting tested that year. One of the striking trends in the prevalence of diabetes is the strong geographical factor. Figure 3 shows that the percentage of the adult population with diabetes varies drastically from state to state. An adult living in West Virginia is twice as likely to have diabetes as one living in Utah. Hence, when it comes to diabetes, where you live matters! This trend underscores the fact that type-2 diabetes is preventable and that changes in diet and lifestyle can have a big effect on the chance of developing the disease.
Cardiovascular Disease: The number of deaths in the country from cardiovascular disease has been steadily climbing since 2011 (Figure 2).
As with diabetes, however, where you live matters. Comparing the map of diabetes prevalence with the map of cardiovascular disease prevalence also reveals some connections. In particular, the states with the highest rates of diabetes also showed some of the highest rates of cardiovascular disease. However, the converse does not hold. For example, Michigan has a high rate of cardiovascular disease but not of diabetes.
While the national chronic disease indicators show a number of disheartening trends, it is important to emphasize that we can do something about them. Type-2 diabetes and cardiovascular disease can in many cases be prevented with proper changes in diet and lifestyle. Cancer mortality can be reduced by diligent screening. Chronic diseases, more than any other ailments, can be attacked by concentrating on the prevention side of the medical equation, and a sustainable national healthcare policy must emphasize this approach.
Choosing a high school is one of the first big decisions in life. With over 400 public high schools in New York City, the families of students can be overwhelmed by the long list of schools, each of which promises a secondary education with curricula ranging from biology to engineering to musical theater. The options seem endless. Using the public high school data sets released by the NYC Department of Education, I made this Shiny app to visualize the information in these data sets.
There are 5 data sets available from the NYC OpenData portal: the NYC school district map, the 2016 Department of Education (DOE) high school directory, SAT results from 2010 and 2012, and DOE high school survey results from 2011. Even though the school survey and the SAT are conducted every year, summaries of more recent SAT results and high school surveys are not yet available on the website.
The app’s user interface contains 3 tabs: interactive map, school info, and school district comparison. The map tab (figure below) shows the map of New York City, overlaid with the school district map and public high school locations. The markers representing schools are color-coded in 3 colors: blue for schools that give priority to students from their own school district, green for schools open to all New York City students, and red for special high schools, usually international high schools for new residents of NYC. The panel on the right displays 3 histograms: the top one is the distribution of average SAT scores for NYC high schools in 2010 and 2012, the middle one shows the distribution of 2011 high school survey ratings, and the bottom one presents the number of students enrolled in NYC high schools in 2016.
When a school on the map is selected with a mouse click, a pop-up displaying the name and address of the selected school appears at the school’s location on the map. The school’s name, address, and contact info, along with its SAT scores, survey rating, and number of students, are also shown on the information panel on the right. The histograms on the panel are updated so that the school’s standing among NYC public schools can be seen on the graphs (figure below).
The school info tab (figure below) provides detailed information on individual schools. The drop-down menus at the top, along with the search box, can be used to filter the list of schools by borough and neighborhood. Once a school is selected in the data table, details of the school, such as contact information, nearby transit, and school programs, are displayed below.
The School District Comparison tab (figure below) allows users to compare different school districts. The drop-down menu lists NYC’s 32 school districts; selecting districts filters the school list. NYC DOE labeled 9 public high schools as “Gifted and Talented” to meet the needs of gifted students. These 9 schools are open to all NYC students while being the most selective public high schools. They are also added to the selection list as “Gifted and Talented” so that they can be compared with district schools. There are 3 measurements available for comparing the selected school districts: 2012 SAT score, 2010 SAT score, and 2011 school survey. If SAT scores are picked, 4 box plots are displayed to compare the cumulative, reading, math, and writing scores between the schools in the 2 chosen school districts; if the school survey is picked, the 4 box plots compare the school districts on 4 aspects of the survey: safety and respect, communication, engagement, and academic expectation.
After making the app, I had a question: since the data contain 2 measures of a school’s academic performance, the SAT score and the academic expectation aspect of the school survey, which one should you trust more when gauging a school? Why do some schools with average SAT scores have very good academic ratings in the survey? Let’s look at the plot below. In this plot of 2011 survey academic expectation rating vs 2012 SAT score, a linear trend line is added along with 2 statistics for evaluating the fit: the very small p-value of the coefficients indicates that there is a positive relationship between survey rating and SAT score, while the small adjusted R-squared value indicates that the linear fit captures only part of the relationship between them. The SAT score is an objective evaluation of schools, while the school survey is a subjective review representing how well a school’s academic performance matches the reviewer’s expectation.
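As a rough illustration of how the two fit statistics relate, the following sketch fits a least-squares line and reports R-squared, adjusted R-squared, and the t-statistic of the slope (the quantity from which a p-value is derived using a t distribution with n - 2 degrees of freedom). It is written in plain Python rather than taken from the app, and the function name and the sample numbers are hypothetical, not values from the NYC data.

```python
import math

def linear_fit_summary(x, y):
    """Least-squares fit of y = intercept + slope * x with basic fit statistics."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual sum of squares: variation the line does not explain.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - ss_res / syy
    # Adjusted R-squared for one predictor: penalizes the small sample size.
    adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - 2)

    # Standard error and t-statistic of the slope (n - 2 degrees of freedom).
    se_slope = math.sqrt(ss_res / (n - 2) / sxx)
    t_slope = slope / se_slope

    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": r_squared,
        "adj_r_squared": adj_r_squared,
        "t_slope": t_slope,
    }

# Hypothetical survey ratings and SAT scores, for illustration only.
ratings = [7.1, 7.4, 7.8, 8.0, 8.3, 8.6]
sat_scores = [1150, 1320, 1210, 1400, 1260, 1450]
for name, value in linear_fit_summary(ratings, sat_scores).items():
    print(f"{name}: {value:.3f}")
```

A large t-statistic (small p-value) says the slope is unlikely to be zero, while a low adjusted R-squared says that individual schools still scatter widely around the line. Both can be true at once, which is the situation described above.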
Visualizing the Game Style and Shooting Performance among Superstars via NBA Shot-log
Contributed by Xinyuan Wu.
In the NBA, a top player takes around a thousand shots during the regular season. A question worth asking is: what information can we get by looking at these shots? As a basketball fan for more than 10 years, I am particularly interested in discovering facts that cannot be seen directly on live TV. While browsing the web last week, I found a data set on Kaggle called NBA shot-log. It records every shot taken by each player during the 2014-15 regular season, along with a variety of features. I decided to perform an exploratory visualization with this data. Now let’s dive into the shot-log and see what interesting information we can discover about game style and shooting performance among NBA players. I focused this analysis on Stephen Curry, James Harden, LeBron James, and Russell Westbrook, who ranked first through fourth in the MVP voting for the 2014-15 season and are undoubtedly superstars in the league.
Data Acquisition and Processing
Data cleaning, feature creation, and plotting were performed in R. The package used for generating graphs is ggplot2. The R code for data cleaning and feature creation can be found here.
Figure 1. Shot density plot with respect to shot distance.
The graph above shows the distribution of shot attempts by each player versus shot distance. All four players have local maxima at around 5 feet and 25 feet, corresponding to the lay-up region and the three-point region. Curry’s shot density leans towards the three-point zone, while James takes more shots in the paint, indicating different play styles between the two players. It can also be seen that Westbrook uses the two-point jumper frequently, as suggested by the peak at around 17 feet.
Figure 2. Violin plot that summarizes shot accuracy for each player.
The above violin plot summarizes the shot accuracy for each player throughout the season. Based on visual inspection of this plot, Curry and James have relatively stable shot accuracy compared to Harden and Westbrook, as suggested by the wider shape of their violins, which indicates that more of their values are concentrated around the same accuracy.
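The stability read off the violin plot can also be put into a number. The sketch below, in Python rather than the R used for the figures, computes shot accuracy per game and then the mean and standard deviation across games; a smaller standard deviation corresponds to a more stable shooter. Grouping by game is an assumption about how the plot was built, and the input layout (pairs of game identifier and made/missed flag) and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

def per_game_accuracy(shots):
    """Map each game id to the fraction of shots made in that game.

    shots: iterable of (game_id, was_made) pairs for a single player.
    """
    attempts = defaultdict(int)
    made = defaultdict(int)
    for game_id, was_made in shots:
        attempts[game_id] += 1
        if was_made:
            made[game_id] += 1
    return {game_id: made[game_id] / attempts[game_id] for game_id in attempts}

def accuracy_spread(shots):
    """Return (mean, sample standard deviation) of per-game accuracy.

    Requires shots from at least two games.
    """
    accuracies = list(per_game_accuracy(shots).values())
    return mean(accuracies), stdev(accuracies)

# Hypothetical shots for one player across three games.
example_shots = [
    ("game_1", True), ("game_1", False), ("game_1", True),
    ("game_2", False), ("game_2", True),
    ("game_3", True), ("game_3", True), ("game_3", False), ("game_3", False),
]
avg, spread = accuracy_spread(example_shots)
print(f"mean per-game accuracy: {avg:.3f}, standard deviation: {spread:.3f}")
```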
Figure 3. Boxplot that describes the shot accuracy with respect to match result.
After seeing the summary of shot attempts and shot accuracy, let’s explore how these values behave when other factors are taken into account. Let’s split the shot accuracy by match result. In the plot, Curry, James, and Westbrook display a large gap between games won and games lost. In contrast, Harden shows a relatively small accuracy gap.
Figure 4. The shot number and shot accuracy with respect to date.
Next, let’s look at how the shot number and accuracy change over the course of the season. Westbrook tends to take more shots toward the end of the season, when the Oklahoma City Thunder were fighting for the last playoff position. In the graph on the right, Curry and James have relatively stable shot accuracy throughout the season, while the accuracy of Harden and Westbrook seems to have greater variance.
Figure 5. Number of shots with respect to touch time.
Now let’s see the number of shots plotted against touch time. Curry took more shots after a very short touch time, reflecting his catch-and-release shooting style. In contrast, Westbrook tends to hold the ball for a few seconds before taking the shot.
Figure 6. Shot accuracy with respect to shot distance.
An interesting phenomenon was observed when plotting shot accuracy against shot distance. As shown above, the shot accuracy decreases from the lay-up region to around 10 feet. For Curry, James, and Westbrook, although their accuracy values differ from each other, they all have a local maximum at around 14 feet. Let’s call this region the comfort zone. On the other hand, the accuracy peak of Harden extends beyond the three-point line, which is different from the others. Once the comfort zone is passed, the accuracy for all players decreases monotonically.
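One simple way to reproduce an accuracy-versus-distance curve is to group shots into fixed-width distance bins and compute the fraction made in each bin. The Python sketch below illustrates that idea; it is not the R code behind Figure 6, which may use a smoother instead of bins. The 2-foot bin width, the input layout, and the sample shots are all assumptions made for the example.

```python
from collections import defaultdict

def accuracy_by_distance(shots, bin_width=2.0):
    """Return {bin_start_in_feet: fraction of shots made}, sorted by distance.

    shots: iterable of (distance_in_feet, was_made) pairs.
    A shot at distance d falls in the bin [k * bin_width, (k + 1) * bin_width).
    """
    attempts = defaultdict(int)
    made = defaultdict(int)
    for distance, was_made in shots:
        bin_start = int(distance // bin_width) * bin_width
        attempts[bin_start] += 1
        if was_made:
            made[bin_start] += 1
    return {b: made[b] / attempts[b] for b in sorted(attempts)}

# Hypothetical shots: (distance in feet, made or missed).
example_shots = [
    (1.2, True), (1.8, True), (3.5, False), (9.7, False),
    (14.1, True), (14.9, True), (15.3, False), (24.6, True), (25.1, False),
]
for bin_start, accuracy in accuracy_by_distance(example_shots).items():
    print(f"{bin_start:4.1f}-{bin_start + 2.0:4.1f} ft: {accuracy:.2f}")
```

With real data, a local maximum such as the one around 14 feet would show up as a bin whose accuracy is higher than that of both neighboring bins.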
Figure 7. Density plot with respect to shot distance and closest defender distance.
When combining defender distance with Figure 1, we get a contour plot that gives a general feeling for the play style of each player. From the plot on the left, it can be seen that in the lay-up region, the contour plot for Westbrook lies below the one for Curry, that is, at shorter defender distances, meaning that Westbrook tends to take more tough lay-ups than Curry. To my surprise, Westbrook is even more aggressive at the rim than LeBron James.
Figure 8. Shot number and shot accuracy with respect to opponent and players.
From the heat map above, we can view the number of shots and the shot accuracy against each opponent. For example, Westbrook took more shots when playing against the New Orleans Pelicans and the Portland Trail Blazers, and Harden had poor accuracy when playing against the Boston Celtics.
Figure 9. The shot accuracy after made shots. The top graph combines all shots, while the bottom graph takes only three-point shots into account.
Some people believe that making one shot will affect the accuracy of the next shot. Based on the shot-log, we can actually explore this effect. A set of plots has been generated. For each player, the leftmost red bar represents the accuracy of all shots taken right after a missed shot. The green, blue, and purple bars represent the shot accuracy after making 1, 2, and 3 consecutive shots. It is interesting to note that, for almost all players under study, having made a shot seems to have a negative effect on the following shot. The more consecutive shots are made, the lower the accuracy of the next shot. When only three-point shots are taken into account, this trend still holds true for Curry and LeBron James.
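The streak analysis can be sketched as follows: walk through each game’s shots in order, track how many consecutive shots were made immediately before the current one, and tally the outcome of the current shot under that streak length. The Python below is an illustration under stated assumptions, not the original R code: streaks are assumed to reset at the start of each game, the first shot of a game is skipped because it has no preceding shot, and streaks of 3 or more are pooled into one group. The sample games are hypothetical.

```python
from collections import defaultdict

def accuracy_after_streak(games, max_streak=3):
    """Return {streak_length: accuracy of the next shot}.

    games: iterable of games, each a chronological list of booleans
           (True = made, False = missed) for a single player.
    A streak length of 0 means the previous shot was missed.
    Streaks longer than max_streak are counted under max_streak.
    """
    attempts = defaultdict(int)
    made = defaultdict(int)
    for shots in games:
        streak = None  # no previous shot yet in this game
        for was_made in shots:
            if streak is not None:
                key = min(streak, max_streak)
                attempts[key] += 1
                if was_made:
                    made[key] += 1
            # Update the streak to include the current shot.
            if was_made:
                streak = (streak or 0) + 1
            else:
                streak = 0
    return {k: made[k] / attempts[k] for k in sorted(attempts)}

# Hypothetical shot sequences for two games.
example_games = [
    [True, True, False, True, False, False, True],
    [False, True, True, True, False, True],
]
for streak, accuracy in accuracy_after_streak(example_games).items():
    print(f"after {streak} consecutive makes: {accuracy:.2f}")
```

Restricting the input to three-point attempts before calling the function would give a counterpart of the bottom graph, although whether the original analysis filtered before or after computing streaks is not stated.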
Takeaways and Future Direction
From these graphs, we can see that the four stars have dramatically different play styles. For example, Stephen Curry tends to catch and release quickly, while Russell Westbrook prefers to attack the rim with the ball in hand. In terms of shot accuracy, Stephen Curry and LeBron James are more stable than Harden and Westbrook. Interestingly, in most cases, hitting one shot tends to have a negative effect on the next shot. A deeper exploration is needed to understand this phenomenon in more detail. As a future direction, focusing on the defender side of the data is a potentially interesting extension. Furthermore, we could apply machine learning techniques to predict the probability of hitting a shot.