PART 2 – The dataset and first exploration

After hitting golf balls with our fairway wood on the driving range, we collected data and needed to select the relevant measurements.

We decided to build our dataset from the following measurements:

  • ball Carry
  • ball Speed
  • ball Apex
  • ball Launch Angle
  • ball Dispersion

The dataset was cleaned and normalized with a Python process. For balls hit with our fairway wood, it now looks like this:

Carry (meters)   Speed (km/h)   Apex (meters)   Launch Angle (degrees)   Dispersion (meters)
188              131            26              14.8                     5.3
176              126            37              20.7                     37.6

We have this scatter plot for 10 golf shots. Of course, we could have done it with 100 or 1000 golf shots, but time, cost and blisters on our hands were our limits:

To answer the questions raised in our Part 1 article, we wanted to find the correlations within this data. In golf, we want to improve distance and accuracy. So we built a heat map to detect and quantify the correlation between each pair of characteristics. A heat map is a graphical representation of data where values are depicted by color.

In our heat map, the whiter the cell, the stronger the correlation between the two characteristics.

Here, we can observe that the Carry distance (meters) is highly correlated with Ball Speed and Launch Angle.

We can also notice that the Carry distance is less correlated with the Apex (how high the ball flies).
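The numbers behind such a heat map can be sketched in a few lines of Python. This is a minimal illustration, assuming the coefficient in question is the Pearson correlation (the article does not say which one was used). The function names `pearson` and `correlation_matrix` are hypothetical, and the columns are toy values, not our measurements.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict: matrix[a][b] is the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Toy columns for illustration only (assumption: NOT the golf measurements).
toy = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],  # rises with "a"
    "c": [4.0, 3.0, 2.0, 1.0],  # falls as "a" rises
}
matrix = correlation_matrix(toy)
```

Each cell of the heat map would then be colored according to `matrix[a][b]`. With real shots, the columns would be Carry, Speed, Apex, Launch Angle and Dispersion.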

This helps us decide which features of the dataset are important. For this, we set a threshold of 0.4 on the correlation coefficient: we exclude every feature whose correlation coefficient is not strong enough to be relevant.

We can then tell that Ball Speed and Launch Angle have almost no impact on the Dispersion of the ball.
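The 0.4 cut-off described above could be applied as in the sketch below. It assumes the comparison is made on the absolute value of the coefficient, so that strong negative correlations are kept too, and that a value exactly at the threshold is kept. Neither detail is stated in our process. The helper name `relevant_features` and the coefficients are hypothetical, not our results.

```python
THRESHOLD = 0.4  # cut-off on the correlation coefficient quoted in the article


def relevant_features(correlations, threshold=THRESHOLD):
    """Keep the features whose |correlation| with the target reaches the threshold.

    `correlations` maps a feature name to its correlation with the target.
    Using the absolute value is an assumption made for this sketch.
    """
    return [name for name, r in correlations.items() if abs(r) >= threshold]


# Hypothetical coefficients for illustration only (NOT measured values).
toy_correlations = {"feature_a": 0.85, "feature_b": -0.55, "feature_c": 0.12}
kept = relevant_features(toy_correlations)
```

Here `feature_c` is dropped because its coefficient is below 0.4 in absolute value, while the other two are kept.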