It was the seventh day of the advanced mathematics and statistics class. Today, Gary introduced us to a new dataset: Armed Conflict Location and Event Data (ACLED) covering India and the USA.
We got a solid introduction to the dataset, discussing its variables, such as location, event type, date, and the parties involved. Then we discussed clustering methods, specifically K-means clustering. It’s particularly useful for our dataset because we can group regions based on the similarity of their conflict types or intensity. We talked about choosing the number of clusters, the importance of centroids, and the iterative process of refining the clusters until they converge.
It was clear that applying clustering algorithms to the ACLED dataset could reveal useful patterns, such as groups of regions with similar conflict profiles.
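To make the iterative process concrete for myself, here is a minimal sketch of K-means in plain Python. It is not the code from class: the function names (`kmeans`, `squared_distance`, `mean_point`) are my own, the coordinates are made-up points rather than real ACLED records, and treating latitude and longitude as flat Euclidean coordinates is a simplifying assumption. In practice a library such as scikit-learn's `KMeans` would probably be the better choice.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iter=100, seed=0):
    """Return (centroids, labels) after running K-means on `points`."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Converged: no point changed cluster since the last pass.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean_point(members)
    return centroids, labels


# Hypothetical (latitude, longitude) pairs, NOT taken from ACLED.
events = [
    (28.6, 77.2), (28.7, 77.1), (28.5, 77.3),
    (40.7, -74.0), (40.8, -73.9), (40.6, -74.1),
]
centroids, labels = kmeans(events, k=2)
print(labels)
print(centroids)
```

The number of clusters `k` has to be chosen up front, which is why we spent time on that question in class. The result can also depend on the random starting centroids, so fixing `seed` keeps a run repeatable.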
The questions I have are:
- How might the nature of conflict (e.g., protest, violence, battles) affect the clustering results in different regions?
- Can we use the clusters to predict future conflict hotspots or provide insights for conflict prevention?
- How are the locations of conflicts distributed across different states or regions within India and the USA? Are certain areas more prone to conflict?
- Are there any missing or incomplete data points in the dataset, and how should they be handled during analysis?