Using an Edge-dual Graph and k-connectivity to Identify Strong Connections in Social Networks

  March 1, 2008      Analytics, Law Enforcement
Li Ding and Brandon Dixon, in Proc. ACM Southeast Regional Conference 2008, Auburn, Alabama, US, 2008

The goal of this paper is to use edge-dual graph transformation techniques to improve the accuracy of social network analysis (SNA). SNA is used in law enforcement to determine whether relationships exist among potential suspects and to identify what those relationships might be. Relationships can involve family, friends, past associates, cellmates and even prison enemies. The paper presents results showing that this transformation has high potential for increasing the accuracy of relationship search routines.
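
As a rough illustration of the transformation, the sketch below builds the edge-dual of a small undirected graph in Python. It assumes that "edge-dual" means the standard line graph construction, in which every original relationship becomes a node and two such nodes are joined when the relationships share a person. The function name `edge_dual` and the sample names are hypothetical, and the k-connectivity analysis from the paper is not reproduced here.

```python
from itertools import combinations

def edge_dual(edges):
    """Return the edge-dual (line graph) of an undirected graph.

    Each original edge becomes a node. Two of these nodes are joined
    when the original edges share an endpoint.
    """
    # Normalize each edge so (a, b) and (b, a) map to the same node.
    nodes = sorted({tuple(sorted(edge)) for edge in edges})
    dual = {node: set() for node in nodes}
    for first, second in combinations(nodes, 2):
        if set(first) & set(second):  # the two edges share a person
            dual[first].add(second)
            dual[second].add(first)
    return dual

# Hypothetical relationships among four people.
relationships = [("ann", "bob"), ("bob", "cal"), ("cal", "ann"), ("cal", "dee")]
dual = edge_dual(relationships)

for relationship, neighbours in dual.items():
    print(relationship, "->", sorted(neighbours))
```

In the dual graph the relationships themselves are the objects being connected, so a connectivity measure applied to it describes how strongly relationships are tied together rather than how strongly people are.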

Improved Variable and Value Ranking Techniques for Mining Categorical Traffic Accident Data

  December 1, 2005      Analytics, Law Enforcement, Motor Vehicles, Traffic Safety
Wang, H., A. Parrish, R. Smith and S. Vrbsky, Expert Systems with Applications, Volume 29, 2005, pp. 795-806.

This paper reviews the use of two new metrics for assessing the significance of attributes in a database when two subsets of the data are compared. Traditional statistical techniques are useful, and the sample size in public safety databases usually allows the normal approximation to the binomial distribution to be used in comparing proportions. For example, the proportion of alcohol-related crashes that occur on Saturdays would be shown to be very significantly higher than the corresponding proportion for non-alcohol-related crashes. However, the new metrics go a step further in that they give the user a clear, intuitive grasp of exactly how much more is occurring, not in terms of proportions but in terms of the number of crashes (for the traffic safety example). The metric is called Maximum Gain, and it directly measures the number of crashes over and above the number that would typically be expected. This gives the user a clear indication of the potential gain from applying a countermeasure related to the attribute (e.g., applying selective enforcement on Saturdays). It is not realistic to think that this gain would include all of the crashes for the attribute value; rather, it is realistic to view the maximum gain as the total over-represented amount.
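
The Saturday example can be worked through in a short Python sketch. The formula used for Maximum Gain here is an assumption based on the description above (observed count minus the count expected from the comparison subset's proportion); the paper's exact definition is not reproduced in this summary. The function names and all counts are hypothetical. The z statistic is the standard pooled two-proportion test based on the normal approximation.

```python
from math import sqrt

def maximum_gain(subset_count, subset_total, control_count, control_total):
    """Cases in the subset beyond what the control proportion predicts."""
    # Expected count if the subset followed the control proportion.
    expected = subset_total * control_count / control_total
    # Assumption: a value that is not over-represented has zero gain.
    return max(0.0, subset_count - expected)

def two_proportion_z(subset_count, subset_total, control_count, control_total):
    """z statistic for the difference of two proportions (pooled estimate)."""
    p1 = subset_count / subset_total
    p2 = control_count / control_total
    pooled = (subset_count + control_count) / (subset_total + control_total)
    std_err = sqrt(pooled * (1 - pooled) * (1 / subset_total + 1 / control_total))
    return (p1 - p2) / std_err

# Hypothetical counts, not taken from the paper.
alcohol_saturday, alcohol_total = 250, 1000
other_saturday, other_total = 1400, 10000

gain = maximum_gain(alcohol_saturday, alcohol_total, other_saturday, other_total)
z = two_proportion_z(alcohol_saturday, alcohol_total, other_saturday, other_total)
print(f"maximum gain: {gain:.0f} crashes, z = {z:.2f}")
```

With these made-up numbers, 14% of the comparison crashes fall on Saturday, so 140 of the 1000 alcohol-related crashes would be expected on Saturday. The observed 250 gives a gain of 110 crashes, which is the over-represented amount rather than the full Saturday total.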

Strategies to Improve Variable Selection Performance

  June 1, 2005      Analytics
Wang, H., A. Parrish, R. Smith, S. Vrbsky, Proceedings of the 2005 International Conference on Information and Knowledge Engineering, Las Vegas, June 2005

This paper compares the “row major order” data structure used in standard relational databases against the transposed “column major order” data structure used by CARE. These data structures are described in detail, as are the various filtering methods that can be employed. Performance comparisons between the two data structures demonstrate a clear advantage of column major order over the traditional storage approach.
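
The difference between the two layouts can be sketched in a few lines of Python. This is only an illustration of the general idea, not CARE's actual storage format; the records, variable names and helper functions are hypothetical.

```python
from collections import Counter

# Row-major: one record per crash, with all attributes stored together.
rows = [
    {"day": "Sat", "alcohol": "yes", "severity": "injury"},
    {"day": "Mon", "alcohol": "no", "severity": "property"},
    {"day": "Sat", "alcohol": "no", "severity": "property"},
]

def to_column_major(records):
    """Transpose row records into one list of values per variable."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns

def frequencies_row_major(records, variable):
    # Every full record is visited even though only one field is needed.
    return Counter(record[variable] for record in records)

def frequencies_column_major(columns, variable):
    # Only the values of the requested variable are read.
    return Counter(columns[variable])

columns = to_column_major(rows)
print(frequencies_row_major(rows, "day"))
print(frequencies_column_major(columns, "day"))
```

Both functions return the same frequencies. The column-major version reads a single contiguous list, which is the access pattern that variable-by-variable analysis tends to favor.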

Utilizing Commodity Hardware and Software to Distribute a Real-World Application: Maximizing Reuse While Improving Performance

  June 1, 2005      Analytics
Davis, M., R. Smith, B. Dixon, A. Parrish and D. Cordes, Software: Practice and Experience, Volume 35, no. 7, June 2005.

This research examined the current use of commodity computing hardware, which is motivated by a dramatic increase in the performance-to-price ratio. The research evaluated the performance of a statistical analysis application on a ten-node off-the-shelf computing cluster. The study had two parts: (1) examining various network topologies, and (2) minimizing the software modifications required to distribute the application. The general conclusion was that when reuse of existing code is feasible, performance can be dramatically increased by the combined use of parallel computing and commodity components.

Variable Selection and Ranking for Analyzing Automobile Traffic Accident Data

  April 1, 2005      Analytics, Motor Vehicles, Traffic Safety
Wang, H., A. Parrish, R. Smith and S. Vrbsky, Proceedings of the 2005 ACM Symposium on Applied Computing (AI Track), April 2005, pp. 36-41.

This paper explores a data mining process in which the original dataset is first transformed through a variable subset selection process, followed by the application of a machine learning algorithm. A variable ranking technique, called the Sum of Maximum Gain Ratio (SMGR), is applied. This technique computes a score that is based on the over-representation of attribute values. Essentially, SMGR is the ratio of the number of cases that could potentially be reduced by an effective countermeasure to the total number of cases associated with the over-represented value. SMGR was shown empirically to provide results comparable to those of alternative techniques, with significantly better runtime performance.
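
One possible reading of the SMGR description is sketched below in Python. It is an interpretation, not the paper's formula: for each variable, the over-represented excess is summed across values and divided by the total number of cases that have those over-represented values. The function name `smgr`, the variable names and all counts are hypothetical.

```python
def smgr(subset_counts, control_counts):
    """Assumed SMGR: summed excess over summed cases for over-represented values."""
    subset_total = sum(subset_counts.values())
    control_total = sum(control_counts.values())
    gain_sum = 0.0
    case_sum = 0
    for value, observed in subset_counts.items():
        # Count expected if the subset followed the control proportion.
        expected = subset_total * control_counts.get(value, 0) / control_total
        if observed > expected:  # only over-represented values contribute
            gain_sum += observed - expected
            case_sum += observed
    return gain_sum / case_sum if case_sum else 0.0

# Hypothetical value counts per variable: (subset of interest, comparison subset).
variables = {
    "day": ({"Sat": 250, "Sun": 200, "Weekday": 550},
            {"Sat": 1400, "Sun": 1100, "Weekday": 7500}),
    "weather": ({"clear": 700, "rain": 300},
                {"clear": 6900, "rain": 3100}),
}

scores = {name: smgr(subset, control) for name, (subset, control) in variables.items()}
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)

for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Under this reading, a variable scores highly when a large share of the cases in its over-represented values is excess, so "day" ranks above "weather" for these made-up counts. Each variable needs only one pass over its value counts, which is in keeping with the runtime advantage reported in the paper.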