Co-location patterns are well established for spatial objects with categorical labels; they capture the phenomenon that objects with certain labels are often located in close geographic proximity. Similar to frequent itemsets, co-location patterns are defined based on a support measure, which quantifies the popularity (or prevalence) of a pattern candidate (a label set). Quite a few support measures exist for defining co-location patterns. They share the idea of counting the number of instances of a given label set C as its support, where an instance of C is an object set whose objects collectively carry all the labels in C and are located close to one another.
Unfortunately, these measures suffer from various weaknesses: some fail to capture all possible instances, while others overlook the cases where multiple instances overlap. In this paper, we propose a new measure called Fraction-Score, whose idea is to count instances fractionally if they overlap. Compared to existing measures, Fraction-Score not only captures all possible instances but also handles overlapping instances appropriately, so that the supports it defines are more meaningful and consistent with the desirable anti-monotonicity property. To solve the co-location pattern mining problem based on Fraction-Score, we develop efficient algorithms that are significantly faster than a baseline adapted from the state of the art. We conduct extensive experiments on both real and synthetic datasets, which verify the superiority of Fraction-Score and the efficiency of our algorithms.
Weaknesses of Existing Support Measures
The partitioning-based approach uses a grid to partition the space into cells, constructs for each cell a transaction involving all objects within the cell, and then defines supports on the generated transactions as if they were conventional transaction data. With this approach, only instances that lie within a single cell are considered; instances that span cells are missed, since two objects within distance d of each other but on opposite sides of a cell boundary never appear in the same transaction. Suppose the grid indicated by the dashed lines in the figure is used. The object set {R3, B1} corresponds to an instance of the label set, but since the two objects are located in different cells, no generated transaction involves this instance, and thus it is missed.
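The effect described above can be reproduced with a minimal sketch. The figure is not available here, so the coordinates, the distance threshold d, and the cell size below are illustrative assumptions, as are the extra objects R1 and B2 and the helper names `grid_transactions`, `close_pairs`, and `missed_pairs`. The sketch only covers pairs of objects and does not implement any particular support measure.

```python
from collections import defaultdict
from itertools import combinations
from math import dist, floor


def grid_transactions(objects, cell_size):
    """Build one transaction (set of object names) per non-empty grid cell."""
    cells = defaultdict(set)
    for name, (x, y) in objects.items():
        cell = (floor(x / cell_size), floor(y / cell_size))
        cells[cell].add(name)
    return dict(cells)


def close_pairs(objects, d):
    """Return all pairs of objects whose Euclidean distance is at most d."""
    return {
        frozenset((a, b))
        for a, b in combinations(objects, 2)
        if dist(objects[a], objects[b]) <= d
    }


def missed_pairs(objects, d, cell_size):
    """Close pairs that never co-occur in any grid transaction."""
    transactions = grid_transactions(objects, cell_size)
    covered = {
        frozenset(pair)
        for members in transactions.values()
        for pair in combinations(members, 2)
    }
    return close_pairs(objects, d) - covered


# Hypothetical layout: R3 and B1 sit on opposite sides of the line x = 1.
objects = {
    "R3": (0.9, 0.5),
    "B1": (1.1, 0.5),
    "R1": (0.2, 0.2),
    "B2": (0.3, 0.3),
}

missed = missed_pairs(objects, d=0.5, cell_size=1.0)
print(missed)
```

Under these assumed values, R3 and B1 are 0.2 apart, well within d, yet they fall into different cells, so the pair is reported as missed. The pair R1 and B2 lies inside a single cell and is therefore covered by a transaction.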