Data sets can be imagined as "clouds" of data points in a multidimensional space. These points are rarely spread evenly: they are sparse in some regions and dense in others. Cluster analysis (CA) is used to identify the denser regions efficiently and, on that basis, to group the data into a number of meaningful subsets. Each subset corresponds to a category.
"Think of a database of facial photographs", explains
"We tried to devise a more efficient algorithm than those currently used, and one capable of solving some of the classic problems of CA", continues Laio.
In more detail...
"Our approach is based on a new way of identifying the centres of the clusters, i.e., of the subsets", explains
To find out whether a place is a city, we can ask each inhabitant to count his "neighbours", in other words, how many people live within 100 metres of his house. Next, for each inhabitant, we find the shortest distance to another inhabitant who has more neighbours. "Together, these two pieces of data", explains Laio, "tell us how densely populated the area where an individual lives is, and how far that individual is from the nearest person with more neighbours. By cross-checking these two quantities automatically for the entire world population, we can identify the individuals who represent the centres of the clusters, which correspond to the various cities". "Our algorithm performs precisely this kind of calculation, and it can be applied to many different settings", adds Rodriguez.
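The neighbour-counting procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes Euclidean distances between points held in a NumPy array, a hand-picked radius `d_c` playing the role of the 100-metre neighbourhood, and a hand-picked number of centres `n_centres`. The function name `density_peaks`, the ranking of candidate centres by the product of the two quantities, and the index-based tie-breaking are all assumptions made for illustration. Every point that is not a centre joins the cluster of its nearest denser neighbour.

```python
import numpy as np


def density_peaks(points, d_c, n_centres):
    """Hypothetical sketch of density-peak clustering.

    Returns (labels, rho, delta, centres)."""
    points = np.asarray(points, dtype=float)
    n = len(points)

    # Pairwise Euclidean distances between all points.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # rho[i]: how many "neighbours" lie closer than d_c (the point itself excluded).
    rho = (dist < d_c).sum(axis=1) - 1

    # Visit points from densest to sparsest; ties are broken by index (an
    # assumption) so every point except the first has a "denser" predecessor.
    order = np.argsort(-rho, kind="stable")

    # delta[i]: distance to the nearest point ranked as denser than i.
    # parent[i]: that nearest denser point, used later for cluster assignment.
    delta = np.empty(n)
    parent = np.full(n, -1)
    delta[order[0]] = dist.max()  # convention for the densest point
    for k in range(1, n):
        i = order[k]
        denser = order[:k]
        j = denser[np.argmin(dist[i, denser])]
        delta[i] = dist[i, j]
        parent[i] = j

    # Centres stand out on both counts: many neighbours AND far from any
    # denser point. Rank by the product and always keep the densest point.
    gamma = rho * delta
    gamma[order[0]] = np.inf
    centres = np.argsort(-gamma)[:n_centres]

    # Each non-centre point joins the cluster of its nearest denser point;
    # processing in density order guarantees that point is already labelled.
    labels = np.full(n, -1)
    labels[centres] = np.arange(n_centres)
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, rho, delta, centres
```

For example, with two small, well-separated groups of three points each, such as `[[0, 0], [0, 0.1], [0.1, 0], [5, 5], [5, 5.1], [5.1, 5]]`, and `d_c=0.5`, the call `density_peaks(pts, 0.5, 2)` places each group in its own cluster. Multiplying the two quantities is just one simple way to favour points that score high on both; in practice the centres could also be picked by inspecting a plot of one quantity against the other.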
The procedure performed very well: "we tested our mathematical model on the Olivetti Face Database, an archive of facial photographs, and obtained highly satisfactory results. The system recognised most individuals correctly and never produced 'false positive' results", comments Rodriguez. "This means that in some cases it failed to recognise a subject, but it never once confused one individual with another. Compared with other similar methods, ours was particularly effective at eliminating outliers, that is, data points so different from the others that they tend to skew the analysis".
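The outlier behaviour can be related to the two quantities from the city analogy: call them `rho` (an individual's neighbour count) and `delta` (distance to the nearest individual with more neighbours). An isolated point has few neighbours yet is far from any denser point, so it scores low on `rho` and high on `delta`. The sketch below flags such points. The function name `flag_outliers` and both thresholds are hypothetical and chosen by hand; this is not necessarily how the authors handled outliers in the Olivetti experiment.

```python
import numpy as np


def flag_outliers(rho, delta, max_rho, min_delta):
    """Hypothetical heuristic: a point with few neighbours (rho <= max_rho)
    that is nonetheless far from any denser point (delta >= min_delta) is
    isolated, so it is flagged as an outlier rather than a cluster centre."""
    rho = np.asarray(rho)
    delta = np.asarray(delta)
    return (rho <= max_rho) & (delta >= min_delta)
```

For instance, `flag_outliers([5, 4, 0], [10.0, 0.1, 3.0], max_rho=0, min_delta=1.0)` flags only the third point: the first is dense (a likely centre) and the second sits close to a denser point.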
Our reports deliver fact-based news of research and discoveries from around the world. Copyright 2014, NewsRx LLC