

June 30, 2014

Recent Findings in Machine Learning Described by H.Y. Shen and Colleagues (Token list based information search in a multi-dimensional massive database)

By a News Reporter-Staff News Editor at Robotics & Machine Learning -- Investigators publish new report on Machine Learning. According to news originating from Bentonville, Arkansas, by VerticalNews correspondents, research stated, "Finding proximity information is crucial for massive database search. Locality Sensitive Hashing (LSH) is a method for finding nearest neighbors of a query point in a high-dimensional space."
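To illustrate the LSH baseline the researchers compare against, here is a sketch of one common LSH family: random-hyperplane hashing for cosine similarity. This is an assumption for illustration only. The report does not say which LSH variant the study used, and all function names here are hypothetical. Each vector gets a bit signature from the signs of its dot products with random hyperplanes, and vectors that share a signature land in the same bucket. Practical deployments usually keep several independent tables to improve recall. A single table is shown for brevity.

```python
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplane normals in `dim` dimensions (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def signature(vec, planes):
    """One bit per hyperplane: 1 if vec is on the positive side (or on the plane), else 0."""
    return tuple(
        1 if sum(p_i * v_i for p_i, v_i in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )

def build_index(points, planes):
    """Group point indices by signature in a single hash table."""
    buckets = {}
    for idx, vec in enumerate(points):
        buckets.setdefault(signature(vec, planes), []).append(idx)
    return buckets

def query(vec, buckets, planes):
    """Return candidate neighbors: the points that share the query's bucket."""
    return buckets.get(signature(vec, planes), [])

# Usage: the first two points face almost the same direction,
# the third faces the opposite way.
points = [[1.0, 0.0, 0.0], [0.99, 0.01, 0.0], [-1.0, 0.0, 0.0]]
planes = make_hyperplanes(dim=3, n_bits=8)
buckets = build_index(points, planes)
nearby = query(points[0], buckets, planes)
```

Because an opposite vector flips the sign of every dot product, point 2 never shares point 0's bucket, while point 1 usually does. This is why LSH returns likely rather than guaranteed neighbors.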

Our news journalists obtained a quote from the research, "It classifies high-dimensional data according to data similarity. However, the 'curse of dimensionality' makes LSH insufficiently effective in finding similar data and insufficiently efficient in terms of memory resources and search delays. The contribution of this work is threefold. First, we study a Token List based information Search scheme (TLS) as an alternative to LSH. TLS builds a token list table containing all the unique tokens from the database, and clusters data records having the same token together in one group. Querying is conducted in a small number of groups of relevant data records instead of searching the entire database. Second, in order to decrease the searching time of the token list, we further propose the Optimized Token list based Search schemes (OTS) based on index-tree and hash table structures. An index-tree structure orders the tokens in the token list and constructs an index table based on the tokens. Searching the token list starts from the entry of the token list supplied by the index table. A hash table structure assigns a hash ID to each token. A query token can be directly located in the token list according to its hash ID. Third, since a single-token based method leads to high overhead in the results refinement given a required similarity, we further investigate how a Multi-Token List Search scheme (MTLS) improves the performance of database proximity search. We conducted experiments on the LSH-based searching scheme, TLS, OTS, and MTLS using a massive customer data integration database. The comparison experimental results show that TLS is more efficient than an LSH-based searching scheme, and OTS improves the search efficiency of TLS."
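The token list idea in the quoted passage can be sketched in a few lines of Python. Everything here is an illustrative assumption, not the authors' code. The tokenizer (lowercase, whitespace split), the class and method names, and the use of Jaccard similarity in the refinement step are all hypothetical. A Python dict plays the role of the hash-table variant of OTS, and binary search over a sorted token list stands in for the index-tree variant.

```python
from bisect import bisect_left
from collections import defaultdict

def tokenize(record):
    """Hypothetical tokenizer: lowercase, split on whitespace, de-duplicate."""
    return set(record.lower().split())

class TokenListIndex:
    def __init__(self, records):
        self.records = records
        groups = defaultdict(list)
        for rid, rec in enumerate(records):
            for tok in tokenize(rec):
                groups[tok].append(rid)  # records sharing a token form one group
        self.groups = dict(groups)               # hash-table lookup (OTS-hash analogue)
        self.sorted_tokens = sorted(self.groups)  # ordered list (index-tree stand-in)

    def group_hashed(self, tok):
        """Locate a token's group directly by hashing."""
        return self.groups.get(tok, [])

    def group_ordered(self, tok):
        """Locate a token's group by binary search over the sorted token list."""
        i = bisect_left(self.sorted_tokens, tok)
        if i < len(self.sorted_tokens) and self.sorted_tokens[i] == tok:
            return self.groups[tok]
        return []

    def search(self, query, min_similarity=0.5):
        """Collect candidates from the query's token groups, then refine by Jaccard."""
        q = tokenize(query)
        candidates = set()
        for tok in q:
            candidates.update(self.group_hashed(tok))
        results = []
        for rid in candidates:
            r = tokenize(self.records[rid])
            sim = len(q & r) / len(q | r)
            if sim >= min_similarity:
                results.append((rid, sim))
        results.sort(key=lambda x: (-x[1], x[0]))  # best match first
        return results

# Usage with made-up customer records.
records = ["John Smith 12 Oak St", "Jon Smith 12 Oak Street", "Mary Jones 4 Elm Ave"]
index = TokenListIndex(records)
matches = index.search("John Smith 12 Oak St", min_similarity=0.3)
```

Only the groups for the query's tokens are scanned, so the third record is never scored. A single shared token, such as a common surname, is still enough to pull a record into the candidate set, so the refinement loop may have to score many weak matches. This is the overhead the quoted passage attributes to single-token methods.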

According to the news editors, the research concluded: "Further, MTLS performs better than TLS when the number of tokens is appropriately chosen, and a two-token adjacent token list achieves the shortest query delay in our testing dataset."
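The multi-token idea can be sketched the same way. The report does not define how adjacent tokens are combined, so this sketch assumes, hypothetically, that each key is a tuple of k consecutive tokens in record order. The function names are also illustrative. With k = 2, a query only reaches records that share an adjacent token pair, which typically shrinks the candidate set that the refinement step must score.

```python
from collections import defaultdict

def adjacent_keys(record, k):
    """Keys are tuples of k consecutive tokens; a record shorter than k yields one shorter key."""
    toks = record.lower().split()
    if len(toks) < k:
        return {tuple(toks)} if toks else set()
    return {tuple(toks[i:i + k]) for i in range(len(toks) - k + 1)}

def build_multi_token_list(records, k):
    """Map each k-token key to the set of record ids containing it."""
    table = defaultdict(set)
    for rid, rec in enumerate(records):
        for key in adjacent_keys(rec, k):
            table[key].add(rid)
    return table

def candidates(query, table, k):
    """Union of the groups for every k-token key in the query."""
    found = set()
    for key in adjacent_keys(query, k):
        found |= table.get(key, set())
    return found

# Usage: compare single-token and two-token candidate sets on made-up records.
records = ["John Smith Oak St", "Mary Smith Elm St", "John Smith Elm Ave"]
table1 = build_multi_token_list(records, k=1)
table2 = build_multi_token_list(records, k=2)
single = candidates("John Smith Oak St", table1, k=1)  # {0, 1, 2}
pair = candidates("John Smith Oak St", table2, k=2)    # {0, 2}
```

The trade-off matches the caveat in the quoted conclusion. Larger k gives smaller, more selective groups, but a record that shares only scattered single tokens with the query is no longer retrieved, so k has to be chosen for the data at hand.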

For more information on this research see: Token list based information search in a multi-dimensional massive database. Journal of Intelligent Information Systems, 2014;42(3):567-594. Journal of Intelligent Information Systems can be contacted at: Springer, Van Godewijckstraat 30, 3311 GZ Dordrecht, Netherlands.

The news correspondents report that additional information may be obtained from H.Y. Shen, Wal Mart Stores Inc, Bentonville, AR 72716, United States. Additional authors for this research include Z. Li and T. Li.

Keywords for this news article include: Arkansas, Bentonville, United States, Machine Learning, North and Central America

Our reports deliver fact-based news of research and discoveries from around the world. Copyright 2014, NewsRx LLC


Source: Robotics & Machine Learning
