Peer Reviewed Journal via three different mandatory reviewing processes, since 2006, and, from September 2020, a fourth mandatory peer-editing has been added.
Hashing algorithms long have been widely adopted
to design a fast address look-up process which involves a search through a large database to find a record associated with a given key. Hashing
algorithms involve transforming a key inside each target data to a hash value hoping that the hashing would render the database a uniform distribution with respect to this new hash value. The close the final distribution is to uniform, the less search time would be required when a query is made. When the database is already key-wise uniformly distributed, any regular hashing algorithm, such as bit-extraction, bit-group XOR, etc., would easily lead to a statistically perfect uniform distribution after the hashing. On the other hand, if records in the database are instead not uniformly distributed as in almost all known practical applications, then even
different regular hash functions would lead to very different performance. When the target database has a key with a highly skewed distributed value, performance delivered by regular hashing algorithms usually becomes far from desirable. This paper aims at designing a hashing algorithm to achieve the highest probability in leading to a uniformly distributed hash result from a non-uniformly distributed database. An analytical pre-process on the original database is first performed to extract critical information that would significantly benefit the design of a better hashing algorithm. This process includes sorting on the bits of the key to prioritize the use of them in the XOR hashing sequence, or in simple bit extraction, or even a combination of both. Such an ad hoc hash
design is critical to adapting to all real-time situations when there exists a changing (and/or expanding) database with an irregular non-uniform distribution. Significant improvement from simulation results is obtained in randomly generated data as well as real data.