Chatbots and the technology behind them are widely used in many settings and in various ways. The Retrieval Augmented Generation (RAG) framework has gained popularity by linking a large language model with a private dataset, which makes it possible to run AI locally and privately with the most up-to-date information and knowledge. In this report, we aim to improve the response time of a local, private chatbot by using a cache. Our experimental results show that the majority of the time spent during the query process goes to generating the response; when there is a hit in the cache, the stored response can be returned to the user immediately without going through the generation step, so the response time improves significantly. We therefore focus our efforts on improving the turnaround time of this generation step. The cache is organized into categories that support efficient searching, and each entry records the user's query information: the query string, its embedding, and the generated response. Experimental results are presented, and the issue of speeding up request-response turnaround time is addressed.
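To make the described cache organization concrete, the sketch below illustrates one plausible way a category-bucketed semantic cache could be structured: entries holding the query string, its embedding, and the response are grouped by category, and a lookup scans only the matching bucket for an embedding similar enough to count as a hit. This is a minimal illustration under assumed names (`SemanticCache`, `hit_threshold`, the entry fields), not the authors' actual implementation.

```python
# Hypothetical sketch of a category-organized semantic cache for a RAG chatbot.
# Names and the similarity threshold are illustrative assumptions, not taken from the paper.
import numpy as np


class SemanticCache:
    def __init__(self, hit_threshold: float = 0.9):
        # Entries are grouped by category so a lookup only scans one bucket.
        self.buckets: dict[str, list[dict]] = {}
        self.hit_threshold = hit_threshold

    @staticmethod
    def _cosine(a: np.ndarray, b: np.ndarray) -> float:
        # Cosine similarity between two embedding vectors.
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def lookup(self, category: str, embedding: np.ndarray):
        """Return the cached response of the most similar stored query, or None on a miss."""
        best_entry, best_score = None, 0.0
        for entry in self.buckets.get(category, []):
            score = self._cosine(embedding, entry["embedding"])
            if score > best_score:
                best_entry, best_score = entry, score
        if best_entry is not None and best_score >= self.hit_threshold:
            return best_entry["response"]  # cache hit: skip the generation step entirely
        return None                        # cache miss: fall through to the normal RAG pipeline

    def insert(self, category: str, query: str, embedding: np.ndarray, response: str):
        """Record the query string, its embedding, and the generated response."""
        self.buckets.setdefault(category, []).append(
            {"query": query, "embedding": embedding, "response": response}
        )
```

In this sketch, a miss falls through to the usual retrieve-and-generate path, after which `insert()` records the new query and response so that later similar queries in the same category can be answered directly from the cache.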