Fine-Tuning a CLIP Model with Classified Book Cover Images in PySpark

A book's cover is the first point of contact with potential readers and offers important clues about its content. Classifying book genres from cover information can benefit modern retrieval systems and improve the generation of book covers.

Our research aims to classify books into related genres and generate book covers using CLIP. We train a classification model in PySpark to predict book genres, then feed the resulting dataset of book-title prompts and predicted genres into a CLIP-based pipeline to generate book covers. In this pipeline, CLIP supplies the link between text and images: a vanilla transformer generates embeddings from the textual description, and a vanilla diffusion model produces images from those embeddings.
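As a rough illustration of the hand-off between the two stages, the sketch below combines a book title and its predicted genre into a single text prompt. It is written in plain Python so that it runs without a Spark session. The `BookRecord` type, the `build_prompt` function, the prompt template, and the sample titles are all hypothetical and are not taken from the project's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BookRecord:
    """One row of the (hypothetical) classifier output."""
    title: str
    predicted_genre: str


def build_prompt(record: BookRecord) -> str:
    """Combine title and predicted genre into one text prompt.

    The template below is an assumption made for illustration only.
    """
    title = " ".join(record.title.split())          # collapse stray whitespace
    genre = record.predicted_genre.strip().lower()  # normalize the genre label
    return f'A book cover for "{title}", a {genre} book'


# Made-up sample rows standing in for the classifier's predictions.
records = [
    BookRecord("The  Silent Orchard", "Mystery"),
    BookRecord("Cooking with Fire", " Cookbooks "),
]
prompts = [build_prompt(r) for r in records]

for p in prompts:
    print(p)
```

On a Spark DataFrame, the same string logic could be applied per row, for example through a user-defined function, before the prompts are passed to the image-generation stage.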

GitHub Page

Final Report