
As a Graduate Researcher, I worked on a solo project for Prof. Jay Golden at the Dynamic Sustainability Lab, performing sentiment analysis of public opinion on Twitter about the blockchain approach to carbon credit markets.
The project's goals are to analyze English-language tweets to understand public opinion around the world on blockchain and its role in the transition to a net-zero economy, and to find possible correlations with the geopolitical and demographic background of those tweets.
First, I collected tweets about either the current general perception of blockchain or its role in the transition to a net-zero carbon economy. After preprocessing and tokenizing the data, I used unsupervised learning (word2vec with K-Means) to cluster the tweets into the desired number of sentiment groups based on keywords and hashtags. Finally, I used NLTK’s SentimentIntensityAnalyzer (VADER) and a pre-trained BERT model to score the sentiment polarity of every tweet, compared those scores with the results of my custom model, and averaged the results. For topic modeling, I used a custom-trained Latent Dirichlet Allocation (LDA) model, searched for the optimal number of topics, and used ChatGPT to improve the coherence of the resulting topics.
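The averaging step can be illustrated with a minimal sketch. It is not the project's actual code: the function name `ensemble_polarity`, the example scores, and the ±0.05 cutoff (borrowed from the common VADER convention for the compound score) are all assumptions made for illustration. It assumes each model's output has already been mapped to a polarity in the range [-1, 1].

```python
from statistics import mean


def ensemble_polarity(scores, threshold=0.05):
    """Average per-model polarity scores and map the result to a label.

    `scores` maps a model name to a polarity in [-1, 1].
    `threshold` is an assumed cutoff, not a value from the project.
    """
    if not scores:
        raise ValueError("need at least one model score")
    avg = mean(scores.values())
    if avg >= threshold:
        label = "positive"
    elif avg <= -threshold:
        label = "negative"
    else:
        label = "neutral"
    return avg, label


# Hypothetical scores for a single tweet (made-up numbers).
tweet_scores = {"vader": 0.6, "bert": 0.4, "custom": 0.2}
avg, label = ensemble_polarity(tweet_scores)
print(f"{avg:.2f} {label}")
```

A plain mean treats all three models as equally reliable; a weighted mean would be a natural variation if one model proved more accurate on a labeled sample.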
I plan to prepare my findings for publication as a research paper. Through this research, I learned a lot about various NLP techniques and their applications, and saw just how powerful some deep-learning language models are. I coded everything in Python, which strengthened my knowledge of NLP libraries and ML pipelines.
