Conclusion

Key Findings

Our research examined technology-related discussions on Reddit, motivated by the growing importance of technology in today’s world. We extracted comments and submissions from three subreddits: r/technology, r/Futurology, and r/news. We treated r/technology and r/Futurology as sources of user-generated discussion about technology, while r/news reflects coverage from media sources. The investigation proceeded in three phases: exploratory data analysis, natural language processing, and machine learning.

Exploratory Data Analysis (EDA)

In the EDA phase, we analyzed the author distribution in r/technology and r/Futurology to determine whether a few authors dominated discussions. We also compared discussion volumes in these subreddits before and after significant tech events, focusing on major events related to OpenAI, and examined the stock prices of the top five tech corporations around the same events. Finally, we identified the technology themes most frequently discussed in news articles and tracked how their discussion volume changed over time. We found that a small number of authors dominate submission posts, but that this pattern is less pronounced in comment sections. Major events such as those involving OpenAI influenced both stock prices and discussion volumes to some extent. AI emerged as the most discussed theme in news articles, underscoring its significance in today’s world.
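
As a rough illustration of the author-distribution check, the share of posts written by the most active authors can be computed as below. This is a minimal sketch, not the code used in the study: the function name `top_author_share`, the cutoff `k`, and the toy author lists are all assumptions made for the example.

```python
from collections import Counter


def top_author_share(authors, k=10):
    """Return the fraction of posts written by the k most active authors.

    `authors` is an iterable with one author name per post (hypothetical input
    format). A value close to 1.0 means a few authors dominate.
    """
    counts = Counter(authors)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(n for _, n in counts.most_common(k))
    return top / total


# Toy data for illustration only; these are not values from the dataset.
submission_authors = ["a", "a", "a", "a", "b", "c"]
comment_authors = ["a", "b", "c", "d", "e", "f"]

print("submissions:", top_author_share(submission_authors, k=1))
print("comments:", top_author_share(comment_authors, k=1))
```

Comparing this share between submissions and comments, for the same `k`, gives one simple way to express the observation that submissions are more concentrated among a few authors than comments are.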

Natural Language Processing (NLP)

In the NLP phase, we analyzed the posts at the level of individual terms and tokens. We extracted keywords for the most discussed technologies in r/technology and used them to identify the products mentioned most often in the subreddit. We repeated this process for r/news and calculated the mean sentiment score for each product from the post contents. Time-series analysis of these scores revealed differing attitudes toward different products: Copilot and ChatGPT received the most positive sentiment, while others, such as drone and Aurora, received more negative feedback. Notably, after the introduction of ChatGPT, both public and media opinion toward AI-related products trended positive overall.
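
The per-product averaging step can be sketched as follows. The sketch assumes that each post already carries a numeric sentiment score from some upstream model (this section does not say which one was used) and that products are matched by simple keyword lookup. The `PRODUCT_KEYWORDS` mapping, the function name, and the example posts are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical keyword lists; the study's actual lists were compiled manually.
PRODUCT_KEYWORDS = {
    "ChatGPT": {"chatgpt"},
    "Copilot": {"copilot"},
    "drone": {"drone", "drones"},
}


def mean_sentiment_by_product(posts, product_keywords):
    """Average the sentiment score of all posts that mention each product.

    `posts` is an iterable of (text, score) pairs, where `score` is assumed
    to be a precomputed sentiment value for the post.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for text, score in posts:
        tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
        for product, keywords in product_keywords.items():
            if tokens & keywords:  # the post mentions at least one keyword
                totals[product] += score
                counts[product] += 1
    return {product: totals[product] / counts[product] for product in counts}


# Toy posts for illustration only.
posts = [
    ("I love ChatGPT!", 0.8),
    ("ChatGPT and Copilot are useful", 0.4),
    ("That drone was noisy", -0.5),
]
print(mean_sentiment_by_product(posts, PRODUCT_KEYWORDS))
```

Grouping the same computation by month, rather than over the whole dataset, would give the kind of time series described above.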

Machine Learning (ML)

In the ML phase, we aimed to predict the stock prices of big tech corporations from the sentiment scores of public and media opinion. We compared models by root mean squared error (RMSE) and identified the features most important for predicting stock prices, such as the sentiment score of ChatGPT. We also tested whether a classifier could distinguish submissions reflecting public opinion from those reflecting media opinion. The results were promising: the Naive Bayes model achieved approximately 75% accuracy.
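
For reference, RMSE-based model comparison can be sketched in a few lines. The model names and numbers below are invented for illustration and are not results from this study. The only part taken from the text is the selection rule: the model with the lowest RMSE is preferred.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length numeric sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical stock prices and predictions from two placeholder models.
actual = [100.0, 102.0, 101.0, 105.0]
predictions = {
    "model_a": [101.0, 101.0, 102.0, 104.0],
    "model_b": [98.0, 105.0, 100.0, 109.0],
}

scores = {name: rmse(actual, pred) for name, pred in predictions.items()}
best = min(scores, key=scores.get)  # lower RMSE is better
print(scores, "best:", best)
```

Because RMSE is expressed in the same units as the target, the scores can be read directly as a typical prediction error in price terms.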

Overall, our research highlights the significant influence of major tech events, particularly the introduction of ChatGPT, on the technology landscape. As of February 2023, the end of our dataset, public and media sentiment toward AI products like ChatGPT was predominantly positive.

Limitations and Future Improvements

For the EDA and NLP tasks, we identified the keywords for each theme and technical product manually. Although we cross-checked them against word frequencies in each subreddit, we may have missed some keywords, which could bias the results. We tried to capture the most important patterns in our classification tasks, but this limitation should be acknowledged.

Additionally, our Reddit dataset spans January 2021 to February 2023. This period may not fully capture the trends of certain technical products, especially AI-related ones, which gained more popularity after 2023. Sentiment toward AI products may also shift after their release, driven by factors such as user experience, advances in technology, or evolving perceptions within the community. As a result, our conclusions about these products and themes may reflect short-term effects rather than long-term trends. A longer and more up-to-date dataset would provide a more comprehensive picture.

Next Steps

Moving forward, we plan to address these limitations and further improve our analysis:

  1. Obtain Latest Data: Use APIs or other methods to acquire more recent Reddit data. Repeating our analysis on the updated data will show whether any trends differ from our current findings. We will also examine the long-term discussion volume of technology-related posts and news to capture how trends evolve over time.

  2. Refine Keyword Lists: Develop more accurate lists of technical themes and products to reduce bias in our analysis. This involves tokenizing all post contents and analyzing them for frequent words and patterns. Selecting the most relevant keywords will likely still require manual review, which may be time-consuming.

  3. Explore Other Subreddits: Expand our analysis to other technology-related subreddits. Our technology-focused analysis so far covered r/technology and r/Futurology, but many other potentially valuable subreddits remain to be explored. Examining a broader range of communities would give a more comprehensive view of discussions and trends across Reddit’s tech-related communities.
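
The keyword-refinement step (item 2 above) could start from a simple frequency count over all post contents, producing a candidate list for manual review. The sketch below is one possible starting point under stated assumptions: the stopword set is a tiny placeholder, and the function name `candidate_keywords` and its parameters are hypothetical.

```python
import re
from collections import Counter

# Placeholder stopword set; a real run would use a fuller list.
STOPWORDS = {"the", "a", "an", "is", "are", "and", "of", "to", "in", "for", "on"}


def candidate_keywords(texts, top_n=20, min_len=3):
    """Return the top_n most frequent tokens as (token, count) pairs.

    Tokens are lowercased alphanumeric runs; stopwords and tokens shorter
    than min_len characters are dropped.
    """
    counts = Counter()
    for text in texts:
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            if len(token) >= min_len and token not in STOPWORDS:
                counts[token] += 1
    return counts.most_common(top_n)


# Toy posts for illustration only.
texts = [
    "ChatGPT is the future of search",
    "Is ChatGPT replacing search engines?",
    "The new drone is loud",
]
print(candidate_keywords(texts, top_n=5))
```

The output is only a shortlist: deciding which frequent tokens correspond to actual products or themes would still be a manual step, as noted in item 2.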