Using GPT Models for Qualitative and Quantitative News Analytics in the 2024 US Presidential Election Process

Using GPT Models for Qualitative and Quantitative News Analytics in the 2024 US Presidential Election Process

PDF version

Generative Pre-trained (GPT) Models can be effectively used for summarizing and analyzing of different text datasets including news articles. Large Language Models (LLMs), due to their transformer structure with attention mechanisms, can help analyze complex texts and reveal different text styles. LLMs, such as ChatGPT, demonstrate high efficiency in the analysis of complex texts. GPT models introduce new features compared to conventional transformer-based language models. One of them is zero-shot and few-shot learning, where the model performs well with only a few training examples or even no examples at all, relying solely on the instructions describing what should be done. Another important feature is reasoning, where a model can generate new patterns and conclusions based on an input prompt and the facts known by the model which were not included into it directly during the training process. One of the approaches of using LLMs is based on retrieval-augmented generation (RAG), which uses the results from other services, e.g. relational databases, semantic search, graph database, in the input prompt for the LLM. In this case, the response can be treated as the combination of external results and the LLM's internal knowledge.

In this study, we consider an approach using the Google Search API and the GPT- 4o model for qualitative and quantitative analysis of news through retrieval-augmented generation (RAG). This approach will be applied to the analysis of news about the 2024 US presidential election. The main goal is not to predict the 2024 US election outcomes or draw any political conclusions. Rather, we focus on extracting news information and analyzing it using the GPT model, with the potential for further application at subsequent levels of analytics.

Methodology

News data for the analysis was received from web resources in two steps:

1. Retrieiving URLs for relevant web resources using Google Search API. For this purpose, Python library Google API Clent was used.

2. Extracting information from web resources given their URLs. For this purpose SeleniumURLLoader from LangChain python library was used.

For searching relevant web resources, the search query ’Kamala Harris AND Donald Trump’ was used, along with search options to specify time periods and news resources. The following news resources were used for separate searches: ’Web sites’, ’The New York Times’, ’CNN’, ’The Washington Post’, ’Fox News’, ’NBC News’, ’Reuters’, ’ABC News’, ’Bloomberg’. Web resource ’Web sites’ refers to the top web sites for the search query without specifying a web source. We conducted searches for the following time periods: ’2024-08-01’–’2024-08-15’, ’2024-08-16’–’2024-08-31’, ’2024-09-01’–’2024-09-15’, ’2024-09-16’–’2024-09-30’, ’2024-10-01’–’2024-10-15’. The URLs of web resources were grouped by time periods and by sources. Then, data from the found URLs were extracted using SeleniumURLLoader. In tolal, 436 web resources were loaded. The data from each extracted resource were analyzed using Open AI with GPT-4o model. The text data for the analysis were included in a created prompt for the GPT-4o model, where we specified the options for the analysis, e.g. what and how to summarize, which points to highlight, what quantitative scores should be generated. We also specified the instructions for the output format which should be in JSON with an appropriate structure. The instruc- tions specified the generation of probability scores for candidates to be elected, sentiments scores, and descriptions for each candidate under consideration. Main narratives and key points for each candidates were also requested. In response to the OpenAI API, we will receive qualitative and quantitative analytical data generated by the GPT-4o model, grouped by time periods and web resources. We also created a prompt for the second level of the analysis using RAG approach. In this prompt, we specified the instructions for the qualitative analysis of data, generated by the GPT-4o model at the first level for separate analysis of each web resource. In the prompt instructions, we requested to summarize data grouped by time periods and web resources. Separately, we instructed the model to analyze qualitative trends and to generate sentiment scores.

Qualitative Results

The following are the results of the RAG approach for the qualitative analysis using GPT- 4o model grouped by time periods and web resources:

Results grouped by time periods

Dates: 2024-08-01 – 2024-08-15

Summary: This period in the 2024 US presidential election features Kamala Harris and Donald Trump as the primary candidates. Harris is lauded for her leadership and prevent- ing disruptive chants at rallies, while Trump draws controversy for labeling January 6 as a ’day of love’. Influencers include Bill de Blasio and Charlamagne Tha God, who shape narratives surrounding the candidates’ rhetoric and events such as Harris’s selection of Tim Walz as running mate gain traction. The rhetoric is polarized, with Harris focusing on justice and equality, and Trump prioritizing economic revival and anti-establishment themes. Harris’ probability score: 0.505. Trump’s probability score: 0.495. Harris’ positive sentiments: Presidential demeanor; rule-of-law respect; Democratic support. Trump’s positive sentiments: Strong base support; charismatic leadership. Harris’ negative sentiments: Policy inconsistencies; aggressive exchanges during protests. Trump’s negative sentiments: Controversial stances on January 6 and personal attacks. Harris’ cites: Democrats believe in the rule of law. Trump’s cites: January 6 was a day of love. Harris’ main narratives: Focus on legality and justice, projecting a presidential image. Trump’s main narratives: Maintains influence through controversial and populist rhetoric. Favorite candidate summary: Harris slightly favored due to broader voter appeal in key demographics.

Dates: 2024-08-16 – 2024-08-31

Summary: During this period, Kamala Harris and Donald Trump engage in competitive campaigning as the 2024 US presidential election approaches. Harris gains traction with demographic support although Trump earns enthusiasm from conservatives. Debate preparation highlights both candidates’ readiness to address economic issues and immigration. Key figures like Musk influence Trump’s strategy, while Harris focuses on moderate policies and union appeal. Swing states remain tightly contested. Harris’ probability score: 0.52. Trump’s probability score: 0.48. Harris’ positive sentiments: Commitment to economic reform, positive Democratic momentum. Trump’s positive sentiments: Strong base support, effective low-propensity voter reach. Harris’ negative sentiments: Economic and policy criticism from moderates. Trump’s negative sentiments: Controversies related to rhetoric and perceived instability. Harris’ cites: Commitments to family economic support. Trump’s cites: Past achievements, assurances from platforms like X. Harris’ main narratives: Focus on moderate governance and economic policies. Trump’s main narratives: Economic revival, populism, and social critiques. Favorite candidate summary: Harris slightly ahead due to focused moderate appeal and successful engagement with younger demographics.

Dates: 2024-09-01 – 2024-09-15

Summary: Kamala Harris inches ahead in the 2024 presidential race amid strong debate performances and strategic endgame moves. Endorsements from Taylor Swift and dis- cussions around reproductive rights bolster Harris’s appeal. Trump’s campaign gears up with core conservative outreach and critique of U.S. foreign policy. Swing state dynamics, particularly in the midwest, will heavily influence ultimate electoral outcomes. Although both candidates face narratives of change and continuity from their party bases, Harris capitalizes on broader social justice themes. Harris’ probability score: 0.525. Trump’s probability score: 0.475. Harris’ positive sentiments: Endorsements, growth in swing state appeal. Trump’s positive sentiments: Charisma, experienced campaign trail presence. Harris’ negative sentiments: Concerns over strategic plans and viability. Trump’s negative sentiments: Controversial scribe/remarks, undermining leadership image. Harris’ cites: Focusing on justice, unity, and social growth. Trump’s cites: Strong leadership, economic robustness. Harris’ main narratives: Social justice, inclusivity, governance foresight. Trump’s main narratives: Continuity of economic strength, sovereignty themes. Favorite candidate summary: Harris edges out due to strategic endorsements and key swing state focus.

Dates: 2024-09-16 – 2024-09-30

Summary: Harris and Trump maintain tight competition in the 2024 presidential race, engaging in strategic debates over issues such as immigration, economic policy, and foreign affairs. Harris sees growth in endorsements from party deferrers, while Trump’s base remains resilient amidst controversies and legal pressures. Both candidates are evenly matched across crucial battleground states, turning debates and media encounters into imperative campaign tools. Harris’ probability score: 0.5. Trump’s probability score: 0.5. Harris’ positive sentiments: Robust debate preparation and endorsements. Trump’s positive sentiments: Economic security focus and strong partisan support. Harris’ negative sentiments: Criticism about policy execution and border plans. Trump’s negative sentiments: Rhetoric issues, questions surrounding fitness. Harris’ cites: Highlighting democratic integrity. Trump’s cites: Promising economic recovery. Harris’ main narratives: Reforming economy, advancing social policies. Trump’s main narratives: Highlighting economic achievements and strong foreign policy. Favorite candidate summary: Neither candidate emerges as a clear frontrunner given equal campaign leveraging on major elector matters.

Dates: 2024-10-01 – 2024-10-15

Summary: As election day approaches, the race between Harris and Trump intensifies with both engaging in strategic public appearances and emphasizing contrasting economic and social narratives. Harris showcases transparency and democratic values, leading in some polls, while Trump retains solid support through economic promises and a forceful personal campaign style. The debate over transparency, health, and policy details remain focal in shaping public opinion. Harris’ probability score: 0.52. Trump’s probability score: 0.48. Harris’ positive sentiments: Transparency and bipartisan endorsements. Trump’s positive sentiments: Vigor in campaigning and policy emphasizing. Harris’ negative sentiments: Linked criticisms to ongoing policies. Trump’s negative sentiments: Transparency concerns and controversial statements. Harris’ cites: Protecting democracy and public welfare. Trump’s cites: Addressing national economy and security. Harris’ main narratives: Unity, social justice, and proactive governance. Trump’s main narratives: Economic revival and direct leadership. Favorite candidate summary: Harris slightly favored due to poll momentum and broader appeal across key demographics.

Results grouped by web resources

Web resource: Web sites

Summary: General resources depict the election battle between Harris and Trump as a competitive event with varying influences like media personalities and political endorsements. Issues ranging from economic policy to healthcare form core discussions around candidate capabilities. Harris receives affirmations from both expected and unexpected sectors, while Trump’s steadfast support highlights his enduring appeal in specific demographics. Harris’ positive sentiments: Dynamic leadership depiction with diverse endorsements. Trump’s positive sentiments: Strong voter engagement and loyal demographic reach. Harris’ negative sentiments: Concerns regarding issue clarity and youthful outreach. Trump’s negative sentiments: Divisive remarks and portrayal mishandling. Harris’ cites: Paramount in delivering justice and equity. Trump’s cites: Economic achievements and self-presentation. Harris’ main narratives: Adapting diversity, economic security, and social reform. Trump’s main narratives: Resilience through economic frames and direct policy.

Web resource: The New York Times

Summary: The New York Times discusses detailed insights involving political strategy, media effectiveness, and personalized voter outreach. Both candidates utilize diverse outreach platforms to strengthen campaign narratives, with public opinion swaying based on critical incident analysis (e.g., debate performance, endorsements). Harris focuses on recent political traction and narrative inclusivity, while Trump remains vital in reinforcing economic and governance issues crucial to his base. Harris’ positive sentiments: Increased visibility and endorsement, wider electoral appeal. Trump’s positive sentiments: Rich engagement in consistent themes of economic highlights. Harris’ negative sentiments: Intersection between historical administration critiques and new mandates. Trump’s negative sentiments: Challenges involving rhetoric and adherence to political narratives. Harris’ cites: Appealing narrative inclusions fortifying democratic structures. Trump’s cites: Strategizing around economic revival initiatives. Harris’ main narratives: Engagement and reinforcing inclusivity. Trump’s main narratives: Economic verification, school of resilience.

Web resource: CNN

Summary: CNN analyses and reports showcase a politically charged environment where Harris garners support from statements and debates while Trump relies on established rhetoric around stability and economic prioritization. Both engage in media-fueled con- tests of leadership validation and public engagement, presenting near equal electoral prospects at this stage. Harris’ positive sentiments: Democratic integrity advocacy, media appearance strength. Trump’s positive sentiments: Economic and authoritative dominance emphasized. Harris’ negative sentiments: Policy positioning evaluations and alternative handling. Trump’s negative sentiments: Detractors identifying inconsistent statements and past legacy scrutiny. Harris’ cites: Advocating societal reform and resolutions. Trump’s cites: Economics-centered dialogues. Harris’ main narratives: Pushing democratic restoration and societal reform. Trump’s main narratives: Centre of stabilizing economic interests.

Web resource: The Washington Post

Summary: The Washington Post reflects consistent campaign-centered dialogue with candidate differences ranging from economic management to border policy focus. Harris benefits from dual-party endorsements boosting profile themes, while Trump aggressively campaigned on economic security. Competitive terms revolve from strategic appeal to core contrasts in social values perception. Harris’ positive sentiments: Enrichment from growing cross-supportive endorsements. Trump’s positive sentiments: Strong baseline support and economic focal viewpoints. Harris’ negative sentiments: Performance critiques highlighting administration engagement. Trump’s negative sentiments: According perceptions of doubt primarily through rhetoric review. Harris’ cites: Successfully engaging in political collaborations. Trump’s cites: Projecting economic dichotomies or statements.

Harris’ main narratives: Strengthening bridges in diverse party coalition ethos. Trump’s main narratives: Economic endurance and stability themes.

Web resource: Fox News

Summary: Fox News navigates the political complexities with a focus on electoral system confidence and candidate speech engagement. Both Harris and Trump employ varied approaches across public appearances to address core policies and rally support. Harris drives forward on narratives depicting change while Trump sustains political traditionalism frameworks. Voter attention toward robust engagement and narrative reliability continues to dictate polling impacts. Harris’ positive sentiments: Effective foundational change presentation. Trump’s positive sentiments: Assurance in long-term economic stability traditions. Harris’ negative sentiments: Skepticism faced regarding existing establishment and institutional relations. Trump’s negative sentiments: Exaggerated position interpretations promoting dis- sent. Harris’ cites: Narrative growth around the infusion of change policy initiatives. Trump’s cites: Engagement with economic vitality longevity. Harris’ main narratives: Essential to broadening change policy promotions. Trump’s main narratives: Consistency themes across prior and ongoing economic im- plications.

Web resource: Reuters

Summary: Reuters’ insights highlight voter engagement meticulously carved through issue deep-dives including climate, economic revival with voter trust dependent on alignment shifts. Each candidate employs tailored rhetoric aimed at navigating pivotal swing outcomes. Public perception hinges on media impressions sustained by strategic and issue- focused campaign trail narratives. Harris’ positive sentiments: Resilient social justice and reform commitment visibility. Trump’s positive sentiments: Demographic connectivity in line with strong policy stances. Harris’ negative sentiments: Exploration of intense policy outcome requirements. Trump’s negative sentiments: Rhetoric stretched across controversial policy lines. Harris’ cites: Policy-oriented informational sharing. Trump’s cites: Slogans addressing core economic positions. Harris’ main narratives: Continued drive for societal reform and economic innovation. Trump’s main narratives: Thematic engagements emphasizing economic security bear- ings.

Web resource: NBC News

Summary: NBC News evaluates the electoral contest with perspectives on integrity, transparency, and state engagement vital for verifying contested campaign positions. Can- didates invoke strong public-facing narratives on a multitude always hitting core socio- political policies essential to voters. Trump echoes confidence and stability remarks while Harris advances contemporary remedies for challenges in the civic and social sphere. Harris’ positive sentiments: Enhanced remedy and reform advocacy. Trump’s positive sentiments: Core community confidence underpinning security measures. Harris’ negative sentiments: Position relations linked to previous institutional leadership boundaries. Trump’s negative sentiments: Adaptation pressures in processing flexibility regarding positional changes. Harris’ cites: Societal development and problem solution initiatives. Trump’s cites: Security interventions and core valuations. Harris’ main narratives: Formative change advocacy for the social pillar. Trump’s main narratives: Upholding stability and identifying supportive demographic concentrations.

Web resource: Bloomberg

Summary: Bloomberg’s in-depth campaign reporting focuses on strategic fundraising, key endorsements, and critical voter affinities in the Harris-Trump electoral face-off. Identifying newsmaker influence and economic priority in decisively impacting electoral bases is pivotal. Advanced socio-economic ties structured through clarification of strategic legacies define disrupt-or-support trajectories for both contenders. Harris’ positive sentiments: Policy-driven fundraising surge and voter envelopment. Trump’s positive sentiments: National economic assurance; fortified tenure affirmations. Harris’ negative sentiments: Complexity interactions within new and recurring endorsement structures. Trump’s negative sentiments: Long-standing alignment surfaces controversial resonances. Harris’ cites: Community-focused fundraising advantage narratives. Trump’s cites: Resilient economic outreach perspectives. Harris’ main narratives: Infrastructure-centered equity and innovative civic influence. Trump’s main narratives: Tenure continuity and traditional economic reform platforms.

Trend summary

Summary: Over time, the race between Kamala Harris and Donald Trump showcases fluctuating leads across demographic and battleground sectors, reflecting the substantial environmental shifts in campaign emphasis. Poll and voter sentiment analysis highlight uncertainties attributed to external influences and strategic decisions from each campaign. Harris’ trend summary: Harris trends positively with diverse faction gains, leveraging strategic endorsements and voter engagement techniques to reinforce democratic stances while grappling with policy continuity divergences. Trump’s trend summary: Trump slowly addressing economic strengths and stability concerns while confronting rhetoric critiques and incongruities, balancing core support with broader voter appeal. Harris’ main narratives: Advocating foundational social developments together with broad policy connection frameworks. Trump’ main narratives: Economic emphasis with prevailing regulations ensuring an enduring voter connection. Favorite candidate’s summary: In summation, current trends favor Harris slightly based on strategic voter alignment, dynamic rallying figures, and committed ideological rapport widening her electoral reach.

Quantitative Results

Distributions of quantitative scores

On the first level of RAG approach, GPT-4o model generated scores for positive and neg- ative sentiments as well as probability to be elected for each candidates. The distribution of sentiment quantitative scores in different time periods are presented by boxplots:


Article content
Figure 1: Candidates’ positive scores for time periods


Article content
Figure 2: Candidates’ negative scores for time periods

The similar distribution grouped by web resources are presented on the next Figures:


Article content
Figure 3: Harris’ positive sentiments scores in web resources


Article content
Figure 4: Trump’s positive sentiments scores in web resources


Article content
Figure 5: Harris’ negative sentiments scores in web resources


Article content
Figure 6: Trump’s negative sentiments scores in web resources

Probabilities of candidates to be elected are presented by boxplots on the next Figures:


Article content
Figure 7: Probability for Harris to be elected in different web resources


Article content
Figure 8: Probability for Trump to be elected in different web resources
Article content
Figure 9: Mean values for the probability of candidates to be elected in web resources

As one can see these probability practically equal, at the same time sentiment scores have more high volatility and are different for both candidates.

Bayesian regression for score trends

To analyze trends, we can use Bayesian regression. This approach allows us to receive a posterior distribution of model parameters by using conditional likelihood and prior distribution. The probabilistic approach makes it possible to receive the probability density function for the target variable. To analyze trends for sentiment scores, we used a Bayesian regression model as follows:

Article content

where α parameter describes the bias (intercept) of the trend, which can be treated as the initial score when t = 0 in the time period under consideration, β parameter describes the slope for upward or downward score trends. To solve Bayesian models, numerical Monte- Carlo methods are used. Gibbs and Hamiltonian sampling are popular methods for finding posterior distributions of the parameters in probabilistic models. Bayesian inference makes it possible to obtain probability density functions for model parameters and estimate the uncertainty which is important in risk assessment analytics. For Bayesian inference calculations, we used PyStan package for Stan platform for statistical modeling. For time independent variables, we used a list of indexes for time periods [0,1,2,3,4] which correspond to the following time periods:’2024-08-01’–’2024-08-15’, ’2024-08-16’– ’2024-08-31’, ’2024-09-01’–’2024-09-15’, ’2024-09-16’–’2024-09-30’, ’2024-10-01’–’2024-10- 15’. Boxplots for probability distributions of α and β parameters for sentiment trends of both candidates are shown in the next Figures:


Article content
Figure 10: Distribution for α parameter for candidates’ positive sentiment score trend


Article content
Figure 11: Distribution for β parameter for candidates’ positive sentiment score trend


Article content
Figure 12: Distribution for α parameter for candidates’ negative sentiment score trend


Article content
Figure 13: Distribution for β parameter for candidates’ negative sentiment score trend

The next Figures show the trends for different time periods and web resources. Points on the figures represent sentiment scores for specified time periods, lines connect the mean values of the sentiment scores, and the dashed line shows the mean values of the Bayesian regression for the points of specified time periods.


Article content
Figure 14: Positive sentiment score trends for web resources


Article content
Figure 15: Negative sentiment score trends for web resources

Conclusion

In this study, we consider the approach of using Google Search API and GPT-4o model for qualitative and quantitative analysis of news through retrieval-augmented generation (RAG). This approach was applied to analyse news about the 2024 U.S. presidential election process. Different news sources for different time periods were analyzed. Quantitative sentiment scores generated by the GPT model were analyzed using Bayesian regression to get trend lines. The distributions found for the regression parameters allow for the analysis an uncertainty in the election process. The approach does not aim to predict the election outcome, as it does not take into account the specific features of the U.S. election system, nor does it analyze the geospatial and state structure of quantitative scores. However, it can be used as part of a complex analytical approach. The results show that probabilities to be elected for both candidates are very similar, despite differences in their sentiment scores. One of the main goals of this study is to provide qualitative and quantitative analytical information to political news experts for further analysis. The obtained results demonstrate that using a GPT models for news analysis can yield informative qualitative and quantitative analytics, providing important insights which can be used in the next stages of presidential election process analytics.

Disclaimer

The approach, ideas, and results shared in this study are for academic purposes only and are not intended to inform real-world conclusions or recommendations.

PDF version



To view or add a comment, sign in

More articles by Bohdan Pavlyshenko

Insights from the community

Others also viewed

Explore topics