How to get specific values from a column in a dataset and assign each one to its own variable

Problem description

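In the code below, I compute the polarity of some news headlines. The polarity is then added to the predicted stock price to make the prediction more accurate. Right now I am stuck on how to get a particular value out of the dataset.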

Here is my code:

# Import the libraries

import math
import pandas_datareader as web
import numpy as np
import pandas as pd
import sys
import os
import datetime
from datetime import date
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM
from textblob import TextBlob
import matplotlib.pyplot as plt
import csv
plt.style.use('fivethirtyeight')

#Write a csv file with all the news
with open('AppleNewsSentiment6.csv', 'w', newline='') as f:
  writer = csv.writer(f)
  writer.writerow(['Date', 'Link', 'Headline', 'Description'])
  writer.writerow(['04/28/2021', 'https://www.apple.com/newsroom/2021/04/apple-introduces-airtag/', 'Apple is good', 'Apple is good'])
  writer.writerow(['04/29/2021', 'https://www.apple.com/newsroom/2021/04/imac-features-all-new-design-in-vibrant-colors-m1-chip-and-45k-retina-display/', 'iMac features all-new design in vibrant colors, M1 chip, and 4.5K Retina display', 'New Imac'])
  writer.writerow(['04/29/2021', 'https://www.apple.com/newsroom/2021/04/apple-commits-430-billion-in-us-investments-over-five-years/', 'Apple commits $430 billion in US investments over five years', 'Apple invested 430 billion into the USA'])
  writer.writerow(['04/29/2021', 'https://www.nbcnews.com/shopping/tech-gadgets/apple-new-2021-products-n1265706', 'Apple unveils new 2021 products: iPad Pro, AirTag and more', 'Apple has new products'])
  writer.writerow(['04/30/2021', 'https://www.npr.org/2021/04/26/990943261/apple-rolls-out-major-new-privacy-protections-for-iphones-and-ipads', 'Apple is horrible', 'Apple is not improving privacy'])
  writer.writerow(['04/30/2021', 'https://www.npr.org/2021/04/26/990943261/apple-rolls-out-major-new-privacy-protections-for-iphones-and-ipads', 'Apple is Bad', 'Apple is bad'])

#Load the data
from google.colab import files
files.upload()

#Store the data
dataF = pd.read_csv('AppleNewsSentiment6.csv')
#Set the date as the index
dataF = dataF.set_index(pd.DatetimeIndex(dataF['Date'].values))
#Show the data
dataF

#Create a function to get the polarity
def getPolarity(text):
  return TextBlob(text).sentiment.polarity

#Add 1 column to the dataset
dataF['Polarity'] = dataF['Headline'].apply(getPolarity)

#Show the data set with the new column
dataF

#Create a function to compute negative, neutral, and positive sentiments
def getSentiment(score):
  if score < 0:
    return 'Negative'
  elif score == 0:
    return 'Neutral'
  else:
    return 'Positive'

#Create a new column with these values
dataF['Sentiment'] = dataF['Polarity'].apply(getSentiment)

#Show the data
dataF

#Plot and visualise the sentiment count
plt.title('Sentiment Analysis')
dataF['Sentiment'].value_counts().plot(kind='bar')
plt.xlabel('Sentiments')
plt.ylabel('Counts')
plt.show()

#Plot the sum of the polarity for each date
plt.figure(figsize=(12.33, 4.5))
plt.title('Sentiment sum/time')
polarity = dataF.groupby('Date')['Polarity'].sum()
plt.plot(polarity.index, polarity)

#Show the sum of the polarity for each date
polarity

#Get the count of the articles
polarity_count = dataF.groupby('Date')['Polarity'].count()
#Show the data
polarity_count

#Show the average sentiment for each day
polarity_avg = polarity / polarity_count
polarity_avg

#Plot the average sentiment over time
plt.figure(figsize=(13, 4.5))
plt.plot(polarity_avg.index, polarity_avg)

I have more code that predicts the price of any stock, but I only need to know how to get the values out of the dataset.

Here is the dataset:

Date
04/28/2021    **0.700000**
04/29/2021    **0.161616**
04/30/2021   **-0.850000**
Name: Polarity, dtype: float64

I want to get each of the numbers surrounded by asterisks and assign each one to its own variable. Is there a way to do this? Thank you.

Update: I tried regular expressions; they did not work.

Tags: python, predict, stock

Solution
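
One possible approach, sketched under some assumptions: the output in the question is a pandas Series (its footer reads "Name: Polarity, dtype: float64"), so the values are floats stored in the Series, and the asterisks appear to be emphasis added in the post rather than part of the data. In that case no text parsing should be needed; the values can be read by position or by label. The Series below is a stand-in rebuilt from the numbers shown in the question, and the variable names `first`, `apr_29`, `day1`, `day2`, and `day3` are illustrative only.

```python
import pandas as pd

# Stand-in for the asker's `polarity_avg`, rebuilt from the values in the question.
# In the original notebook this Series comes from the groupby on 'Date'.
polarity_avg = pd.Series(
    [0.700000, 0.161616, -0.850000],
    index=pd.Index(["04/28/2021", "04/29/2021", "04/30/2021"], name="Date"),
    name="Polarity",
)

# Access a single value by position (0 is the first row).
first = polarity_avg.iloc[0]

# Access a single value by its index label (the date string).
apr_29 = polarity_avg.loc["04/29/2021"]

# Unpack every value into its own variable.
# This only works when the number of variables matches the number of rows.
day1, day2, day3 = polarity_avg.tolist()

print(first, apr_29, day3)
```

Unpacking into separate variables is only practical while the number of dates is small and known in advance.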

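If the number of dates is not fixed, a dictionary keyed by date may be easier to work with than one variable per value. The sketch below again uses a stand-in Series built from the numbers in the question; `polarity_by_date` and `score_today` are hypothetical names, and falling back to 0.0 (neutral) for a date with no headlines is an assumption, not something stated in the question.

```python
import pandas as pd

# Stand-in for the asker's `polarity_avg`, rebuilt from the values in the question.
polarity_avg = pd.Series(
    [0.700000, 0.161616, -0.850000],
    index=pd.Index(["04/28/2021", "04/29/2021", "04/30/2021"], name="Date"),
    name="Polarity",
)

# Convert the Series to a plain dict: {date string: polarity}.
polarity_by_date = polarity_avg.to_dict()

# Iterate over (date, value) pairs without naming each value by hand.
for day, score in polarity_avg.items():
    print(f"{day}: {score:+.6f}")

# Look up one date; a missing date falls back to 0.0 (assumed neutral default).
score_today = polarity_by_date.get("05/01/2021", 0.0)
```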
