Unable to send encoded JSON data to Spark

Problem description

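What I want to do is read data from CSV files and send it over a socket. This is what I've written, but on the receiving side I get nothing: either an error is raised or all I see is empty output.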

import csv
import json
import socket
from itertools import islice
from time import sleep


def next_n_lines(file_obj, n):
    # Called below but not shown in the post; this islice-based
    # definition is an assumption.
    return list(islice(file_obj, n))


class FireServer:  # class name assumed; the post shows only the methods

    def __init__(self):
        self.host = 'localhost'
        self.port = 12345

    def SetUpConnection(self):
        # This script is the TCP server; Spark's socketTextStream connects to it as a client.
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.bind((self.host, self.port))
        s.listen(1)
        print('\nListening for a client at', self.host, self.port)
        self.conn, self.addr = s.accept()
        print('\nConnected by', self.addr)

    def FireStream(self):
        try:
            print('\nReading firedata...\n')
            with open('FireData-Part2.csv') as f, open('ClimateData-Part2.csv') as c:
                FireData_csv = csv.reader(f)
                Fire_headers = next(FireData_csv)
                ClimateData_csv = csv.reader(c)
                # Consume the header rows; the raw reads below start at line 2.
                Climate_headers = [x.strip() for x in next(ClimateData_csv)]

                while True:
                    every_5_fire = next_n_lines(f, 5)
                    every_1_climate = next(c, None)
                    if not every_5_fire or every_1_climate is None:
                        break  # one of the files is exhausted
                    every_5_fire.append(every_1_climate)

                    row = {'firedata': []}
                    for each in every_5_fire:
                        # Drop the line ending so it does not end up in the last value.
                        alist = each.rstrip('\n').split(',')
                        dataType = alist[0]
                        if dataType == 'cdata':
                            for column in range(len(alist)):
                                row[Climate_headers[column]] = alist[column]
                        if dataType == 'fdata':
                            firedata = {}
                            for column2 in range(len(alist)):
                                firedata[Fire_headers[column2]] = alist[column2]
                            row['firedata'].append(firedata)
                    out = json.dumps(row)  # encode it before sending
                    self.conn.send(bytes(out, encoding="utf8"))
                    print('Sending line', out)
                    sleep(1)  # send one batch (5 fire rows + 1 climate row) per second
                print('End Of Stream fire.')
        except socket.error:
            print('Error occurred.\n\nClient disconnected.\n')
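For comparison, the grouping logic in FireStream (five fire rows merged into each climate row) could be written with `csv.DictReader`, which handles headers, field splitting and line endings itself. This is a sketch under assumptions: `build_records` and `fire_per_batch` are hypothetical names, and a trailing partial batch of fire rows is kept rather than dropped.

```python
import csv
import io
from itertools import islice


def build_records(fire_file, climate_file, fire_per_batch=5):
    """Yield one dict per climate row, carrying up to `fire_per_batch` fire rows.

    Hypothetical helper mirroring the grouping in FireStream.
    """
    fire_rows = csv.DictReader(fire_file)
    climate_rows = csv.DictReader(climate_file)
    for climate in climate_rows:
        batch = list(islice(fire_rows, fire_per_batch))
        if not batch:
            break  # fire data exhausted
        # Strip stray whitespace from the climate headers, as the original code does.
        record = {key.strip(): value for key, value in climate.items()}
        record["firedata"] = batch
        yield record
```

When reading real files, open them with `newline=''` as the csv module documentation recommends, then pass each yielded dict to `json.dumps` just as FireStream does.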

The Spark code, which connects to that socket as a client, just prints what it receives:

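from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext

# getOrCreate() never returns None, so pass the configuration to it directly.
# A socket receiver occupies one core: when running locally, use at least two (e.g. --master local[2]).
conf = SparkConf().setAppName("MongoDBApp")
sc = SparkContext.getOrCreate(conf)
ssc = StreamingContext(sc, 5)  # 5-second batches

host = "localhost"
port = 12345

# Connects to host:port as a TCP client.
lines = ssc.socketTextStream(host, int(port))

# output = lines.map(lambda s: json.loads(s))

# lines.foreachRDD(lambda rdd: rdd.foreachPartition(sendRecord2))

lines.pprint()

ssc.start()
ssc.awaitTermination()  # keep the driver running while batches are processed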
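If each received line holds one JSON object, the commented-out `json.loads` map can be made tolerant of a malformed or truncated line instead of failing the batch. A minimal sketch; `parse_record` is a hypothetical name, and the DStream wiring in the comment uses the standard `map`, `filter` and `pprint` methods:

```python
import json


def parse_record(line):
    """Parse one JSON line; return None instead of raising on bad input."""
    try:
        record = json.loads(line)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    # Only accept JSON objects, which is what the sender produces.
    return record if isinstance(record, dict) else None


# Intended use on the Spark side (not executed here):
#   parsed = lines.map(parse_record).filter(lambda r: r is not None)
#   parsed.pprint()
```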

Even when I tried json.loads it didn't work, which is really strange. Any help would be greatly appreciated! I've been at this all night and I'm frustrated.

Sample JSON data looks like this:

{"firedata": [{"DataType": "fdata", "Latitude": "-35.541", "Longitude": "143.311", "Surface Temperature (kelvin)": "336.3", "Power": "62", "Confidence": "82", "Surface Temperature (Celcius)": "63"}, {"DataType": "fdata", "Latitude": "-35.554", "Longitude": "143.307", "Surface Temperature (kelvin)": "326.8", "Power": "23.8", "Confidence": "67", "Surface Temperature (Celcius)": "53"}, {"DataType": "fdata", "Latitude": "-35.543", "Longitude": "143.316", "Surface Temperature (kelvin)": "340.4", "Power": "84.2", "Confidence": "86", "Surface Temperature (Celcius)": "67"}, {"DataType": "fdata", "Latitude": "-37.708", "Longitude": "145.1", "Surface Temperature (kelvin)": "327.8", "Power": "16.2", "Confidence": "80", "Surface Temperature (Celcius)": "54"}, {"DataType": "fdata", "Latitude": "-35.646", "Longitude": "142.282", "Surface Temperature (kelvin)": "305.6", "Power": "11.8", "Confidence": "65", "Surface Temperature (Celcius)": "32"}], "DataType": "cdata", "Station": "948700", "Air Temperature(Celcius)": "19", "Relative Humidity": "56.8", "WindSpeed  (knots)": "7.9", "Max Wind Speed": "11.1", "MAX": "   72.0*", "MIN": "  61.9*", "Precipitation": " 0.00I\n"}

I don't see anything wrong with my JSON data.
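The JSON itself is indeed valid; what it lacks is a line terminator. `json.dumps` escapes a newline character inside a string value as the two characters backslash and `n`, so even the `Precipitation` value in the sample, which ends in the newline read from the CSV file, does not put a real line break into the serialized text. A small check with the standard `json` module:

```python
import json

row = {"Station": "948700", "Precipitation": " 0.00I\n"}
out = json.dumps(row)

# The raw newline inside the value is escaped as backslash + "n" ...
assert "\\n" in out
# ... so the serialized string contains no actual line break.
has_line_break = "\n" in out
print(has_line_break)  # False
```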

Tags: json, apache-spark, pyspark, encode, serversocket

Solution


Strangely enough, the problem was solved by appending "\n" to the JSON string before sending it. Basically, it looks like this:

out = json.dumps(row) + "\n" 
self.conn.send(bytes(out,encoding = "utf8"))
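The same fix can be wrapped in a small helper that also uses `sendall()`: a single `send()` call may transmit only part of the buffer, while `sendall()` keeps sending until every byte is written. A sketch; `send_json_line` is a hypothetical name, not part of the original code:

```python
import json
import socket


def send_json_line(conn: socket.socket, record: dict) -> None:
    """Serialize `record` as one newline-terminated JSON line and send all of it."""
    payload = json.dumps(record) + "\n"  # the terminator marks the end of the record
    conn.sendall(payload.encode("utf-8"))
```

In FireStream, the two sending lines would then become `send_json_line(self.conn, row)`.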

It took me several days to figure this out, but I have no idea why it works. Can anyone shed some light on it?
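The likely reason is that `socketTextStream` treats the incoming bytes as text lines. Spark's socket receiver reads the connection line by line (in the Spark source, `SocketReceiver.bytesToLines` wraps the stream in a `BufferedReader` and calls `readLine`), and each complete line becomes one record of the DStream. Without a `\n`, the receiver is still waiting for the end of the first line, so nothing is stored and every batch prints empty; if the connection is eventually closed, all the JSON objects arrive glued together as a single line, which `json.loads` rejects. The sketch below imitates that framing in plain Python; `LineFramer` is a hypothetical class for illustration, not Spark code:

```python
class LineFramer:
    """Buffer incoming bytes and emit only complete newline-terminated lines.

    Hypothetical illustration of readLine()-style framing of a TCP byte stream.
    """

    def __init__(self):
        self._buffer = b""

    def feed(self, data):
        """Add received bytes; return complete lines without their line ending."""
        self._buffer += data
        *complete, self._buffer = self._buffer.split(b"\n")
        return [line.decode("utf-8") for line in complete]

    def flush(self):
        """Connection closed: whatever is left becomes one final line."""
        rest, self._buffer = self._buffer, b""
        return [rest.decode("utf-8")] if rest else []


# Sender from the question: no terminator, so no complete line yet.
framer = LineFramer()
print(framer.feed(b'{"a": 1}'))  # []
print(framer.feed(b'{"a": 2}'))  # []
print(framer.flush())            # ['{"a": 1}{"a": 2}']

# With "\n" appended, each record is emitted as its own line.
framer = LineFramer()
print(framer.feed(b'{"a": 1}\n{"a": 2}\n'))  # ['{"a": 1}', '{"a": 2}']
```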

