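python - How to join two columns in Python when one column holds a URL in each row and the other holds a list of URL endings
Problem description
I have two columns:
One looks like: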
"Link": "https://url.com/item?variant=",
"Link": "https://url2.com/item?variant=",
"Link": "https://url3.com/item?variant=",
The second looks like:
"link extension": ["1","2"],
"link extension": ["1","2"],
"link extension": ["1","1","3"],
What I want is to combine them so that my Link column looks like this:
"Link": "https://url.com/item?variant=1"
"Link": "https://url.com/item?variant=2"
"Link": "https://url2.com/item?variant=1"
"Link": "https://url2.com/item?variant=2"
"Link": "https://url3.com/item?variant=1"
"Link": "https://url3.com/item?variant=2"
"Link": "https://url3.com/item?variant=3"
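The data above looks like per-product records, each pairing one base URL with a list of extensions. Combining them is then a nested loop: for each base URL, append every extension to it. Below is a minimal plain-Python sketch. It assumes the data is a list of dicts with exactly the keys "Link" and "link extension". The `rows` list and the `expand_links` name are hypothetical.

```python
def expand_links(rows):
    """Return one {"Link": ...} dict per (base URL, extension) pair."""
    expanded = []
    for row in rows:
        base = row["Link"]
        for ext in row["link extension"]:
            # Plain string concatenation: base URL + variant id
            expanded.append({"Link": base + str(ext)})
    return expanded


# Hypothetical input, copied from the two columns in the question
rows = [
    {"Link": "https://url.com/item?variant=", "link extension": ["1", "2"]},
    {"Link": "https://url2.com/item?variant=", "link extension": ["1", "2"]},
    {"Link": "https://url3.com/item?variant=", "link extension": ["1", "1", "3"]},
]

result = expand_links(rows)
for r in result:
    print(r)
```

The third row as given lists "1" twice. This sketch therefore emits a duplicate link for url3, unlike the desired output.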
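However, I'm a beginner in Python, and even more so in Pandas. I tried to find an answer and came across the map/append options, but none of them seem to work; each throws a different TypeError.
Any help or suggestions on what to read, or where to read it, would be very helpful.
Thank you in advance.
Here is my basic code: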
def parse(self, response):
    items = response.xpath("//*[@id='bc-sf-filter-products']/div")
    for item in items:
        link = item.xpath(".//div[@class='figcaption product--card--text under text-center']/a/@href").get()
        yield response.follow(url=link, callback=self.parse_item)

def parse_item(self, response):
    Title = response.xpath(".//div[@class='hide-on-mobile']/div[@class='productTitle']/text()").get()
    Item_Link = response.url
    n_item_link = f"{Item_Link}?variant="
    idre = r'("id":\d*)'  # defining regex
    id = response.xpath("//script[@id='ProductJson-product']/text()").re(idre)  # applying regex
    id1 = [item.replace('"id":', '') for item in id]  # cleaning list of url-ids
    id2 = id1[1:]  # dropping first item
    test = n_item_link.append(id2)  # doesn't work
    test2 = n_item_link.str.cat(id2)  # doesn't work either
    yield {
        'test': test,
        'test2': test2
    }
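In `parse_item`, `n_item_link` is a plain Python `str`. Strings have no `append` method. `.str.cat` is an accessor on pandas Series, not on `str`. Both attempted lines therefore fail. Below is a minimal sketch of the concatenation without pandas. Based on the "dropping first item" comment, it assumes the first regex match is not a variant id. `build_variant_links` is a hypothetical helper name.

```python
def build_variant_links(item_url, raw_ids):
    """Turn regex matches like '"id":123' into full variant URLs.

    Mirrors the steps in parse_item: strip the '"id":' prefix,
    drop the first match, then append each id to the base URL.
    """
    base = f"{item_url}?variant="
    ids = [m.replace('"id":', '') for m in raw_ids]
    # base is a str, so build one new string per id instead of
    # calling list/Series methods on it
    return [base + variant_id for variant_id in ids[1:]]


links = build_variant_links("https://url.com/item", ['"id":100', '"id":1', '"id":2'])
print(links)
```

Inside `parse_item`, something like `yield {"links": build_variant_links(response.url, id)}` would then emit the whole list. You could also loop over the result and yield one item per link.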
Solution
import pandas as pd

# recreating the DataFrame
df = pd.DataFrame({
    "link": ["https://url.com/item?variant=",
             "https://url2.com/item?variant=",
             "https://url3.com/item?variant="],
    "variants": [["1", "2"],
                 ["1", "2"],
                 ["1", "1", "3"]]
})
# creating a new column containing the length of each list
df["len_list"] = [len(x) for x in df["variants"].to_list()]
# flattening all values in df.variants into one list, converting each value to a string
flat_list_variants = [str(item) for sublist in df["variants"].to_list() for item in sublist]
# creating a new DataFrame in which each row is repeated df["len_list"] times
# (explicit copy before adding columns to it)
df_new = df.loc[df.index.repeat(df["len_list"])].copy()
# assigning the flattened list to a new column
df_new["flat_variants"] = flat_list_variants
# building the result by concatenating the strings
df_new["results"] = df_new["link"] + df_new["flat_variants"]
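Pandas 0.25 and later also provides `DataFrame.explode`, which performs the repeat-and-flatten steps in a single call. Below is a sketch of that alternative on the same recreated data. The `ignore_index` argument requires pandas 1.1 or later.

```python
import pandas as pd

df = pd.DataFrame({
    "link": ["https://url.com/item?variant=",
             "https://url2.com/item?variant=",
             "https://url3.com/item?variant="],
    "variants": [["1", "2"], ["1", "2"], ["1", "1", "3"]],
})

# explode() gives each list element its own row and repeats the
# other columns; ignore_index renumbers the rows 0..n-1
exploded = df.explode("variants", ignore_index=True)

# Concatenate the base URL with each variant id
exploded["results"] = exploded["link"] + exploded["variants"].astype(str)
print(exploded["results"].tolist())
```

Because the third list contains "1" twice, url3 produces a repeated link here as well. If that is unwanted, `exploded.drop_duplicates()` can remove it.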