How to join two columns in Python when one column has a URL in each row and the other has a list of final URL parts

Problem description

I have two columns:

The first looks like:

"Link": "https://url.com/item?variant=",
"Link": "https://url2.com/item?variant=",
"Link": "https://url3.com/item?variant=",

The second looks like:

"link extension": ["1","2"],
"link extension": ["1","2"],
"link extension": ["1","1","3"],

What I want is to combine them so that my Link column looks like this:

"Link": "https://url.com/item?variant=1"
"Link": "https://url.com/item?variant=2"
"Link": "https://url2.com/item?variant=1"
"Link": "https://url2.com/item?variant=2"
"Link": "https://url3.com/item?variant=1"
"Link": "https://url3.com/item?variant=2"
"Link": "https://url3.com/item?variant=3"
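The target transformation can be sketched in plain Python without pandas, assuming the two columns are available as ordinary lists of equal length (the names `links` and `link_extensions` are illustrative, not from the original code). Each link is paired with its own list of extensions, and every extension is appended to that link:

```python
# Sample data from the question, assumed to be plain Python lists
links = [
    "https://url.com/item?variant=",
    "https://url2.com/item?variant=",
    "https://url3.com/item?variant=",
]
link_extensions = [["1", "2"], ["1", "2"], ["1", "1", "3"]]

# zip() pairs each link with its own extension list; the inner loop
# produces one combined URL per extension
combined = [
    link + ext
    for link, exts in zip(links, link_extensions)
    for ext in exts
]

for url in combined:
    print(url)
```

Note that with the third list exactly as given (`["1","1","3"]`), url3 gets `variant=1` twice and no `variant=2`; if duplicates should be dropped, `list(dict.fromkeys(combined))` removes them while keeping the original order.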

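However, I'm a beginner in Python, and even more so in Pandas. I searched for an answer and came across the map/append options, but none of them seem to work; each one throws a different TypeError.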

Any help or advice on what to read, or where to read it, would be very helpful.

Thank you in advance.

Here is my basic code:

def parse(self, response):

    items = response.xpath("//*[@id='bc-sf-filter-products']/div")
    for item in items:
        link = item.xpath(".//div[@class='figcaption product--card--text under text-center']/a/@href").get()
        yield response.follow(url=link, callback=self.parse_item)

def parse_item(self, response):
    Title = response.xpath(".//div[@class='hide-on-mobile']/div[@class='productTitle']/text()").get()
    Item_Link = response.url
    n_item_link = f"{Item_Link}?variant="

    idre = r'("id":\d*)' #defining regex
    id = response.xpath("//script[@id='ProductJson-product']/text()").re(idre) #applying regex
    id1 = [item.replace('"id":', '') for item in id] #cleaning list of url-ids
    id2 = id1[1:] #dropping first item

    test = n_item_link.append(id2)  # doesn't work: a str has no .append() method
    test2 = n_item_link.str.cat(id2)  # doesn't work either: .str.cat() exists only on a pandas Series

    yield{
        'test':test,
        'test2':test2
    }
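Because `n_item_link` is a plain string and `id2` a plain list, no pandas is needed inside the spider: building a new list of strings is enough. A minimal sketch, written as a hypothetical standalone helper `build_variant_links` so it can run without Scrapy. It assumes, as the original code does, that every `"id":<digits>` match in the ProductJson script is an id and that the first match is the product itself rather than a variant:

```python
import re


def build_variant_links(item_url, script_text):
    """Hypothetical helper: return one '<item_url>?variant=<id>' URL per variant id.

    Assumes the first '"id":<digits>' match is the product id (dropped, as in
    the question's code) and the remaining matches are variant ids.
    """
    base = f"{item_url}?variant="
    # The capture group returns only the digits, so no str.replace cleanup is needed
    ids = re.findall(r'"id":(\d+)', script_text)
    variant_ids = ids[1:]  # drop the first id, mirroring id1[1:]
    # Concatenate the base string with each id to build a new list
    return [base + variant_id for variant_id in variant_ids]


sample = '{"id":111,"variants":[{"id":201},{"id":202}]}'
print(build_variant_links("https://url.com/item", sample))
# ['https://url.com/item?variant=201', 'https://url.com/item?variant=202']
```

Inside `parse_item`, one could pass `response.url` and the script text from `response.xpath("//script[@id='ProductJson-product']/text()").get()`, then yield one item per returned link. The pattern uses `\d+` instead of `\d*` so that an `"id":` followed by no digits is not captured as an empty id.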

Tags: python

Solution


import pandas as pd

# recreating the DataFrame
df = pd.DataFrame({
    "link": ["https://url.com/item?variant=",
             "https://url2.com/item?variant=",
             "https://url3.com/item?variant="],
    "variants": [["1", "2"],
                 ["1", "2"],
                 ["1", "1", "3"]],
})

# create a new column containing the length of each list
df["len_list"] = [len(x) for x in df["variants"].to_list()]
# flatten all values in df.variants into one list and convert them to strings
flat_list_variants = [str(item) for sublist in df["variants"].to_list() for item in sublist]

# create a new DataFrame in which each row is repeated df["len_list"] times
# (.copy() avoids a SettingWithCopyWarning on the assignments below)
df_new = df.loc[df.index.repeat(df["len_list"])].copy()
# assign the flattened list to a new column
df_new["flat_variants"] = flat_list_variants
# build the result by concatenating the strings
df_new["results"] = df_new["link"] + df_new["flat_variants"]
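pandas (0.25 and later) also provides `DataFrame.explode`, which performs the repeat-and-flatten step in one call: each element of a list cell becomes its own row, and the other columns are repeated alongside it. A sketch on the same sample data, offered as an alternative to the approach above rather than a replacement for it:

```python
import pandas as pd

df = pd.DataFrame({
    "link": ["https://url.com/item?variant=",
             "https://url2.com/item?variant=",
             "https://url3.com/item?variant="],
    "variants": [["1", "2"], ["1", "2"], ["1", "1", "3"]],
})

# explode() gives each list element its own row and repeats "link" for it;
# reset_index(drop=True) replaces the repeated index with 0..n-1
df_new = df.explode("variants").reset_index(drop=True)

# astype(str) guards against non-string variant values before concatenation
df_new["results"] = df_new["link"] + df_new["variants"].astype(str)
print(df_new["results"].tolist())
```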
