Insert a blank in the second row of a specific column, Python

Problem description

I want to add a blank to a specific column, in this case column 1, row 2.

[Image link]

The image above shows two columns; into the second row of column 1 I need to insert a blank cell, pushing the three rows (with text) in column 1 down.

What I am trying to do is the following:

The URL I am scraping has images in it, and all of the images have a figure caption except one (the first).

'figure.article__main-hero article__main-hero-image' is the figure that has no figure caption. My idea was to scrape this and return a blank, which it does. The problem is how to insert that blank into the second row of column 1. At the moment the cap_meta list starts filling column 1 from the second row, but it needs to start from the third row; as it stands, the wrong captions get associated with the images. Put another way: if a figure has no figure caption associated with it, insert a blank; if it has one, insert the figure caption.
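For illustration only, here is a minimal sketch, with made-up values standing in for the scraped lists, of the operation being asked about: inserting a blank at a chosen position in the caption list so the remaining captions shift down one row before the columns are combined.

import pandas as pd

# Toy stand-ins for the scraped lists (hypothetical values, not real data).
matches_img_url = ['img1.jpg', 'img2.jpg', 'img3.jpg', 'img4.jpg']
cap_meta = ['caption A', 'caption B', 'caption C']  # one caption fewer than images

# Insert a blank where the empty cell is needed; index 1 here, i.e. the
# second row of the caption column, which pushes the later captions down.
cap_meta.insert(1, '')

df = pd.concat([pd.Series(matches_img_url), pd.Series(cap_meta)], axis=1)
df.columns = ['url', 'caption']
print(df)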

I'm using pandas here, but it doesn't have to be.

import re

import pandas as pd
import requests
from selenium import webdriver

# Assumes `driver` is an already-initialised Selenium WebDriver, e.g.:
# driver = webdriver.Chrome()

url = 'https://www.homestolove.com.au/resort-style-home-three-birds-renovations-22500'
driver.get(url)

# Pull the raw HTML separately and extract every srcset URL.
url_content = requests.get(url).content.decode('utf-8')
matches_img_url = re.findall(r'img .*?srcset="(.*?)"', url_content)

main_image = []
cap_meta = []
credit = []

# Hero figure (the one without a figcaption).
for j in driver.find_elements_by_css_selector('figure.article__main-hero article__main-hero-image'):
    main_image.append(j.text)

# Inline image captions.
for i in driver.find_elements_by_xpath('.//figcaption[@class = "content-body__inline-image-caption"]'):
    cap_meta.append(i.text)

# Inline image credits.
for k in driver.find_elements_by_xpath('.//span[@class = "content-body__inline-image-credit"]'):
    credit.append(k.text)

#rows = zip(matches_img_url, main_image, cap_meta)

matches_img_df = pd.DataFrame(matches_img_url)
metadata_df = pd.DataFrame(cap_meta)
credit_df = pd.DataFrame(credit)
main_image_df = pd.DataFrame(main_image)

scrapped_urls = pd.concat([matches_img_df, main_image_df, metadata_df, credit_df], ignore_index=True, axis=1)
scrapped_urls.to_csv('scrapped_urls_test2.csv', mode='a', index=False)
 

**************** EDIT *******************

url = 'https://www.homestolove.com.au/resort-style-home-three-birds-renovations-22500'
driver.get(url)

url_content = requests.get(url).content.decode('utf-8')
matches_img_url = re.findall(r'img .*?srcset="(.*?)"', url_content)

data = [{"url": url} for url in matches_img_url]

caption_xpath = './/figcaption[@class="content-body__inline-image-caption"]'
for idx, caption in enumerate(driver.find_elements_by_xpath(caption_xpath)):
    data[idx]["caption"] = caption.text or ""

credit_xpath = './/span[@class = "content-body__inline-image-credit"]'
for idx, credit in enumerate(driver.find_elements_by_xpath(credit_xpath)):
    data[idx]["credit"] = credit.text or ""

image_css = "div.article__hero-container"
for idx, hero_image in enumerate(driver.find_elements_by_css_selector(image_css)):
    data[idx]["caption"] = hero_image.text

df = pd.DataFrame(data)
df.to_csv('scrapped_urls_test2.csv', mode='a', index=False)

Tags: python, pandas, selenium

Solution


You should be able to do this using Python's duck typing.

url = 'https://www.homestolove.com.au/resort-style-home-three-birds-renovations-22500'

url_content = requests.get(url).content.decode('utf-8')
matches_img_url = re.findall(r'img .*?srcset="(.*?)"', url_content)

data = [{"url": url} for url in matches_img_url]

image_css = "figure.article__main-hero article__main-hero-image"
for idx, image in enumerate(driver.find_elements_by_css_selector(image_css)):
    data[idx]["image"] =  image.text

caption_xpath = './/figcaption[@class="content-body__inline-image-caption"]'
for idx, caption in enumerate(driver.find_elements_by_xpath(caption_xpath)):
    data[idx]["caption"] = caption.text or ""

credit_xpath = './/span[@class = "content-body__inline-image-credit"]'
for idx, credit in enumerate(driver.find_elements_by_xpath(credit_xpath)):
    data[idx]["credit"] = credit.text or ""

df = pd.DataFrame(data)
df.to_csv('scrapped_urls_test2.csv', mode='a', index=False)

A key caveat with this solution is that .text must be defined as an attribute on each item; if it is undefined, this will fail. (There is another way to achieve this if that is a corner case you need to handle.)
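Not part of the original answer, but as a sketch of that corner case: getattr can fall back to an empty string when an item has no text attribute at all, on top of the or fallback used above. The names reuse those from the snippet above.

# Falls back to "" both when the element lacks a text attribute entirely
# and when text exists but is empty or None.
for idx, caption in enumerate(driver.find_elements_by_xpath(caption_xpath)):
    data[idx]["caption"] = getattr(caption, "text", "") or ""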

This solution relies on the fact that many empty data types evaluate to the equivalent of None or False in Python, which can be thought of as "lower priority" than "" in an or expression, so Python will set the corresponding value to "" as needed.
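As a standalone sketch of that behaviour, using dummy values rather than the scraped data: falsy values on the left of or fall back to the "" on the right, and any dict that never gets a caption key shows up as NaN once pandas builds the frame.

import pandas as pd

rows = [{"url": "a.jpg"}, {"url": "b.jpg"}, {"url": "c.jpg"}]
captions = [None, "garden view"]            # the first figure has no caption text

for idx, text in enumerate(captions):
    rows[idx]["caption"] = text or ""       # None (and "") both become ""

df = pd.DataFrame(rows)                     # the third row was never given a caption,
print(df)                                   # so pandas fills it in as NaN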

