python - How to drop rows of a pyspark dataframe if they're in another dataframe based on the values from two columns?
Question
I have two dataframes: one with user and item columns, and another with all user/item pairs and their scores.
user | item
and
user | item | item2 | rating2 | score
I want to remove from the second dataframe every row whose (user, item) pair appears in the first dataframe. I can't use subtract since the two dataframes don't have the same number of columns.
Is this something that could be accomplished with an anti join?
Solution
df2.join(df1, on=['user', 'item'], how="left_anti")