python - 从 Django 单元测试到 Postgres 的查询在保持活动模式时任意不执行
问题描述
我正在使用 Django ORM 开发 ETL,并且在从本地环境或 Jenkins 运行单元测试时,查询有时只是在数据库中保持活动状态而没有完成。
我并不总是能够重现这个问题。
我正在使用 Django 3.1.7,并且测试在 Django“TestCase”中运行,因此每个测试都在事务中运行。
我正在使用谷歌云托管的 Postgres (Postgres 12) 数据库服务器。
当我不在单元测试中运行 ETL 时,一切运行顺利。当单独运行生成的查询时,它工作得很好并且运行得非常快。
- 表的 models.py 定义:
class DiseaseTarget(models.Model):
id = models.AutoField(primary_key=True, db_index=True)
gene = models.ForeignKey(Gene, on_delete=models.CASCADE, related_name='disease_target_gene', db_index=True)
disease = models.ForeignKey(Disease, on_delete=models.CASCADE, related_name='disease_target_disease', db_index=True)
dev_stage = models.TextField()
class Meta:
managed = True
db_table = 'disease_target'
unique_together = (('gene', 'disease'),)
class Gene(models.Model):
entity = models.OneToOneField(Entity, on_delete=models.CASCADE, primary_key=True, related_name='genes', db_index=True)
symbol = models.TextField(null=True, db_index=True)
class Meta:
managed = True
db_table = 'gene'
class Entity(models.Model):
id = models.AutoField(primary_key=True, db_index=True)
entity_id = models.TextField(db_index=True)
TYPES = [
('cellnode', 'cell'),
('gene', 'gene'),
('geneset', 'geneset')
]
type = models.TextField(choices=TYPES, db_index=True)
name = models.TextField(default=entity_id)
description = models.TextField(blank=True, null=True)
class Meta:
managed = True
db_table = 'entity'
index_together = [
("id", "type"),
]
unique_together = (('entity_id', 'type'),)
- etl代码:
class EtlDiseaseTarget(AbstractDataMartEtl):
def __init__(self, database: str, data_source: str, nrows=None):
self.database = database
self.data_columns = ['disease_name', 'ENTREZID', 'dev_stage_category', 'disease_ontology']
self.data_source = data_source
self.data = self.extract(input_sep='\t', nrows=nrows)
self.transformed_data = {'disease_target': None}
def transform(self):
...
existing_entries = DiseaseTarget.objects.using(self.database).select_related('gene__entity').filter(gene__entity__entity_id__in=self.data.ENTREZID.unique())
...
- 生成的数据库查询:
SELECT "disease_target"."id", "disease_target"."gene_id", "disease_target"."disease_id", "disease_target"."dev_stage", "gene"."entity_id", "gene"."symbol", "entity"."id", "entity"."entity_id", "entity"."type", "entity"."name", "entity"."description", "entity"."created_at", "entity"."updated_at" FROM "disease_target" INNER JOIN "gene" ON ("disease_target"."gene_id" = "gene"."entity_id") INNER JOIN "entity" ON ("gene"."entity_id" = "entity"."id") WHERE "entity"."entity_id" IN ('7124', '1435', '1437', '3594', '3595', '51561', '149233', '3695', '8174', '3676')
- 单元测试代码:
class NoneGroundTruthTest(TestCase):
def test_disease_target(self):
print("\t===========test_disease_target=================")
test_disease_etl = EtlDiseaseTarget(data_source=self.disease_target_path,
database=self.database,
nrows=10)
test_disease_etl.transform()
test_disease_etl.load()
self.assertGreater(
len(list(DiseaseTarget.objects.using(self.database).all()[:10])), 0)
- 来自 settings.py 的数据库设置:
DATABASES = {
'default': {
'ENGINE': 'django.db.backends.postgresql_psycopg2',
'NAME': os.getenv('db_name_ccm'),
'HOST': os.getenv('db_host_ccm'),
'PORT': os.getenv('POSTGRES_PORT'),
'USER': os.getenv('DB_USER'),
'PASSWORD': os.getenv('DB_SECRET_PASSWORD'),
},
}
我不确定如何解决这个问题,它确实干扰了我的开发过程。
解决方案
推荐阅读
- spring - 关联字段不能用于更新项的路径表达式 Spring Data JPA
- java - Intellij IDEA Kotlin - 如何在同一列中发表评论?
- excel - VBA 更新记录的子集
- javascript - 带有对话框的 FileSaver.js
- machine-learning - 协作过滤添加新用户和项目
- javascript - Ckeditor - 在“选择”字段中加载动态字段值
- c# - 使用 Partner Soap Api 在 Salesforce 中获取联系人的帐户信息
- c# - 如何在运行时更改补间动画的 endValueString?
- spring-boot-admin - spring boot 管理日志文件显示 e.headers ['content-range']
- angular - VSTS 轰炸 NPM 安装?