首页 > 解决方案 > 从 Django 单元测试到 Postgres 的查询在保持活动模式时任意不执行

问题描述

我正在使用 Django ORM 开发 ETL,并且在从本地环境或 Jenkins 运行单元测试时,查询有时只是在数据库中保持活动状态而没有完成。

我并不总是能够重现这个问题。

我正在使用 Django 3.1.7,并且测试在 Django“TestCase”中运行,因此每个测试都在事务中运行。
我正在使用谷歌云托管的 Postgres (Postgres 12) 数据库服务器。
当我不在单元测试中运行 ETL 时,一切运行顺利。当单独运行生成的查询时,它工作得很好并且运行得非常快。

class DiseaseTarget(models.Model):
    id = models.AutoField(primary_key=True, db_index=True)
    gene = models.ForeignKey(Gene, on_delete=models.CASCADE, related_name='disease_target_gene', db_index=True)
    disease = models.ForeignKey(Disease, on_delete=models.CASCADE, related_name='disease_target_disease', db_index=True)
    dev_stage = models.TextField()

    class Meta:
        managed = True
        db_table = 'disease_target'
        unique_together = (('gene', 'disease'),)

class Gene(models.Model):
    entity = models.OneToOneField(Entity, on_delete=models.CASCADE, primary_key=True, related_name='genes', db_index=True)
    symbol = models.TextField(null=True, db_index=True)

    class Meta:
        managed = True
        db_table = 'gene'

class Entity(models.Model):
    id = models.AutoField(primary_key=True, db_index=True)
    entity_id = models.TextField(db_index=True)
    TYPES = [
        ('cellnode', 'cell'),
        ('gene', 'gene'),
        ('geneset', 'geneset')
    ]
    type = models.TextField(choices=TYPES, db_index=True)
    name = models.TextField(default=entity_id)
    description = models.TextField(blank=True, null=True)


    class Meta:
        managed = True
        db_table = 'entity'
        index_together = [
            ("id", "type"),
        ]
        unique_together = (('entity_id', 'type'),)
class EtlDiseaseTarget(AbstractDataMartEtl):
    def __init__(self, database: str, data_source: str, nrows=None):
        self.database = database
        self.data_columns = ['disease_name', 'ENTREZID', 'dev_stage_category', 'disease_ontology']
        self.data_source = data_source
        self.data = self.extract(input_sep='\t', nrows=nrows)
        self.transformed_data = {'disease_target': None}

    def transform(self):
        ...
        existing_entries = DiseaseTarget.objects.using(self.database).select_related('gene__entity').filter(gene__entity__entity_id__in=self.data.ENTREZID.unique())
        ...
    SELECT "disease_target"."id", "disease_target"."gene_id", "disease_target"."disease_id", "disease_target"."dev_stage", "gene"."entity_id", "gene"."symbol", "entity"."id", "entity"."entity_id", "entity"."type", "entity"."name", "entity"."description", "entity"."created_at", "entity"."updated_at" FROM "disease_target" INNER JOIN "gene" ON ("disease_target"."gene_id" = "gene"."entity_id") INNER JOIN "entity" ON ("gene"."entity_id" = "entity"."id") WHERE "entity"."entity_id" IN ('7124', '1435', '1437', '3594', '3595', '51561', '149233', '3695', '8174', '3676') 
class NoneGroundTruthTest(TestCase):
    def test_disease_target(self):
        print("\t===========test_disease_target=================")
        test_disease_etl = EtlDiseaseTarget(data_source=self.disease_target_path,
                                            database=self.database,
                                            nrows=10)
        test_disease_etl.transform()
        test_disease_etl.load()
        self.assertGreater(
           len(list(DiseaseTarget.objects.using(self.database).all()[:10])), 0)
 DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': os.getenv('db_name_ccm'),
        'HOST': os.getenv('db_host_ccm'),
        'PORT': os.getenv('POSTGRES_PORT'),
        'USER': os.getenv('DB_USER'),
        'PASSWORD': os.getenv('DB_SECRET_PASSWORD'),
    },
}

我不确定如何解决这个问题,它确实干扰了我的开发过程。

标签: pythondjangopostgresqlunit-testingetl

解决方案


推荐阅读