Rocky Ceilometer memory metric with memory.usage: data collection granularity is inaccurate

Problem Description

This is the document I referenced.

Configure /etc/ceilometer/pipeline.yaml and add the following:

sources:
    - name: memory_util_source
      meters:
          - "memory"
          - "memory.usage"
      sinks:
          - memory_util_sink
sinks:
    - name: memory_util_sink
      transformers:
          - name: "arithmetic"
            parameters:
                target:
                    name: "memory.usage"
                    unit: "%"
                    type: "gauge"
                    expr: "100 * $(memory.usage) / $(memory)"
      publishers:
          - gnocchi://?filter_project=service&archive_policy=ceilometer-low
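For a pair of matching samples, the arithmetic transformer evaluates 100 * $(memory.usage) / $(memory) and publishes the result as a new gauge sample. A minimal sketch with made-up values (not taken from the environment above) shows the intended result:

# Hypothetical values: the "memory" meter is the RAM allocated to the
# instance in MB, and "memory.usage" is the RAM currently used in MB.
memory = 4096
memory_usage = 1024

# What the transformer's expr "100 * $(memory.usage) / $(memory)" computes:
memory_util_percent = 100 * memory_usage / memory
print(memory_util_percent)  # 25.0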

The Gnocchi archive policy ceilometer-low is shown below:

+---------------------+------------------------------------------------------------------+
| Field               | Value                                                            |
+---------------------+------------------------------------------------------------------+
| aggregation_methods | max, min, mean                                                   |
| back_window         | 0                                                                |
| definition          | - points: 8640, granularity: 0:05:00, timespan: 30 days, 0:00:00 |
| name                | ceilometer-low                                                   |
+---------------------+------------------------------------------------------------------+
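As a quick consistency check, the definition row adds up: 8640 points at a five-minute granularity span exactly 30 days.

# Sanity check on the ceilometer-low definition: points * granularity = timespan.
points = 8640
granularity_s = 5 * 60
print(points * granularity_s / 86400)  # 30.0 days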

The memory usage metric on the Gnocchi resource only shows one measure per hour, even though the archive policy granularity is one data point every five minutes. Why does this strange behavior occur?


Tags: openstack, ceilometer

Solution


I tried a workaround to obtain the memory utilization of an instance; the steps are as follows.

(1) Add the following code to the /ceilometer/compute/pollsters/instance_stats.py file.

class MemoryUtilPollster(InstanceStatsPollster):
    sample_name = 'memory_util'
    sample_unit = '%'
    sample_stats_key = 'memory_util'
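For context, this follows the same pattern as the existing memory.usage pollster in that file. The snippet below is how that pollster looks in Rocky as best I recall, so double-check it against your source tree:

class MemoryUsagePollster(InstanceStatsPollster):
    sample_name = 'memory.usage'
    sample_unit = 'MB'
    sample_stats_key = 'memory_usage'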

(2) Modify the logic that computes instance memory usage in the /ceilometer/compute/virt/libvirt/inspector.py file.

class LibvirtInspector(virt_inspector.Inspector):

    def inspect_instance(self, instance, duration=None):
        domain = self._get_domain_not_shut_off_or_raise(instance)
    
        memory_used = memory_resident = None
        memory_swap_in = memory_swap_out = None
        memory_stats = domain.memoryStats()
    
        # Stat provided from libvirt is in KB, converting it to MB.
        if 'usable' in memory_stats and 'available' in memory_stats:
            memory_used = (memory_stats['available'] -
                           memory_stats['usable']) / units.Ki
        elif 'available' in memory_stats and 'unused' in memory_stats:
            memory_used = (memory_stats['available'] -
                           memory_stats['unused']) / units.Ki
        if 'rss' in memory_stats:
            memory_resident = memory_stats['rss'] / units.Ki
        if 'swap_in' in memory_stats and 'swap_out' in memory_stats:
            memory_swap_in = memory_stats['swap_in'] / units.Ki
            memory_swap_out = memory_stats['swap_out'] / units.Ki
    
        # Tristack: add memory_util (guarded, since some hypervisors do not
        # report the stats needed to compute it)
        memory_util = None
        if memory_used is not None and 'available' in memory_stats:
            memory_total = memory_stats['available'] / units.Ki
            memory_util = int(100 * memory_used / memory_total)
    
        # TODO(sileht): stats also have the disk/vnic info
        # we could use that instead of the old method for Queen
        stats = self.connection.domainListGetStats([domain], 0)[0][1]
        cpu_time = 0
        current_cpus = stats.get('vcpu.current')
        # Iterate over the maximum number of CPUs here, and count the
        # actual number encountered, since the vcpu.x structure can
        # have holes according to
        # https://libvirt.org/git/?p=libvirt.git;a=blob;f=src/libvirt-domain.c
        # virConnectGetAllDomainStats()
        for vcpu in six.moves.range(stats.get('vcpu.maximum', 0)):
            try:
                cpu_time += (stats.get('vcpu.%s.time' % vcpu) +
                             stats.get('vcpu.%s.wait' % vcpu))
                current_cpus -= 1
            except TypeError:
                # pass here, if there are too many holes, the cpu count will
                # not match, so don't need special error handling.
                pass
    
        if current_cpus:
            # There wasn't enough data, so fall back
            cpu_time = stats.get('cpu.time')
    
        return virt_inspector.InstanceStats(
            cpu_number=stats.get('vcpu.current'),
            cpu_time=cpu_time,
            # Tristack: add memory_util
            memory_util=memory_util,
            memory_usage=memory_used,
            memory_resident=memory_resident,
            memory_swap_in=memory_swap_in,
            memory_swap_out=memory_swap_out,
            cpu_cycles=stats.get("perf.cpu_cycles"),
            instructions=stats.get("perf.instructions"),
            cache_references=stats.get("perf.cache_references"),
            cache_misses=stats.get("perf.cache_misses"),
            memory_bandwidth_total=stats.get("perf.mbmt"),
            memory_bandwidth_local=stats.get("perf.mbml"),
            cpu_l3_cache_usage=stats.get("perf.cmt"),
        )
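To see what the added lines compute, here is a small, self-contained sketch with hypothetical memoryStats() values (libvirt reports them in KiB; units.Ki from oslo_utils is 1024):

# Standalone illustration of the memory_util arithmetic above.
# The numbers are made up; a real run reads them from domain.memoryStats().
KI = 1024  # stands in for oslo_utils units.Ki

memory_stats = {'available': 4194304,  # 4096 MB of guest RAM
                'usable': 3145728}     # 3072 MB currently usable

memory_used = (memory_stats['available'] - memory_stats['usable']) / KI  # 1024.0 MB
memory_total = memory_stats['available'] / KI                            # 4096.0 MB
memory_util = int(100 * memory_used / memory_total)                      # 25
print(memory_used, memory_total, memory_util)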

(3) Add the memory_util attribute to the InstanceStats object in the /ceilometer/compute/virt/inspector.py file.

class InstanceStats(object):
    fields = [
        'cpu_number',              # number: number of CPUs
        'cpu_time',                # time: cumulative CPU time
        'cpu_util',                # util: CPU utilization in percentage
        'cpu_l3_cache_usage',      # cachesize: Amount of CPU L3 cache used
        'memory_util',             # Tristack: memory utilization in percentage
        'memory_usage',            # usage: Amount of memory used
        'memory_resident',         #
        'memory_swap_in',          # memory swap in
        'memory_swap_out',         # memory swap out
        'memory_bandwidth_total',  # total: total system bandwidth from one
                                   #   level of cache
        'memory_bandwidth_local',  # local: bandwidth of memory traffic for a
                                   #   memory controller
        'cpu_cycles',              # cpu_cycles: the number of cpu cycles one
                                   #   instruction needs
        'instructions',            # instructions: the count of instructions
        'cache_references',        # cache_references: the count of cache hits
        'cache_misses',            # cache_misses: the count of cache misses
    ]

    def __init__(self, **kwargs):
        for k in self.fields:
            setattr(self, k, kwargs.pop(k, None))
        if kwargs:
            raise AttributeError(
                "'InstanceStats' object has no attributes '%s'" % kwargs)

(4) Add the memory_util plugin under ceilometer.poll.compute in the setup.cfg file.

ceilometer.poll.compute =
    memory_util = ceilometer.compute.pollsters.instance_stats:MemoryUtilPollster
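Once the rebuilt package from step (5) below is installed, one way to confirm that the entry point is visible is to list the ceilometer.poll.compute namespace with stevedore; a sketch:

from stevedore import extension

# Lists the pollster plugins registered under ceilometer.poll.compute.
mgr = extension.ExtensionManager(namespace='ceilometer.poll.compute',
                                 invoke_on_load=False)
print(sorted(mgr.names()))  # 'memory_util' should appear in the output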

(5) Package, build, and install ceilometer; the build process is as follows. The openstack-ceilometer-11.0.1-1.el7.src.rpm source package can be found here.

# groupadd mockbuild
# useradd mockbuild -g mockbuild
# rpm -ivh openstack-ceilometer-11.0.1-1.el7.src.rpm 
After the installation completes, the RPM build tree is automatically created under:
/root/rpmbuild/SPECS
/root/rpmbuild/SOURCES

# cd /root/rpmbuild/SPECS
# rpmbuild -bb openstack-ceilometer.spec

The built ceilometer RPM packages are placed in the /root/rpmbuild/RPMS directory; install them.

(6) Add the following configuration to the /etc/ceilometer/gnocchi_resources.yaml file.

resources:
  - resource_type: instance
    metrics:
      # Tristack: add memory_util
      memory_util:
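Without an entry here, the Gnocchi publisher has no metric mapping for memory_util and would skip the samples, so this small change is required for the measures to reach Gnocchi.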

(7) Add the following configuration to the /etc/ceilometer/polling.yaml file.

sources:
    - name: some_pollsters
      interval: 300
      meters:
        # Tristack: add memory_util
        - memory_util
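The 300-second polling interval lines up with the 0:05:00 granularity of the ceilometer-low archive policy, so each five-minute granule receives one measure.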

(8) Add the following configuration to the /etc/ceilometer/pipeline.yaml file.

sources:
    # Tristack: add memory_util
    - name: memory_util_source
      meters:
          - "memory_util"
      sinks:
          - memory_util_sink
sinks:
    # Tristack: add memory_util
    - name: memory_util_sink
      publishers:
          - gnocchi://?filter_project=service&archive_policy=ceilometer-low
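Note that no transformer is needed in this sink: memory_util is already computed as a percentage by the inspector, so the samples can be published to Gnocchi as-is.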

(9) Finally, restart the openstack-ceilometer-compute service.

