Version: NG-2.15

Apache HDFS

Introduction

Apache Hadoop Distributed File System (HDFS) is the primary storage system of the Hadoop ecosystem, designed to store large volumes of data across many machines. As a distributed file system, it provides high-throughput access to application data while ensuring scalability and fault tolerance.

Getting Started

Compatibility

vuSmartMaps supports monitoring of Apache HDFS and is tested with version 3.x.

Data Collection Method

vuSmartMaps collects Apache HDFS data using an internal data collector. This agent collects data based on the source configuration.
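
Concretely, the collector polls each node's Jolokia HTTP endpoint, which proxies the JMX MBeans exposed by HDFS. Conceptually, each poll is an HTTP read such as the following (the host is a placeholder; the Jolokia port is set up under Configuring the Target below):

curl -s http://namenode01.example.com:8778/jolokia/read/java.lang:type=Memory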

Prerequisites

Inputs for Configuring Data Source

  • Host: The IP address/FQDN of the Linux server. This field is the key used to identify each server you add here.
  • Port: The JMX port to connect to.
  • Interval: The data collection interval.
  • DataNode Details: Details of the DataNodes to be monitored. A hypothetical filled-in example follows this list.
    • DataNode IP Address: A valid IP address of the DataNode.
    • DataNode Port: A valid port number of the DataNode.
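
For reference, a hypothetical set of values (host, ports, and interval are illustrative only; enter the actual values in the vuSmartMaps UI):

Host:                namenode01.example.com
Port:                8778
Interval:            60
DataNode IP Address: 10.0.0.12
DataNode Port:       8779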

Firewall Requirement

To collect data from this O11ySource, ensure the following ports are open:

Source IP      | Destination IP       | Destination Port | Protocol | Direction
vuSmartMaps IP | Apache HDFS Endpoint | 8778, 8779       | TCP      | Outbound

*Update the ports above to match the customer environment before sharing these firewall requirements.
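
To verify connectivity from the vuSmartMaps host once the rules are in place, a simple TCP probe works (the hostname is a placeholder; adjust the ports if they differ in your environment):

nc -zv hdfs-node.example.com 8778
nc -zv hdfs-node.example.com 8779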

Configuring the Target

Add the following two lines to hadoop-env.sh and restart the HDFS services:

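# NameNode: enable remote JMX (no authentication/SSL) on port 8004 and attach the Jolokia agent on port 8778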
export HDFS_NAMENODE_OPTS="-Dcom.sun.management.jmxremote=true \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false \
-Dcom.sun.management.jmxremote.port=8004 \
-javaagent:/usr/share/java/jolokia-jvm-1.6.2-agent.jar=port=8778,host=0.0.0.0,id=namenode"

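# DataNode: enable remote JMX (no authentication/SSL) on port 8008 and attach the Jolokia agent on port 8779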
export HDFS_DATANODE_OPTS="-Dcom.sun.management.jmxremote=true \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false \
-Dcom.sun.management.jmxremote.port=8008 \
-javaagent:/usr/share/java/jolokia-jvm-1.6.2-agent.jar=port=8779,host=0.0.0.0,id=datanode"
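
After the restart, you can confirm that each Jolokia agent is listening (placeholder hosts; ports as configured above):

curl -s http://namenode01.example.com:8778/jolokia/version
curl -s http://datanode01.example.com:8779/jolokia/version

Each call should return a small JSON document describing the agent and protocol version.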

Configuration Steps

  • Enable the O11ySource.
  • Select the Sources tab and press the + button to add a new Apache HDFS instance to be monitored.
  • Populate all the configurations.
  • Click Save to create the instance.

Metrics Collected

NameNode Metrics

Name | Description | Data Type
name | Name of the host | String
host | IP address of the server | LowCardinality(String)
@timestamp | Timestamp when the metric was collected | String
timestamp | Detailed timestamp with milliseconds | DateTime64
hdfs_namenode_blocks_total | Current number of allocated blocks in the system | UInt64
hdfs_namenode_capacity_remaining | Current remaining capacity in bytes | UInt64
hdfs_namenode_capacity_total | Current raw capacity of DataNodes in bytes | UInt64
hdfs_namenode_capacity_used | Current used capacity across all DataNodes in bytes | UInt64
hdfs_namenode_corrupt_blocks | Current number of blocks with corrupt replicas | UInt64
hdfs_namenode_estimated_capacity_lost_total | Estimate of the total capacity lost due to volume failures | UInt64
hdfs_namenode_files_total | Current number of files and directories | UInt64
hdfs_namenode_missing_blocks | Number of missing blocks | UInt64
hdfs_namenode_num_live_data_nodes | Number of DataNodes that are currently live | UInt64
hdfs_namenode_num_dead_data_nodes | Number of DataNodes that are currently dead | UInt64
hdfs_namenode_num_stale_data_nodes | Number of DataNodes marked stale due to delayed heartbeats | UInt64
hdfs_namenode_pending_deletion_blocks | Current number of blocks pending deletion | UInt64
hdfs_namenode_pending_replication_blocks | Current number of blocks pending replication | UInt64
hdfs_namenode_scheduled_replication_blocks | Current number of blocks scheduled for replication | UInt64
hdfs_namenode_under_replicated_blocks | Current number of under-replicated blocks | UInt64
hdfs_namenode_total_load | Current number of connections | UInt64
hdfs_namenode_volume_failures_total | Total number of volume failures across all DataNodes | UInt64
hdfs_namenode_num_decom_live_data_nodes | Number of DataNodes that have been decommissioned and are now live | UInt64
hdfs_namenode_num_decom_dead_data_nodes | Number of DataNodes that have been decommissioned and are now dead | UInt64
hdfs_namenode_num_decommissioning_data_nodes | Number of DataNodes in decommissioning state | UInt64
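
These NameNode metrics are read from the NameNode's JMX MBeans through the Jolokia agent configured earlier. As an illustration (placeholder host, default port 8778 from this guide), the FSNamesystem MBean backing several of the metrics above, such as BlocksTotal and CapacityRemaining, can be inspected directly:

curl -s http://namenode01.example.com:8778/jolokia/read/Hadoop:service=NameNode,name=FSNamesystem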
DataNode Metrics

Name | Description | Data Type
name | Name of the host | String
host | IP address of the server | LowCardinality(String)
@timestamp | Timestamp when the metric was collected | String
timestamp | Detailed timestamp with milliseconds | DateTime64
hdfs_datanode_cache_capacity | The cache capacity of the DataNode | UInt64
hdfs_datanode_cache_used | The cache used on the DataNode | UInt64
hdfs_datanode_dfs_capacity | Current raw capacity of the DataNode in bytes | UInt64
hdfs_datanode_dfs_used | The storage space that has been used up by HDFS | UInt64
hdfs_datanode_dfs_remaining | The remaining DataNode disk space, in percent | UInt64
hdfs_datanode_num_blocks_cached | The number of blocks cached on the DataNode | UInt64
hdfs_datanode_num_blocks_failed_to_cache | The number of blocks that failed to cache on the DataNode | UInt64
hdfs_datanode_num_blocks_failed_to_uncache | The number of blocks that failed to be removed from the cache | UInt64
hdfs_datanode_num_failed_volumes | Number of failed volumes | UInt64
hdfs_datanode_blocks_read | Total number of blocks read from the DataNode | UInt64
hdfs_datanode_blocks_removed | Total number of blocks removed from the DataNode | UInt64
hdfs_datanode_blocks_replicated | Total number of blocks replicated | UInt64
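
Similarly, the DataNode metrics map to DataNode MBeans (for example, FSDatasetState for capacity and cache figures and DataNodeActivity for block counters). Since these MBean names carry node-specific suffixes, a Jolokia pattern read returns them all (placeholder host, port 8779 from this guide):

curl -s 'http://datanode01.example.com:8779/jolokia/read/Hadoop:service=DataNode,name=*'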