Apache HDFS
Introduction
The Apache Hadoop Distributed File System (HDFS) is the primary storage system of the Hadoop ecosystem, designed to store large volumes of data across many machines. It provides high-throughput access to application data while remaining scalable and fault tolerant.
Getting Started
Compatibility
vuSmartMaps supports monitoring of Apache HDFS and is tested with version 3.x.
Data Collection Method
vuSmartMaps collects Apache HDFS data using an internal data collector. This agent collects data based on the source configuration.
Prerequisites
Inputs for Configuring Data Source
- Host: The IP address/FQDN of the Linux server. This field uniquely identifies each server you add here.
- Port: The JMX port to collect from.
- interval: The data collection interval.
- DataNode Details: Details of each DataNode to be monitored:
  - DataNode IP Address: A valid IP address of the DataNode.
  - DataNode Port: A valid port on the DataNode.
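For illustration, a filled-in source configuration might look like the sketch below. The field names and addresses are assumptions made for the example; the actual configuration is entered through the vuSmartMaps UI using the fields listed above.

```python
# Illustrative only: example values for the data-source fields described above.
# Field names and addresses are placeholders, not exact UI/schema names.
hdfs_source = {
    "host": "10.10.1.21",        # NameNode IP/FQDN; uniquely identifies this source
    "port": 8778,                # JMX/Jolokia port on the NameNode
    "interval": 60,              # data collection interval (e.g. in seconds)
    "datanode_details": [
        {"ip": "10.10.1.31", "port": 8779},
        {"ip": "10.10.1.32", "port": 8779},
    ],
}
```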
Firewall Requirement
To collect data from this O11ySource, ensure the following ports are open:
Source IP | Destination IP | Destination Port | Protocol | Direction |
---|---|---|---|---|
vuSmartMaps IP | Apache HDFS Endpoint | 8778, 8779 | TCP | Outbound |
*Before sharing these firewall requirements, update the ports to match the customer environment.
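Once the rules are in place, a quick reachability check from the vuSmartMaps host can confirm them before onboarding the source. This is a minimal sketch; the hostnames are placeholders and the ports should match the customer environment.

```python
# Minimal reachability check from the vuSmartMaps host to the HDFS JMX/Jolokia ports.
# Hostnames below are placeholders; adjust hosts and ports to the customer environment.
import socket

targets = [
    ("namenode.example.internal", 8778),
    ("datanode1.example.internal", 8779),
]

for host, port in targets:
    try:
        with socket.create_connection((host, port), timeout=5):
            print(f"{host}:{port} reachable")
    except OSError as err:
        print(f"{host}:{port} NOT reachable: {err}")
```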
Configuring the Target
Add the following two environment variables to hdfs-env.sh and restart the HDFS services:
export HDFS_NAMENODE_OPTS="-Dcom.sun.management.jmxremote=true \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false \
-Dcom.sun.management.jmxremote.port=8004 \
-javaagent:/usr/share/java/jolokia-jvm-1.6.2-agent.jar=port=8778,host=0.0.0.0,id=namenode"
export HDFS_DATANODE_OPTS="-Dcom.sun.management.jmxremote=true \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false \
-Dcom.sun.management.jmxremote.port=8008 \
-javaagent:/usr/share/java/jolokia-jvm-1.6.2-agent.jar=port=8779,host=0.0.0.0,id=datanode"
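After restarting HDFS, it is worth confirming that the Jolokia agent is answering before adding the source. The sketch below assumes the NameNode is reachable at the placeholder host namenode.example.internal on port 8778 (the agent port configured above); the MBean name follows the standard Hadoop NameNode naming, but verify it against your deployment.

```python
# Sanity check: confirm the Jolokia agent attached to the NameNode is responding.
# Host below is a placeholder; 8778 matches the agent port configured above.
import json
import urllib.request

base = "http://namenode.example.internal:8778/jolokia"

# Jolokia's version endpoint confirms the agent is up and reports its version.
with urllib.request.urlopen(f"{base}/version", timeout=5) as resp:
    print(json.load(resp)["value"]["agent"])

# Reading the FSNamesystem MBean (standard Hadoop NameNode MBean) should return
# attributes such as BlocksTotal and CapacityRemaining that the collector relies on.
mbean = "Hadoop:service=NameNode,name=FSNamesystem"
with urllib.request.urlopen(f"{base}/read/{mbean}", timeout=5) as resp:
    print(json.load(resp)["value"])
```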
Configuration Steps
- Enable the O11ySource.
- Select the Sources tab and press the + button to add a new Apache HDFS instance to be monitored.
- Fill in all the configuration fields.
- Click on Save to create the instance.
Metrics Collected
NameNode Metrics
Name | Description | Data Type |
---|---|---|
name | Name of the host | String |
host | IP address of the server | LowCardinality(String) |
@timestamp | Timestamp when the metric was collected | String |
timestamp | Detailed timestamp with milliseconds | DateTime64 |
hdfs_namenode_blocks_total | Current number of allocated blocks in the system | UInt64 |
hdfs_namenode_capacity_remaining | Current remaining capacity in bytes | UInt64 |
hdfs_namenode_capacity_total | Current raw capacity of DataNodes in bytes | UInt64 |
hdfs_namenode_capacity_used | Current used capacity across all DataNodes in bytes | UInt64 |
hdfs_namenode_corrupt_blocks | Current number of blocks with corrupt replicas. | UInt64 |
hdfs_namenode_estimated_capacity_lost_total | An estimate of the total capacity lost due to volume failures | UInt64 |
hdfs_namenode_files_total | Current number of files and directories | UInt64 |
hdfs_namenode_missing_blocks | Number of missing blocks | UInt64 |
hdfs_namenode_num_live_data_nodes | Number of datanodes which are currently live | UInt64 |
hdfs_namenode_num_dead_data_nodes | Number of datanodes which are currently dead | UInt64 |
hdfs_namenode_num_stale_data_nodes | Number of datanodes marked stale due to delayed heartbeat. | UInt64 |
hdfs_namenode_pending_deletion_blocks | Current number of blocks pending deletion | UInt64 |
hdfs_namenode_pending_replication_blocks | Current number of blocks pending to be replicated | UInt64 |
hdfs_namenode_scheduled_replication_blocks | Current number of blocks scheduled for replications | UInt64 |
hdfs_namenode_under_replicated_blocks | Current number of blocks under replicated | UInt64 |
hdfs_namenode_total_load | Current number of connections | UInt64 |
hdfs_namenode_volume_failures_total | Total number of volume failures across all Datanodes | UInt64 |
hdfs_namenode_num_decom_live_data_nodes | Number of datanodes which have been decommissioned and are currently live | UInt64 |
hdfs_namenode_num_decom_dead_data_nodes | Number of datanodes which have been decommissioned and are currently dead | UInt64 |
hdfs_namenode_num_decommissioning_data_nodes | Number of datanodes in decommissioning state | UInt64 |
DataNode Metrics
Name | Description | Data Type |
---|---|---|
name | Name of the host | String |
host | IP address of the server | LowCardinality(String) |
@timestamp | Timestamp when the metric was collected | String |
timestamp | Detailed timestamp with milliseconds | DateTime64 |
hdfs_datanode_cache_capacity | The cache capacity of the DataNode | UInt64 |
hdfs_datanode_cache_used | The amount of cache used on the DataNode | UInt64 |
hdfs_datanode_dfs_capacity | Current raw capacity of the DataNodes in bytes | UInt64 |
hdfs_datanode_dfs_used | The storage space that has been used up by HDFS. | UInt64 |
hdfs_datanode_dfs_remaining | The remaining DataNode disk space, in percent | UInt64 |
hdfs_datanode_num_blocks_cached | The number of blocks cached on the DataNode | UInt64 |
hdfs_datanode_num_blocks_failed_to_cache | The number of blocks that failed to cache on the DataNode | UInt64 |
hdfs_datanode_num_blocks_failed_to_uncache | The number of blocks that failed to be removed from the cache on the DataNode | UInt64 |
hdfs_datanode_num_failed_volumes | Number of failed volumes. | UInt64 |
hdfs_datanode_blocks_read | Total number of blocks read from DataNode | UInt64 |
hdfs_datanode_blocks_removed | Total number of blocks removed from DataNode | UInt64 |
hdfs_datanode_blocks_replicated | Total number of blocks replicated | UInt64 |