Apache HDFS
Introduction
The Apache Hadoop Distributed File System (HDFS) is the primary storage system of the Hadoop ecosystem, designed to store large volumes of data across many machines. It provides high-throughput access to application data while remaining scalable and fault tolerant.
Getting Started
Compatibility
vuSmartMaps supports monitoring of Apache HDFS and is tested with version 3.x.
Data Collection Method
vuSmartMaps collects Apache HDFS data using an internal data collector. This agent collects data based on the source configuration.
Prerequisites
Inputs for Configuring Data Source
- Host: The IP address/FQDN of the Linux server. This field uniquely identifies each server you add here.
- Port: The JMX port to collect from.
- interval: The data collection interval.
- DataNode Details: Details of each DataNode to be monitored:
  - DataNode IP Address: A valid IP address of the DataNode.
  - DataNode Port: A valid port on the DataNode.
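For illustration, a filled-in source configuration might look like the sketch below. The field names and addresses are assumptions made for the example; the actual configuration is entered through the vuSmartMaps UI using the fields listed above.

```python
# Illustrative only: example values for the data-source fields described above.
# Field names and addresses are placeholders, not exact UI/schema names.
hdfs_source = {
    "host": "10.10.1.21",        # NameNode IP/FQDN; uniquely identifies this source
    "port": 8778,                # JMX/Jolokia port on the NameNode
    "interval": 60,              # data collection interval (e.g. in seconds)
    "datanode_details": [
        {"ip": "10.10.1.31", "port": 8779},
        {"ip": "10.10.1.32", "port": 8779},
    ],
}
```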
Firewall Requirement
To collect data from this O11ySource, ensure the following ports are open:
Source IP | Destination IP | Destination Port | Protocol | Direction |
---|---|---|---|---|
vuSmartMaps IP | Apache HDFS Endpoint | 8778, 8779 | TCP | Outbound |
*Before sharing these firewall requirements, update the ports to match the customer environment.
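Once the rules are in place, a quick reachability check from the vuSmartMaps host can confirm them before onboarding the source. This is a minimal sketch; the hostnames are placeholders and the ports should match the customer environment.

```python
# Minimal reachability check from the vuSmartMaps host to the HDFS JMX/Jolokia ports.
# Hostnames below are placeholders; adjust hosts and ports to the customer environment.
import socket

targets = [
    ("namenode.example.internal", 8778),
    ("datanode1.example.internal", 8779),
]

for host, port in targets:
    try:
        with socket.create_connection((host, port), timeout=5):
            print(f"{host}:{port} reachable")
    except OSError as err:
        print(f"{host}:{port} NOT reachable: {err}")
```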
Configuring the Target
Add the following two environment variables to hdfs-env.sh and restart the HDFS services:
export HDFS_NAMENODE_OPTS="-Dcom.sun.management.jmxremote=true \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false \
-Dcom.sun.management.jmxremote.port=8004 \
-javaagent:/usr/share/java/jolokia-jvm-1.6.2-agent.jar=port=8778,host=0.0.0.0,id=namenode"
export HDFS_DATANODE_OPTS="-Dcom.sun.management.jmxremote=true \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false \
-Dcom.sun.management.jmxremote.port=8008 \
-javaagent:/usr/share/java/jolokia-jvm-1.6.2-agent.jar=port=8779,host=0.0.0.0,id=datanode"
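After restarting HDFS, it is worth confirming that the Jolokia agent is answering before adding the source. The sketch below assumes the NameNode is reachable at the placeholder host namenode.example.internal on port 8778 (the agent port configured above); the MBean name follows the standard Hadoop NameNode naming, but verify it against your deployment.

```python
# Sanity check: confirm the Jolokia agent attached to the NameNode is responding.
# Host below is a placeholder; 8778 matches the agent port configured above.
import json
import urllib.request

base = "http://namenode.example.internal:8778/jolokia"

# Jolokia's version endpoint confirms the agent is up and reports its version.
with urllib.request.urlopen(f"{base}/version", timeout=5) as resp:
    print(json.load(resp)["value"]["agent"])

# Reading the FSNamesystem MBean (standard Hadoop NameNode MBean) should return
# attributes such as BlocksTotal and CapacityRemaining that the collector relies on.
mbean = "Hadoop:service=NameNode,name=FSNamesystem"
with urllib.request.urlopen(f"{base}/read/{mbean}", timeout=5) as resp:
    print(json.load(resp)["value"])
```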
Configuration Steps
- Enable the O11ySource.
- Select the Sources tab and press the + button to add a new Apache HDFS instance to be monitored.
- Fill in all the configuration fields.
- Click on Save to create the instance.
Metrics Collected
NameNode Metrics
Name | Description | Data Type |
---|---|---|
name | Name of the host | String |
host | IP address of the server | LowCardinality(String) |
@timestamp | Timestamp when the metric was collected | String |
timestamp | Detailed timestamp with milliseconds | DateTime64 |
hdfs_namenode_blocks_total | Current number of allocated blocks in the system | UInt64 |
hdfs_namenode_capacity_remaining | Current remaining capacity in bytes | UInt64 |
hdfs_namenode_capacity_total | Current raw capacity of DataNodes in bytes | UInt64 |
hdfs_namenode_capacity_used | Current used capacity across all DataNodes in bytes | UInt64 |
hdfs_namenode_corrupt_blocks | Current number of blocks with corrupt replicas. | UInt64 |
hdfs_namenode_estimated_capacity_lost_total | An estimate of the total capacity lost due to volume failures | UInt64 |
hdfs_namenode_files_total | Current number of files and directories | UInt64 |
hdfs_namenode_missing_blocks | Number of missing blocks | UInt64 |
hdfs_namenode_num_live_data_nodes | Number of datanodes which are currently live | UInt64 |
hdfs_namenode_num_dead_data_nodes | Number of datanodes which are currently dead | UInt64 |
hdfs_namenode_num_stale_data_nodes | Number of datanodes marked stale due to delayed heartbeat. | UInt64 |
hdfs_namenode_pending_deletion_blocks | Current number of blocks pending deletion | UInt64 |
hdfs_namenode_pending_replication_blocks | Current number of blocks pending to be replicated | UInt64 |
hdfs_namenode_scheduled_replication_blocks | Current number of blocks scheduled for replications | UInt64 |
hdfs_namenode_under_replicated_blocks | Current number of blocks under replicated | UInt64 |
hdfs_namenode_total_load | Current number of connections | UInt64 |
hdfs_namenode_volume_failures_total | Total number of volume failures across all Datanodes | UInt64 |
hdfs_namenode_num_decom_live_data_nodes | Number of datanodes which have been decommissioned and are currently live | UInt64 |
hdfs_namenode_num_decom_dead_data_nodes | Number of datanodes which have been decommissioned and are currently dead | UInt64 |
hdfs_namenode_num_decommissioning_data_nodes | Number of datanodes in decommissioning state | UInt64 |
DataNode Metrics
Name | Description | Data Type |
---|---|---|
name | Name of the host | String |
host | IP address of the server | LowCardinality(String) |
@timestamp | Timestamp when the metric was collected | String |
timestamp | Detailed timestamp with milliseconds | DateTime64 |
hdfs_datanode_cache_capacity | The cache capacity of the DataNode | UInt64 |
hdfs_datanode_cache_used | The amount of cache used on the DataNode | UInt64 |
hdfs_datanode_dfs_capacity | Current raw capacity of the DataNodes in bytes | UInt64 |
hdfs_datanode_dfs_used | The storage space that has been used up by HDFS. | UInt64 |
hdfs_datanode_dfs_remaining | The remaining DataNode disk space, in percent | UInt64 |
hdfs_datanode_num_blocks_cached | The number of blocks cached on the DataNode | UInt64 |
hdfs_datanode_num_blocks_failed_to_cache | The number of blocks that failed to cache on the DataNode | UInt64 |
hdfs_datanode_num_blocks_failed_to_uncache | The number of blocks that failed to be removed from the cache on the DataNode | UInt64 |
hdfs_datanode_num_failed_volumes | Number of failed volumes. | UInt64 |
hdfs_datanode_blocks_read | Total number of blocks read from DataNode | UInt64 |
hdfs_datanode_blocks_removed | Total number of blocks removed from DataNode | UInt64 |
hdfs_datanode_blocks_replicated | Total number of blocks replicated | UInt64 |