
Hyperscale Analytics Dashboard

Introduction

The Hyperscale Management Dashboard simplifies large-scale system management with four integrated dashboards. The Cluster Analysis Dashboard provides critical insights into performance and resource allocation, while the Data Explorer Dashboard allows for deep data exploration and visualization. The Health and Performance Dashboard keeps a pulse on system stability with real-time monitoring, and the Table Explorer Dashboard ensures database efficiency through detailed table analysis.

This guide describes each dashboard, outlining its features and how it enhances system operations.

Accessing the Hyperscale Management Dashboard

To access the Hyperscale Management Dashboard, go to the left navigation menu and select Dashboards.

On the Dashboards page, search for “Hyperscale”.

You will then see the four available dashboards listed:

  1. Hyperscale DB- Data Explorer
  2. Hyperscale DB- Table Explorer
  3. Hyperscale DB- Health & Performance
  4. Hyperscale DB- Cluster Analysis

Clicking on any of these dashboards will take you directly to the respective dashboard interface. Now, let’s explore each of these dashboards in detail.

Data Explorer Dashboard

The Data Explorer Dashboard offers a quick and easy way to view table data with just a few clicks. It’s particularly useful for verifying whether a specific table is receiving the most recent data feed.

The following filters are located at the top of each dashboard:

  1. Database: Allows you to select the database where the table to be verified exists. The default database is “vusmart.” In most cases, this filter won’t need to be changed, as all tables are created in the “vusmart” database by default.
  2. Engine Type: Enables selection of the underlying table engine type, such as Distributed, Materialized View, etc. The default engine type is “Distributed.” Typically, all tables will have a distributed table on top of the base table.
  3. Table Name: Allows you to select the table you want to view. The list of tables shown in this filter depends on the selected “Engine Type.” If the table you’re searching for isn’t listed, ensure that you’ve selected the correct table engine type.
  4. Date Column: Lists all the “Date” type columns available in the selected table. By default, the “timestamp” column will be shown if it is available.

Using a stepper tab, you can switch between different dashboards.

Data Explorer

The first panel in the dashboard, Data Trend Based on-timestamp, visualizes trends based on the selected ‘Date column’ from the dataset. It provides a quick overview of how data fluctuates over time. If no data is available in the selected object or the dataset lacks a valid date column, the trend cannot be displayed. This panel is key for identifying patterns or gaps in data over a chosen time period.
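To illustrate the kind of gap this panel makes visible, here is a minimal Python sketch that groups event timestamps into fixed intervals and reports the intervals that contain no data. It is not the dashboard's implementation. The function name `find_empty_buckets` and the one-minute interval are assumptions chosen for the example.

```python
from datetime import datetime, timedelta


def find_empty_buckets(timestamps, start, end, bucket=timedelta(minutes=1)):
    """Return the start time of every bucket in [start, end) that holds no events."""
    # Floor-dividing two timedeltas yields the integer index of the bucket.
    filled = {(ts - start) // bucket for ts in timestamps if start <= ts < end}
    total = (end - start) // bucket
    return [start + i * bucket for i in range(total) if i not in filled]


# Hypothetical sample: events at 5 s, 50 s, 70 s and 250 s after 10:00.
start = datetime(2024, 1, 1, 10, 0)
events = [start + timedelta(seconds=s) for s in (5, 50, 70, 250)]
gaps = find_empty_buckets(events, start, start + timedelta(minutes=5))
print([g.strftime("%H:%M") for g in gaps])  # ['10:02', '10:03']
```

A run of consecutive empty intervals in a table that normally receives continuous data is usually the first sign that a feed has stalled.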

The second panel displays data from all available columns in the table, with a column-level filter that allows users to quickly focus on specific data. The data shown is limited to the top 500 records, determined by the ‘ORDER BY’ clause set for the table.

Table Explorer Dashboard

The Table Explorer Dashboard provides a comprehensive overview of a selected table’s structure, including its columns and associated data types, as well as key sizing details. This dashboard is designed to give users a quick snapshot of table composition, helping to identify column properties, data distribution, and overall table size for effective database management.

💡 Note: This dashboard has filters similar to those available in the “Data Explorer” dashboard, with the exception of the Date Column filter.

Table & Column Structure

The first section of the dashboard is Table & Column Structure.

The Table Info panel within the Table and Column Structure provides essential metadata about the selected table. It includes the following columns:

  • Primary Key: Shows the primary key expression of the table. In ClickHouse, the primary key defines the index used to locate rows quickly; it does not enforce uniqueness.
  • Sorting Key: Specifies the column(s) used to sort the data within the table for optimized querying.
  • Partition Key: Displays the key used for partitioning the data, enhancing performance and scalability.
  • Last Modified Time: Indicates the most recent timestamp when the table data was updated.
  • Has Own Data: Shows whether the table contains its own data or if it’s derived from another table or view.
  • Create Script: Provides access to the script used to create the table, allowing for review or replication of the table structure. You can also inspect the value by clicking on the eye button next to it.

The Column Info panel in the Table and Column Structure provides a detailed breakdown of the columns within the selected table.

This section includes the following fields:

  • Col Position: Displays the position of the column in the table, indicating its order.
  • Column Name: Lists the name of each column in the table.
  • Data Type: Specifies the data type for each column, helping to understand how the data is stored and used.
  • Default Value: Shows any default value assigned to the column if a new record does not provide one.
  • Comment: Provides any additional context or annotations regarding the column’s purpose or usage.
  • Compression: Indicates whether compression is applied to the column, which helps optimize storage.
  • Is in Sorting Key: Identifies if the column is part of the sorting key used for efficient query operations.
  • Is in Primary Key: Indicates whether the column is part of the primary key.
  • Precision: For numeric data types, this shows the number of significant digits that the column can store.
  • Scale: For numeric columns, this indicates the number of decimal places the column can store.
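Precision and scale are easiest to see with a worked example. The sketch below uses Python's standard `decimal` module to check whether a value fits a column declared with a given precision (total digits) and scale (digits after the decimal point). The helper name `fits_decimal` is hypothetical and is not part of the dashboard or of ClickHouse.

```python
from decimal import Decimal


def fits_decimal(value, precision, scale):
    """Check whether `value` fits a Decimal(precision, scale) column without rounding."""
    _sign, digits, exponent = Decimal(value).as_tuple()
    fractional = max(-exponent, 0)             # digits after the decimal point
    integral = max(len(digits) + exponent, 0)  # digits before the decimal point
    return fractional <= scale and integral <= precision - scale


# With precision 10 and scale 2, at most 8 digits may precede the decimal point.
print(fits_decimal("12345678.90", 10, 2))  # True
print(fits_decimal("123456789.0", 10, 2))  # False: 9 integral digits
print(fits_decimal("1.234", 10, 2))        # False: 3 fractional digits
```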

Table & Column Sizing Info

The next section of the dashboard, Table & Column Sizing Info, provides critical information on table and column sizing, aiding in understanding storage usage and data distribution across partitions. It is divided into four panels:

The first panel, Table Size Growth by Partition, presents the current size of the selected table, along with a trend showing size growth by partition key. If the selected table doesn’t have any partitions, the growth trend cannot be displayed.
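A size-by-partition view can be thought of as summing the on-disk size of every data part that belongs to the same partition. The sketch below shows that aggregation in Python. The record keys `partition` and `bytes_on_disk` are assumptions modelled on the columns of ClickHouse's `system.parts` table, and the sample values are invented; the dashboard's actual query is not documented here.

```python
from collections import defaultdict


def size_by_partition(parts):
    """Sum on-disk bytes per partition and return (partition, bytes) pairs in partition order."""
    totals = defaultdict(int)
    for part in parts:
        totals[part["partition"]] += part["bytes_on_disk"]
    return sorted(totals.items())


# Hypothetical parts of a table partitioned by month.
sample_parts = [
    {"partition": "202401", "bytes_on_disk": 1_200},
    {"partition": "202401", "bytes_on_disk": 800},
    {"partition": "202402", "bytes_on_disk": 3_500},
]
print(size_by_partition(sample_parts))  # [('202401', 2000), ('202402', 3500)]
```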

The second panel, Table Sizing Info, presents an overview of the sizing details for each table.

It includes the following columns:

  • Table Name: Displays the name of the table.
  • Table Size: Shows the total size of the table on disk.
  • Compressed: Indicates the compressed size of the table.
  • Uncompressed: Shows the size of the table before compression.
  • Compression Ratio: Displays the ratio of compressed to uncompressed size.
  • Total Rows: Shows the total number of rows in the table.
  • No. of Parts: Displays the number of parts in the table.
  • No. of Partitions: Indicates the number of partitions within the table.
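The relationship between the Compressed, Uncompressed, and Compression Ratio columns can be sketched as follows. The example follows the definition given above (compressed size divided by uncompressed size). Some tools report the inverse (uncompressed divided by compressed), so treat the formula as an assumption and verify it against the values shown in your own panel. The function name `compression_summary` is hypothetical.

```python
def compression_summary(compressed_bytes, uncompressed_bytes):
    """Return (ratio, saved_percent), where ratio = compressed / uncompressed."""
    if uncompressed_bytes == 0:
        return None, None  # empty table: nothing to compare
    ratio = compressed_bytes / uncompressed_bytes
    saved_percent = (1 - ratio) * 100  # share of the raw size removed by compression
    return round(ratio, 3), round(saved_percent, 1)


# A table holding 1000 bytes of raw data that occupies 250 bytes on disk.
print(compression_summary(250, 1000))  # (0.25, 75.0)
```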

The third panel is Projection Details. This panel provides sizing information for projections, if any are defined for the base table. It helps to understand how projections impact overall storage usage.

The last panel is Column Sizing Info. This table offers a detailed breakdown of each column’s storage details.

It includes the following columns:

  • Col: Column position in the table.
  • Column Name: Name of the column.
  • Data Type: Data type of the column.
  • Col Size: Total size of the column’s data on disk.
  • Total Rows: Number of rows contained in the column.
  • Compressed: Compressed size of the column.
  • Uncompressed: Uncompressed size of the column.
  • Compression %: The percentage of compression achieved for the column.
  • Avg Row Size: Average size of each row in the column.
  • Last Modified Time: Timestamp of the last modification to the column’s data.

These panels provide comprehensive insights into table and column storage, allowing for effective monitoring and management of database resources.

Health & Performance

The Health and Performance Dashboard is designed to monitor the overall health and operational status of the database clusters. It provides insights into key metrics such as cluster performance, uptime, disk usage, and connection statistics. This information is critical for ensuring smooth operations and identifying potential issues before they affect the system.

 

Cluster Overview

The first section, Cluster Overview, provides a snapshot of the health and performance of each cluster. This section contains multiple panels, which present vital information at a glance.

The first panel, also named Cluster Overview, lists the clusters along with key details such as the host name, shard number, and replica number. In the example screenshot, two clusters are displayed, each with details about its hosts and replica configuration.

It includes the following columns:

  • Cluster: The name of the database cluster being monitored.
  • Host Name: The name of the host server that the cluster is running on.
  • Shard Num: The number of the shard that the host belongs to within the cluster; each shard holds a portion of the data in the distributed database.
  • Replica Num: The number of the host’s replica within its shard; replicas are copies of the same data kept for redundancy and failover purposes.

The Cluster Uptime panel shows how long each host within the cluster has been up and running, helping to track the stability and availability of the cluster over time.

The table includes the following columns:

  • Host Name: The name of the host server being monitored.
  • Up Since: The amount of time the host has been continuously running without interruption, providing insight into system uptime and stability.

The Disk Info panel provides details about the disk usage on each host. It displays the total space available and the free space remaining on each host, giving an overview of storage capacity and potential risks of running out of space.

The table contains the following information:

  • Host Name: The name of the host server being monitored for disk usage.
  • Total Space: The total disk space available on the host server.
  • Free Space: The remaining free disk space available for use, which is important for preventing storage issues.
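One simple way to act on the Disk Info values is to convert them into a free-space percentage and flag hosts that fall below a threshold. The sketch below does this in Python. The record keys, the host names, and the 20% threshold are assumptions chosen for illustration, not dashboard defaults.

```python
def low_disk_hosts(disks, min_free_percent=20.0):
    """Return (host, free_percent) for hosts whose free space is below the threshold."""
    flagged = []
    for disk in disks:
        free_percent = disk["free_space"] / disk["total_space"] * 100
        if free_percent < min_free_percent:
            flagged.append((disk["host_name"], round(free_percent, 1)))
    return flagged


# Hypothetical hosts; sizes are in bytes but any consistent unit works.
disks = [
    {"host_name": "host-a", "total_space": 1000, "free_space": 150},
    {"host_name": "host-b", "total_space": 1000, "free_space": 600},
]
print(low_disk_hosts(disks))  # [('host-a', 15.0)]
```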

The TCP Connections panel shows the trend of TCP connections over a selected time range. It provides insights into network connectivity and traffic, allowing users to monitor the number of active connections to the cluster.

Similar to the TCP Connections panel, the HTTP Connections panel monitors HTTP connections over time. It helps track the web traffic interacting with the cluster, which can indicate the level of user activity or external service requests.

The next panel, CPU Wait Time, displays the average CPU wait time across the selected hosts. High CPU wait times may indicate that the system is struggling to process tasks quickly enough, potentially affecting performance. The graph represents the average wait time for CPU resources on each host, giving insights into system bottlenecks.

The Memory Usage panel tracks memory usage across the hosts, highlighting how much memory each host is consuming at a given time. Spikes in memory usage can be an indicator of inefficient processes or potential performance issues. Line graphs display the average memory usage per host over the specified period, allowing users to detect any unusual patterns.

The IO (Input/Output) Wait Time panel monitors the average I/O wait time on the selected hosts. High I/O wait times indicate that the system is waiting for input/output operations to complete, which can slow down overall performance. Graphs track I/O wait times in microseconds, helping to identify any lags in data read/write operations.

The ZK (ZooKeeper) Wait Time panel tracks ZooKeeper wait times across the system. ZooKeeper is a critical service in distributed systems for coordination. Increased wait times can indicate a problem with system coordination or synchronization. Line graphs display the average wait time for ZooKeeper operations, providing insights into system synchronization efficiency.

Data Size Metrics

The Data Size Metrics section of the Health & Performance Dashboard comprises two panels: the Data Size panel and the Error in Data Parts panel. Together they report the size and characteristics of the data stored in each table, as well as any errors encountered in the data parts.

Data Size Panel: This panel provides detailed information about the data size metrics for various tables within the system.

The following columns are displayed:

  • Database: The name of the database where the table resides.
  • Table Name: The specific name of the table being monitored.
  • # of Rows: The total number of rows present in the table.
  • Compressed: The size of the table when data is compressed.
  • Uncompressed: The size of the table when data is uncompressed.
  • # of Parts: The number of data parts that the table is divided into.
  • Last Modified Time: The timestamp showing the last time the table was modified.
  • PK Size: The size of the primary key for the table.
  • Engine: The storage engine used for the table (e.g., ReplicatedMergeTree or MergeTree).

This table is essential for monitoring and understanding how data is stored, how efficient compression is, and the structure of the data within a cluster.

Error in Data Parts Panel: This panel reports any errors in the data parts of the tables being analyzed. When it shows No data, as in the example screenshot, no issues or errors are currently detected in the data parts.

This section provides administrators and users with a clear understanding of the size and structure of the tables, as well as any potential issues with data storage that might need attention.

Data Ingestion Metrics

The next section, Data Ingestion Metrics, contains eight panels: Insert Rate, Inserted Bytes Per Second, Merged Rows Per Second, Merged Uncompressed Bytes Per Second, New Part Creation Frequency, Average Time Taken to Create a New Part, Replication Status, and Incoming EPS (Events Per Second). These metrics provide insights into the efficiency, speed, and status of data ingestion operations.

Insert Rate: Displays the rate of rows being inserted per second into the ClickHouse database for the different ClickHouse instances (chi-clickhouse-vusmart-0-0-0 and chi-clickhouse-vusmart-0-1-0). It helps in monitoring the ingestion speed and identifying any significant spikes or drops.

Inserted Bytes Per Second: Shows the amount of data (in bytes) being inserted per second into the ClickHouse database for the respective instances. This metric provides insights into the volume of data being ingested over time.

Merged Rows Per Second: Tracks the number of rows being merged per second for each instance. Merges are a key part of ClickHouse’s data organization process, and this metric indicates the efficiency and frequency of these operations.

Merged Uncompressed Bytes Per Second: Indicates the volume of uncompressed data (in bytes) that is merged per second. Monitoring this helps to understand how much raw data is being processed during merge operations.

New Part Creation Frequency: Shows how often new data parts are being created in the database. A higher frequency of part creation can imply an active data ingestion process or fragmentation that might need optimization.

Average Time Taken to Create a New Part: Displays the average duration (in seconds) it takes to create a new data part. This metric is crucial for understanding the efficiency of data ingestion and part management processes.

Replication Status: Displays the status of replication for the ClickHouse instance chi-clickhouse-vusmart-0-0-0. It lists key metrics such as ReplicasMaxQueueSize, ReplicasMaxRelativeDelay, and ReplicasMaxAbsoluteDelay along with their current values. These metrics help monitor replication lag and queue sizes, ensuring that data is replicated consistently and promptly across nodes.

Incoming EPS: Shows the incoming events per second (EPS) for different Kafka-related tables (e.g., kafka_streams_TaskMetrics, kafka_connect_ConnectNodeMetrics_data, etc.). This panel is useful for monitoring the rate at which data is being ingested from Kafka streams into the database, highlighting any spikes or drops in data flow.
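The per-second panels in this section (Insert Rate, Inserted Bytes Per Second, Merged Rows Per Second, and so on) are typically derived from counters that only ever increase. The sketch below shows the usual way to turn such samples into rates. It assumes cumulative counters sampled at known times, which this guide does not confirm for vuSmartMaps, and the function name `per_second_rates` is hypothetical.

```python
def per_second_rates(samples):
    """Convert (unix_time, cumulative_count) samples into per-second rates between samples."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        if t1 <= t0 or c1 < c0:
            continue  # skip out-of-order samples and counter resets (e.g. after a restart)
        rates.append((t1, (c1 - c0) / (t1 - t0)))
    return rates


# Hypothetical samples taken every 10 seconds; the last one follows a counter reset.
samples = [(0, 0), (10, 5_000), (20, 12_000), (30, 1_000)]
print(per_second_rates(samples))  # [(10, 500.0), (20, 700.0)]
```

Skipping the interval that spans a reset avoids plotting a large negative rate, which would otherwise look like a sudden drop in ingestion.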

Read/Query Metrics

The next section, Read/Query Metrics, contains six panels. These metrics track the performance and usage of queries executed against the database, showing how efficiently it handles read operations such as retrieving data from tables or executing search queries.

Top 30 Slow Queries: This panel lists the top 30 slowest queries executed on the system, providing detailed insights into query performance. It helps in identifying queries that may require optimization.

The table contains the following columns:

  • Time: The timestamp when the query was executed.
  • Host Name: The name of the host where the query was run.
  • Query ID: A unique identifier for the query.
  • Query: The actual SQL query that was executed.
  • Exec Duration: The total time taken to execute the query, measured in seconds.
  • User: The username associated with the query execution.
  • Read Bytes: The amount of data read during the query execution.
  • Read Rows: The number of rows read during the query execution.
  • Result Bytes: The size of the query result.
  • Result Rows: The number of rows in the query result.
  • Type: The type of the query, indicating whether it finished successfully or encountered an issue.
  • Exception: Details of any exceptions or errors that occurred during the query execution.
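Selecting the slowest queries is a standard top-N operation. The sketch below shows it in Python over a list of query-log records. The keys `query_id` and `exec_duration` are assumed names that mirror the Query ID and Exec Duration columns above, and the sample records are invented.

```python
import heapq


def slowest_queries(entries, n=30):
    """Return the n entries with the longest execution time, slowest first."""
    return heapq.nlargest(n, entries, key=lambda e: e["exec_duration"])


# Hypothetical query-log records; durations are in seconds.
log = [
    {"query_id": "q1", "exec_duration": 0.4},
    {"query_id": "q2", "exec_duration": 12.5},
    {"query_id": "q3", "exec_duration": 3.1},
]
print([e["query_id"] for e in slowest_queries(log, n=2)])  # ['q2', 'q3']
```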

Top 30 Queries by Memory Utilization: This panel shows the top 30 queries based on their memory usage, helping to identify memory-intensive queries that may affect system performance.

The table contains the following columns:

  • Host Name: The name of the host where the query was run.
  • Query: The actual SQL query that was executed.
  • Avg Memory Usage: The average amount of memory utilized by the query during its execution.
  • Count: The number of times the query was executed.
  • Queries: A list of Query IDs associated with this specific query type.
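The columns above suggest that executions of the same query on the same host are grouped together and then ranked by average memory. A minimal Python sketch of that grouping follows. It assumes grouping on the exact query text, whereas the dashboard may normalize queries differently, and the record keys are hypothetical.

```python
from collections import defaultdict


def memory_by_query(entries, n=30):
    """Group executions by (host, query) and rank the groups by average memory usage."""
    groups = defaultdict(list)
    for e in entries:
        groups[(e["host_name"], e["query"])].append(e)
    rows = [
        {
            "host_name": host,
            "query": query,
            "avg_memory_usage": sum(i["memory_usage"] for i in items) / len(items),
            "count": len(items),
            "queries": [i["query_id"] for i in items],
        }
        for (host, query), items in groups.items()
    ]
    rows.sort(key=lambda r: r["avg_memory_usage"], reverse=True)
    return rows[:n]


# Hypothetical executions; memory usage is in bytes.
executions = [
    {"host_name": "h1", "query": "SELECT a", "query_id": "a1", "memory_usage": 100},
    {"host_name": "h1", "query": "SELECT a", "query_id": "a2", "memory_usage": 300},
    {"host_name": "h1", "query": "SELECT b", "query_id": "b1", "memory_usage": 500},
]
for row in memory_by_query(executions):
    print(row["query"], row["avg_memory_usage"], row["count"], row["queries"])
```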

Stuck Queries: This panel identifies queries that have been running for more than 10 seconds, which could indicate potential issues such as blocking, inefficiencies, or other delays in query processing. When the panel shows No data, as in the example screenshot, no queries have been running for more than 10 seconds during the observed period.
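The 10-second rule can be expressed as a simple filter over the currently running queries. In the sketch below, the `elapsed` key (seconds since the query started) and the sample records are assumptions; only the 10-second threshold comes from the panel description.

```python
def stuck_queries(running, threshold_seconds=10):
    """Return running queries whose elapsed time exceeds the threshold, longest first."""
    stuck = [q for q in running if q["elapsed"] > threshold_seconds]
    return sorted(stuck, key=lambda q: q["elapsed"], reverse=True)


# Hypothetical snapshot of running queries.
running = [
    {"query_id": "r1", "elapsed": 2.0},
    {"query_id": "r2", "elapsed": 45.3},
    {"query_id": "r3", "elapsed": 10.0},
]
print([q["query_id"] for q in stuck_queries(running)])  # ['r2']
```

An empty result corresponds to the No data state of the panel.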

Avg Query Duration & No. of Requests: This panel tracks the average duration of queries alongside the number of requests being made to the database. It helps to correlate the query performance (in terms of time) with the load (in terms of the number of requests).

QPS (Queries Per Second): This panel monitors the number of queries executed per second. A consistent trend indicates stable query throughput, while fluctuations might require further investigation to ensure the system can handle the load.

Failed QPS: This panel tracks the number of queries that failed per second. Monitoring failed queries is essential for identifying and troubleshooting issues that could impact application performance.
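QPS and failed QPS are both counts of query completions grouped by the second in which they occurred. The Python sketch below computes both from a list of (second, succeeded) pairs. The input shape and the function name are assumptions made for the example and do not reflect how the dashboard collects its data.

```python
from collections import Counter


def qps_per_second(events):
    """Map each second to (total_queries, failed_queries) from (unix_second, succeeded) pairs."""
    total = Counter()
    failed = Counter()
    for second, succeeded in events:
        total[second] += 1
        if not succeeded:
            failed[second] += 1
    return {s: (total[s], failed[s]) for s in sorted(total)}


# Hypothetical completions: three queries in second 100 (one failed), one in second 101.
events = [(100, True), (100, True), (100, False), (101, True)]
print(qps_per_second(events))  # {100: (3, 1), 101: (1, 0)}
```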

Cluster Analysis

The Cluster Analysis Dashboard provides a comprehensive overview of the database cluster, helping to monitor its overall status and performance. It includes various panels and metrics that give insights into the health and configuration of the cluster.

The first set of panels shows the following information:

  • Version: Displays the current version of the database, helping to keep track of updates or the need for upgrades.
  • Server Uptime: Shows how long the server has been running, which can be an indicator of stability or the need for maintenance.
  • Number of Databases: Indicates the total number of databases hosted on the cluster.
  • Number of Tables: Displays the total number of tables across the databases.
  • Number of Rows: Provides a count of the total rows across all tables, which can give a sense of the data volume being handled.
  • Number of Columns: Displays the total number of columns in all tables, reflecting the complexity and breadth of the database schema.

The next panel is Cluster Overview.

This panel provides a detailed view of the cluster’s configuration, including:

  • Cluster: The name of the cluster configuration.
  • Shard Number: Indicates the number of the shard that the node belongs to within the cluster.
  • Replicated Number: Shows the number of the node’s replica within its shard; replicas provide high availability and fault tolerance.
  • Host Name & Address: Identifies the server name and IP address for each part of the cluster.
  • Port: The port number used for communication with the server.
  • Is Local: Specifies whether the node is local or remote.
  • Errors Count: Tracks the number of errors encountered, helping to identify problematic nodes.
  • Slowdowns Count: Monitors slowdowns in processing, which can indicate performance issues.
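When the panel lists one row per node, the overall topology (how many shards a cluster has and how many replicas each shard has) can be derived by grouping the rows. The sketch below assumes that Shard Number and Replicated Number identify the node's shard and replica, as the `shard_num` and `replica_num` columns do in ClickHouse's `system.clusters` table. The record keys and the sample cluster are hypothetical.

```python
def summarize_topology(rows):
    """Count distinct shards, and the replicas in each shard, per cluster."""
    shards = {}
    for r in rows:
        per_cluster = shards.setdefault(r["cluster"], {})
        per_cluster.setdefault(r["shard_num"], set()).add(r["replica_num"])
    return {
        cluster: {
            "shards": len(by_shard),
            "replicas_per_shard": {s: len(reps) for s, reps in sorted(by_shard.items())},
        }
        for cluster, by_shard in shards.items()
    }


# Hypothetical cluster with two shards; shard 1 has two replicas.
rows = [
    {"cluster": "demo", "shard_num": 1, "replica_num": 1},
    {"cluster": "demo", "shard_num": 1, "replica_num": 2},
    {"cluster": "demo", "shard_num": 2, "replica_num": 1},
]
print(summarize_topology(rows))
```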

The next set of panels is:

  • Merge Progress Per Table
  • Current Merges
  • Mutations Parts Remaining
  • Current Mutations

Merge Progress Per Table: This panel displays the progress of data merges for each table in the cluster. Merges are essential for optimizing storage and performance in the database by consolidating smaller parts into larger ones. The absence of data may indicate no current merging activities.

Current Merges: Shows the active merge operations happening in the cluster. If no data is present, it means there are no ongoing merges at the moment. This panel helps in monitoring and understanding the merge workload and its impact on the system.

Mutations Parts Remaining: Indicates the number of parts in the database that are pending mutation operations. Mutations are operations like updates or deletes that need to be applied to the data. A higher number of remaining parts could suggest a backlog that may impact performance.

Current Mutations: Lists the active mutation operations, including details like the table name, mutation ID, creation time, completion status, failure reason, and fail time. This panel is vital for tracking the progress and success of mutations, especially in ensuring data consistency and integrity.

The next set of panels is Replicated Tables by Delay.

The panel on the left visualizes the delay in replication for different tables. Replication delay occurs when there is a lag in syncing data between the shard (master) and replica nodes. The visual representation (bars) and the numeric value indicate how much delay is present, which is crucial for maintaining data consistency and availability across the cluster.

The panel next to this provides a detailed, table-by-table breakdown of replication metrics, allowing for precise monitoring of how each table is handling replication.

The following fields are displayed:

  • Table: The name of the table being replicated.
  • Leader: Indicates the number of leaders managing the replication.
  • Readonly: Shows whether the table is in a readonly state (0 means readable and writable, 1 means readonly).
  • Delay: The actual replication delay for the table.
  • Queue Size: Number of items waiting in the replication queue, which can indicate the load or backlog in replication.
  • Inserts in Queue: The count of insert operations waiting to be replicated.
  • Merges in Queue: The count of merge operations waiting to be applied to the replicas.
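The fields above can be combined into a simple health check that flags tables needing attention. The Python sketch below marks a table when it is readonly, when its delay exceeds a limit, or when its replication queue is too long. The thresholds (60 seconds, 100 queue entries), the assumption that Delay is expressed in seconds, and the record keys are all illustrative choices, not dashboard defaults.

```python
def replication_issues(tables, max_delay=60, max_queue=100):
    """Return (table, reasons) for replicated tables that breach any threshold."""
    issues = []
    for t in tables:
        reasons = []
        if t["readonly"] == 1:
            reasons.append("readonly")
        if t["delay"] > max_delay:
            reasons.append("delay")
        if t["queue_size"] > max_queue:
            reasons.append("queue")
        if reasons:
            issues.append((t["table"], reasons))
    return issues


# Hypothetical replication status rows.
tables = [
    {"table": "events", "readonly": 0, "delay": 5, "queue_size": 3},
    {"table": "metrics", "readonly": 0, "delay": 300, "queue_size": 450},
    {"table": "logs", "readonly": 1, "delay": 0, "queue_size": 0},
]
print(replication_issues(tables))
# [('metrics', ['delay', 'queue']), ('logs', ['readonly'])]
```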

Conclusion

The Hyperscale Management Dashboard offers a comprehensive solution for managing large-scale systems through its four integrated dashboards: Cluster Analysis, Data Explorer, Health and Performance, and Table Explorer. Each dashboard provides specialized insights and tools, helping to optimize performance, maintain system stability, and enhance database management. By leveraging these dashboards, users can efficiently monitor, analyze, and manage their systems, ensuring seamless operations and effective resource utilization.
