If you already use AWS Glue and want to analyze your Spark applications, the five steps below will get you started with the Spark History Server MCP.
Follow the Quick Setup instructions to clone the Spark History Server MCP project to your laptop:
git clone https://github.com/kubeflow/mcp-apache-spark-history-server.git
cd mcp-apache-spark-history-server
# Install Task (if not already installed)
brew install go-task # macOS, see https://taskfile.dev/installation/ for others
# Setup and start testing
task install # Install dependencies
Follow the AWS Glue public documentation to set up a new self-managed Spark History Server for your AWS Glue jobs. If you already have a Spark History Server set up, use it and note its Spark UI URL and port for Step 3.
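Before editing the MCP config, it can help to confirm that the history server actually responds from your laptop. The sketch below is one way to do that with `curl` against the `/api/v1/applications` endpoint of Spark's monitoring REST API. The helper names `shs_apps_url` and `check_shs` are hypothetical (they are not part of this project), and the example URL assumes a local server on port 18080.

```shell
# Sketch only: helper names are illustrative, not part of the MCP project.

# Build the REST endpoint that lists applications on a Spark History Server.
# A single trailing slash on the base URL is stripped first.
shs_apps_url() {
  printf '%s/api/v1/applications\n' "${1%/}"
}

# Return 0 if the endpoint answers with a successful HTTP status within 10 seconds.
# Pass "insecure" as the second argument to skip TLS certificate verification,
# which is needed when the server presents a self-signed certificate.
check_shs() {
  url="$(shs_apps_url "$1")"
  if [ "${2:-}" = "insecure" ]; then
    curl --fail --silent --show-error --max-time 10 --insecure --output /dev/null "$url"
  else
    curl --fail --silent --show-error --max-time 10 --output /dev/null "$url"
  fi
}

# Example (assumed local server; uncomment to run):
#   check_shs "http://localhost:18080" && echo "history server reachable"
```

If the check fails for an EC2-hosted server, the usual suspects are security group rules or a private subnet that your laptop cannot reach.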
Edit the MCP server config to specify the Spark UI URL and port:
Option 1: Spark History Server on EC2
- Identify the SparkUiPrivateUrl or SparkUiPublicUrl from Step 2 (depending on whether your subnet is private or public) and make sure you can open it in a web browser
- Edit the SHS MCP config file, config.yaml, and add the Spark UI URL and port
glue_ec2:
  url: "<SparkUiUrl>:<port>"
  verify_ssl: false
Note: Because the Spark UI endpoint uses a self-signed certificate, the MCP server is configured not to verify the SSL connection.
Option 2: Spark History Server on Local Docker Container
- Identify and open the Spark UI in your web browser at: http://localhost:18080
- Edit SHS MCP Config: config.yaml to specify the local server information
local:
  default: true
  url: "http://localhost:18080"
Start the MCP server:
task start-mcp-bg
Use an AI agent to start interacting with the Spark History Server MCP, following the steps for Amazon Q CLI or Claude Desktop. For instructions on other agents, refer to the AI Agent Integration section in the main README.