This guide walks you through setting up WebArena environments for Playwright MCP automated testing, including Shopping, Shopping Admin, and Reddit instances.
Section 1 is designed mainly for completing the Playwright-WebArena tasks.
WebArena provides each Docker image from multiple sources. Choose the fastest one for your network.

Shopping image:
# Option 1: Google Drive (Recommended)
pip install gdown
gdown 1gxXalk9O0p9eu1YkIJcmZta1nvvyAJpA
# Option 2: Archive.org
wget https://archive.org/download/webarena-env-shopping-image/shopping_final_0712.tar
# Option 3: CMU Server
wget http://metis.lti.cs.cmu.edu/webarena-images/shopping_final_0712.tar

Shopping Admin image:
# Option 1: Google Drive (Recommended)
gdown 1See0ZhJRw0WTTL9y8hFlgaduwPZ_nGfd
# Option 2: Archive.org
wget https://archive.org/download/webarena-env-shopping-admin-image/shopping_admin_final_0719.tar
# Option 3: CMU Server
wget http://metis.lti.cs.cmu.edu/webarena-images/shopping_admin_final_0719.tar

Reddit image:
# Option 1: Google Drive (Recommended)
gdown 17Qpp1iu_mPqzgO_73Z9BnFjHrzmX9DGf
# Option 2: Archive.org
wget https://archive.org/download/webarena-env-forum-image/postmill-populated-exposed-withimg.tar
# Option 3: CMU Server
wget http://metis.lti.cs.cmu.edu/webarena-images/postmill-populated-exposed-withimg.tar

Deploy Shopping (port 7770):
docker load --input shopping_final_0712.tar
# Start container
docker run --name shopping -p 7770:80 -d shopping_final_0712
# Wait for service initialization (2-3 minutes)
sleep 180
# Configure for local access
docker exec shopping /var/www/magento2/bin/magento setup:store-config:set --base-url="http://localhost:7770"
docker exec shopping mysql -u magentouser -pMyPassword magentodb -e "UPDATE core_config_data SET value='http://localhost:7770/' WHERE path IN ('web/secure/base_url', 'web/unsecure/base_url');"
docker exec shopping /var/www/magento2/bin/magento cache:flush
Access: http://localhost:7770
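The fixed sleep above is only an estimate of the startup time. As an alternative, a small polling loop can return as soon as the server answers. This is a minimal sketch that assumes curl is installed on the host. The function name wait_for_http and its default timeout and interval are illustrative choices and are not part of WebArena. The loop only confirms that the web server accepts requests, so the configuration commands may still need a short extra wait.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of WebArena): poll a URL until the server
# returns any HTTP response, or give up after a timeout in seconds.
# Usage: wait_for_http URL [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
wait_for_http() {
  local url="$1" timeout="${2:-300}" interval="${3:-5}"
  local elapsed=0
  while (( elapsed < timeout )); do
    # Without -f, curl exits 0 for any HTTP response (even 5xx) and non-zero
    # when the connection fails, so this only checks that the server is up.
    if curl -s -o /dev/null --max-time 10 "$url"; then
      return 0
    fi
    sleep "$interval"
    elapsed=$(( elapsed + interval ))
  done
  echo "timed out after ${timeout}s waiting for ${url}" >&2
  return 1
}
```

For example, wait_for_http http://localhost:7770 300 could stand in for sleep 180. The same call also works for ports 7780 and 9999.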
Deploy Shopping Admin (port 7780):
docker load --input shopping_admin_final_0719.tar
# Start container
docker run --name shopping_admin -p 7780:80 -d shopping_admin_final_0719
# Wait for service initialization
sleep 120
# Configure for local access
docker exec shopping_admin /var/www/magento2/bin/magento setup:store-config:set --base-url="http://localhost:7780"
docker exec shopping_admin mysql -u magentouser -pMyPassword magentodb -e "UPDATE core_config_data SET value='http://localhost:7780/' WHERE path IN ('web/secure/base_url', 'web/unsecure/base_url');"
docker exec shopping_admin php /var/www/magento2/bin/magento config:set admin/security/password_is_forced 0
docker exec shopping_admin php /var/www/magento2/bin/magento config:set admin/security/password_lifetime 0
docker exec shopping_admin /var/www/magento2/bin/magento cache:flush
Access: http://localhost:7780/admin
Admin Credentials: admin / admin1234
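The same three base-URL commands (store config, the core_config_data update, and a cache flush) appear for both Magento containers and again in the cloud-access step. Below is a minimal sketch that wraps them in one shell function. It assumes the container paths and MySQL credentials shown above. The name set_magento_base_url is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: point a WebArena Magento container (shopping or
# shopping_admin) at a new base URL, reusing the commands shown above.
# Usage: set_magento_base_url CONTAINER BASE_URL
set_magento_base_url() {
  local container="$1"
  local base_url="${2%/}"   # store-config:set is given the URL without a trailing slash
  local magento=/var/www/magento2/bin/magento

  docker exec "$container" "$magento" setup:store-config:set --base-url="$base_url" || return 1
  # core_config_data stores the URL with a trailing slash
  docker exec "$container" mysql -u magentouser -pMyPassword magentodb -e \
    "UPDATE core_config_data SET value='${base_url}/' WHERE path IN ('web/secure/base_url', 'web/unsecure/base_url');" || return 1
  docker exec "$container" "$magento" cache:flush
}
```

For example, set_magento_base_url shopping_admin http://localhost:7780 repeats the Shopping Admin base-URL steps. Once EXTERNAL_IP is set, set_magento_base_url shopping "http://${EXTERNAL_IP}:7770" matches the cloud reconfiguration. The admin password settings are separate and are not covered by this function.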
Deploy Reddit (port 9999):
docker load --input postmill-populated-exposed-withimg.tar
# Start container
docker run --name forum -p 9999:80 -d postmill-populated-exposed-withimg
# Wait for PostgreSQL initialization
sleep 120
# Verify service status
docker logs forum 2>&1 | grep "database system is ready"
curl -I http://localhost:9999
Access: http://localhost:9999
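Instead of sleeping a fixed 120 seconds, you can poll for the readiness message. This is a minimal sketch that assumes the forum container logs PostgreSQL's standard "database system is ready" line. The name wait_for_log is a hypothetical helper. It merges stderr into the pipe because docker logs replays a container's stderr on its own stderr, and PostgreSQL writes its log messages there by default.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until a pattern appears in a container's logs.
# Usage: wait_for_log CONTAINER PATTERN [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
wait_for_log() {
  local container="$1" pattern="$2" timeout="${3:-180}" interval="${4:-5}"
  local elapsed=0
  while (( elapsed < timeout )); do
    # Merge stderr into the pipe: docker logs replays the container's stderr there.
    if docker logs "$container" 2>&1 | grep -q -- "$pattern"; then
      return 0
    fi
    sleep "$interval"
    elapsed=$(( elapsed + interval ))
  done
  echo "pattern '${pattern}' not found in ${container} logs after ${timeout}s" >&2
  return 1
}
```

For example, wait_for_log forum "database system is ready" 180 && curl -I http://localhost:9999 checks the page only after the database reports ready.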
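For cloud deployments (GCP, AWS, etc.), configure external access. First, open the service ports. The commands below use GCP's gcloud; on other providers, open the same ports in the equivalent firewall or security-group settings: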
# Shopping environment
gcloud compute firewall-rules create allow-shopping-7770 \
--allow tcp:7770 --source-ranges 0.0.0.0/0
# Shopping Admin
gcloud compute firewall-rules create allow-shopping-admin-7780 \
--allow tcp:7780 --source-ranges 0.0.0.0/0
# Reddit
gcloud compute firewall-rules create allow-reddit-9999 \
--allow tcp:9999 --source-ranges 0.0.0.0/0

Then point the Magento base URLs at the external IP:
# Get external IP
EXTERNAL_IP=$(curl -s ifconfig.me)
# Shopping
docker exec shopping /var/www/magento2/bin/magento setup:store-config:set --base-url="http://${EXTERNAL_IP}:7770"
docker exec shopping mysql -u magentouser -pMyPassword magentodb -e "UPDATE core_config_data SET value='http://${EXTERNAL_IP}:7770/' WHERE path IN ('web/secure/base_url', 'web/unsecure/base_url');"
docker exec shopping /var/www/magento2/bin/magento cache:flush
# Shopping Admin
docker exec shopping_admin /var/www/magento2/bin/magento setup:store-config:set --base-url="http://${EXTERNAL_IP}:7780"
docker exec shopping_admin mysql -u magentouser -pMyPassword magentodb -e "UPDATE core_config_data SET value='http://${EXTERNAL_IP}:7780/' WHERE path IN ('web/secure/base_url', 'web/unsecure/base_url');"
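docker exec shopping_admin /var/www/magento2/bin/magento cache:flush

Alternatively, expose the services through Cloudflare quick tunnels instead of opening firewall ports. Magento builds links from, and redirects to, its configured base URL. Set the Shopping and Shopping Admin base URLs to the generated tunnel URLs with the same commands as above.
# Install cloudflared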
wget https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64
sudo mv cloudflared-linux-amd64 /usr/local/bin/cloudflared
sudo chmod +x /usr/local/bin/cloudflared
# Create tunnels (each command stays in the foreground; run each in its own terminal)
cloudflared tunnel --url http://localhost:7770 # Shopping
cloudflared tunnel --url http://localhost:7780 # Admin
cloudflared tunnel --url http://localhost:9999 # Reddit

Or use ngrok:
# Install ngrok
wget https://bin.equinox.io/c/bNyj1mQVY4c/ngrok-v3-stable-linux-amd64.tgz
tar xvzf ngrok-v3-stable-linux-amd64.tgz
sudo mv ngrok /usr/local/bin
# ngrok v3 requires an account authtoken: ngrok config add-authtoken <YOUR_AUTHTOKEN>
# Create tunnel (choose port)
ngrok http 7770 # For Shopping

- Configure environment variables: make sure the following settings are added to .mcp_env:
PLAYWRIGHT_BROWSER="chromium" # defaults to chromium; firefox is also supported
PLAYWRIGHT_HEADLESS="True"
- For a single task or a task group, run:
python -m pipeline --exp-name EXPNAME --mcp MCP --tasks PLAYWRIGHTTASK --models MODEL
The placeholders in this command mean the following:
- EXPNAME is a custom experiment name.
- MCP is playwright or playwright_webarena, depending on the task.
- PLAYWRIGHTTASK is the selected task or task group. See the Task Page for details on each task.
- MODEL is the selected model. See the Introduction Page for supported models.
- K is the number of independent runs.
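To compare several models on the same task or task group, you can run the command above once per model. Below is a minimal sketch. The wrapper run_for_models is hypothetical and uses only the flags shown above. EXPNAME, PLAYWRIGHTTASK, and the model names are still placeholders that you need to fill in.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run one task or task group against several models,
# one pipeline invocation per model, using only the flags documented above.
# Usage: run_for_models EXPNAME MCP TASKS MODEL [MODEL...]
run_for_models() {
  if (( $# < 4 )); then
    echo "usage: run_for_models EXPNAME MCP TASKS MODEL [MODEL...]" >&2
    return 2
  fi
  local exp_name="$1" mcp="$2" tasks="$3"
  shift 3
  local model status=0
  for model in "$@"; do
    echo "==> ${model}"
    # Keep going if one model fails, but report the failure at the end.
    python -m pipeline --exp-name "$exp_name" --mcp "$mcp" --tasks "$tasks" --models "$model" || status=1
  done
  return "$status"
}
```

For example: run_for_models EXPNAME playwright_webarena PLAYWRIGHTTASK MODEL_A MODEL_B. Running one model per invocation keeps each model's run separate, whatever --models accepts.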
Troubleshooting:
# Check status
docker ps -a | grep -E "shopping|forum"
# View logs
docker logs [container_name] --tail 50
# Restart container
docker restart [container_name]

Common issues:
- First load is slow (1-2 minutes for Magento); this is normal
- Ensure ports are available:
netstat -tlnp | grep -E "7770|7780|9999"
- Clear the cache after URL changes: required for Magento environments (bin/magento cache:flush)

Reset an environment:
# Stop and remove container
docker stop [container_name]
docker rm [container_name]
# Re-deploy (follow the deployment steps in Section 3)

Notes:
- Service startup time: allow 2-3 minutes for Magento and 1-2 minutes for Reddit
- Memory requirements: Ensure Docker has at least 4GB RAM allocated per container
- URL configuration: for external access, reconfigure the base URLs after re-creating a container or whenever the external IP changes
- Port assignments:
  - 7770: Shopping
  - 7780: Shopping Admin
  - 9999: Reddit
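You can also use the port assignments above as a quick health check. This is a minimal sketch that assumes curl is available and the services are reachable on localhost; pass another host, such as the external IP, as the first argument. The name check_envs is a hypothetical helper. It counts any HTTP status as "responding" because Magento pages may answer with redirects.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether each WebArena service answers on its port.
# Usage: check_envs [HOST]   (defaults to localhost)
check_envs() {
  local host="${1:-localhost}" failed=0 entry port name code
  for entry in "7770:Shopping" "7780:Shopping Admin" "9999:Reddit"; do
    port="${entry%%:*}"
    name="${entry#*:}"
    # curl prints 000 as the status code when no HTTP response was received;
    # "|| true" keeps a failed connection from aborting scripts run with set -e.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "http://${host}:${port}/") || true
    if [[ -z "$code" || "$code" == "000" ]]; then
      echo "${name} (${port}): no response"
      failed=1
    else
      echo "${name} (${port}): HTTP ${code}"
    fi
  done
  return "$failed"
}
```

For example, check_envs prints one line per service and returns non-zero if any service does not respond. On a cloud VM, check_envs "$EXTERNAL_IP" also exercises the firewall rules.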