Palace Card Game with AI

Training and Test Run of the Palace Card Game

Source code GitHub repository.

1. State Representation and Encoding

The game's state is encapsulated in a numerical vector of 92 values (six 15-slot card groups plus two scalar fields). This vector represents the current configuration of the game and captures the elements the AI agent needs for decision-making.

The state encoding follows this structure:

State vector = [
    P1_hand_encoding[15],     // Player 1 hand cards
    P1_faceup_encoding[15],   // Player 1 face-up cards
    P1_facedown_encoding[15], // Player 1 face-down cards
    P2_hand_encoding[15],     // Player 2 hand cards
    P2_faceup_encoding[15],   // Player 2 face-up cards
    P2_facedown_encoding[15], // Player 2 face-down cards
    pile_top[1],              // Top card rank
    seven_rule_active[1]      // Active status of Seven Rule
]
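The layout above can be produced by a small encoder. This is a minimal sketch, not the code from `palace_dqn.py`: it assumes each card has already been mapped to a slot index in `range(15)` (the script's actual card-to-slot mapping is not described here), that each 15-slot segment stores per-slot card counts, and that `pile_top` is stored as a raw rank. `encode_group` and `encode_state` are hypothetical names.

```python
from typing import List, Sequence

NUM_SLOTS = 15  # slots per card group, matching the [15] segments above


def encode_group(slots: Sequence[int]) -> List[float]:
    """Count how many cards fall into each of the 15 slots (assumed count encoding)."""
    vec = [0.0] * NUM_SLOTS
    for s in slots:
        if not 0 <= s < NUM_SLOTS:
            raise ValueError(f"slot index out of range: {s}")
        vec[s] += 1.0
    return vec


def encode_state(p1_hand: Sequence[int], p1_faceup: Sequence[int],
                 p1_facedown: Sequence[int], p2_hand: Sequence[int],
                 p2_faceup: Sequence[int], p2_facedown: Sequence[int],
                 pile_top: int, seven_rule_active: bool) -> List[float]:
    """Six 15-slot groups followed by two scalars: 6 * 15 + 2 = 92 values."""
    state: List[float] = []
    for group in (p1_hand, p1_faceup, p1_facedown, p2_hand, p2_faceup, p2_facedown):
        state.extend(encode_group(group))
    state.append(float(pile_top))                    # top card rank, stored raw here
    state.append(1.0 if seven_rule_active else 0.0)  # Seven Rule flag
    return state


state = encode_state([3, 3, 12], [], [], [5], [], [], pile_top=6, seven_rule_active=False)
print(len(state))  # 92
```

A count encoding keeps duplicate ranks visible (two cards in slot 3 give the value 2.0); a one-hot or normalized variant would work the same way structurally.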

2. Neural Network Architecture

The Deep Q-Network (DQN) employed in this implementation consists of a three-layer neural network designed to approximate the Q-value function, which estimates the expected rewards of actions taken in particular states. The architecture is as follows:
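As a rough illustration of what a three-layer Q-network computes, the following pure-Python sketch runs a forward pass from a 92-value state to one Q-value per action. The hidden width of 64, the ReLU activations, the weight initialization and `action_size=3` are placeholders rather than details taken from `palace_dqn.py`, and training (backpropagation) is omitted; in practice a deep learning framework would handle both. `QNetwork`, `init_layer` and `dense` are hypothetical names.

```python
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights[out][in], biases[out])


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Small uniform random weights and zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x: List[float], layer: Layer, relu: bool) -> List[float]:
    """Fully connected layer: out[j] = sum_i w[j][i] * x[i] + b[j], optionally ReLU."""
    weights, biases = layer
    out = []
    for row, b in zip(weights, biases):
        z = sum(w * xi for w, xi in zip(row, x)) + b
        out.append(max(0.0, z) if relu else z)
    return out


class QNetwork:
    """Three dense layers: state -> hidden -> hidden -> one Q-value per action."""

    def __init__(self, state_size: int = 92, hidden_size: int = 64,
                 action_size: int = 3, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.layers = [
            init_layer(state_size, hidden_size, rng),
            init_layer(hidden_size, hidden_size, rng),
            init_layer(hidden_size, action_size, rng),
        ]

    def predict(self, state: List[float]) -> List[float]:
        x = state
        last = len(self.layers) - 1
        for i, layer in enumerate(self.layers):
            x = dense(x, layer, relu=(i < last))  # output layer stays linear for Q-values
        return x


net = QNetwork()
q_values = net.predict([0.0] * 92)
```

The output layer is linear because Q-values are unbounded estimates of cumulative reward, not probabilities.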

3. Reinforcement Learning Parameters

The training process is governed by several hyperparameters critical to the efficacy and stability of the learning algorithm:
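The hyperparameters a DQN of this kind typically needs can be grouped in one configuration object. The field names and default values below are common illustrative choices, not the values used in `palace_dqn.py`; `DQNConfig` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DQNConfig:
    gamma: float = 0.95           # discount factor for future rewards
    learning_rate: float = 1e-3   # optimizer step size
    epsilon: float = 1.0          # initial exploration rate
    epsilon_min: float = 0.01     # exploration floor
    epsilon_decay: float = 0.995  # multiplicative decay applied after each game
    batch_size: int = 32          # transitions per replay update
    memory_size: int = 2000       # replay buffer capacity
    episodes: int = 1000          # number of training games

    def validate(self) -> None:
        """Reject combinations that would make training ill-defined."""
        if not 0.0 <= self.gamma <= 1.0:
            raise ValueError("gamma must lie in [0, 1]")
        if not 0.0 <= self.epsilon_min <= self.epsilon <= 1.0:
            raise ValueError("need 0 <= epsilon_min <= epsilon <= 1")
        if not 0.0 < self.epsilon_decay <= 1.0:
            raise ValueError("epsilon_decay must lie in (0, 1]")
        if self.batch_size > self.memory_size:
            raise ValueError("batch_size cannot exceed memory_size")


config = DQNConfig(episodes=500)
config.validate()
```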

4. Action Space and Decision Making

In each turn, the AI agent selects an action from the available action space, which consists of playing a card from one of the three categories: In Hand, Face Up, or Face Down. The validity of an action is determined by the game's rules: a card must match or exceed the rank of the top pile card, unless it is a special card such as '2', '7', or 'Joker'. The DQN predicts a Q-value for each possible action, and the agent selects the action with the highest predicted Q-value (expected cumulative reward).
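The legality check and Q-value-based choice described above can be sketched as follows. The integer rank encoding (with a hypothetical `JOKER = 15`), the use of 0 for an empty pile, and ignoring the Seven Rule are assumptions; `is_valid_play` and `select_action` are illustrative names, and the epsilon-greedy exploration step is the standard DQN technique rather than something the text specifies.

```python
import random
from typing import Optional, Sequence

JOKER = 15                     # hypothetical rank value for the Joker
SPECIAL_RANKS = {2, 7, JOKER}  # special cards named in the text


def is_valid_play(rank: int, pile_top: int) -> bool:
    """Playable if special or matching/exceeding the top rank (0 = empty pile; Seven Rule ignored)."""
    return rank in SPECIAL_RANKS or rank >= pile_top


def select_action(q_values: Sequence[float], card_ranks: Sequence[int],
                  pile_top: int, epsilon: float, rng: random.Random) -> Optional[int]:
    """Epsilon-greedy choice restricted to legal plays; action i plays card_ranks[i]."""
    legal = [i for i, r in enumerate(card_ranks) if is_valid_play(r, pile_top)]
    if not legal:
        return None  # no legal card: the caller applies the game's penalty rule
    if rng.random() < epsilon:
        return rng.choice(legal)  # explore, but only among legal moves
    return max(legal, key=lambda i: q_values[i])  # exploit the highest Q-value


# Turn 1 of the game-play example: top card 6, hand [King, 7, 2] with King = 13.
choice = select_action([0.4, 0.3, 0.9], [13, 7, 2], pile_top=6, epsilon=0.0,
                       rng=random.Random(0))
```

Masking both the random and the greedy branch keeps the agent from spending turns on moves the rules would reject.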

5. Reward Structure and Learning Objectives

The reinforcement learning framework is augmented with a reward system designed to incentivize desirable behaviors and discourage suboptimal actions:
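Only the +1 reward for playing a card appears in this write-up (in the game-play example); the pile-pickup penalty and the win/loss bonuses below are hypothetical placeholders that show how such a reward function might be organized. `RewardConfig` and `compute_reward` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardConfig:
    card_played: float = 1.0      # "+1 point" per card, as in the game-play example
    picked_up_pile: float = -1.0  # hypothetical penalty
    win: float = 10.0             # hypothetical terminal bonus
    loss: float = -10.0           # hypothetical terminal penalty


def compute_reward(cards_played: int, picked_up_pile: bool, won: bool, lost: bool,
                   cfg: RewardConfig = RewardConfig()) -> float:
    """Sum the shaped per-move reward and any terminal bonus or penalty."""
    reward = cfg.card_played * cards_played
    if picked_up_pile:
        reward += cfg.picked_up_pile
    if won:
        reward += cfg.win
    elif lost:
        reward += cfg.loss
    return reward
```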

6. The Bellman Equation and Q-Learning

The Bellman Equation serves as the foundation for updating the Q-values within the DQN. It formalizes the relationship between the current Q-value and the expected future rewards:

Q(state, action) = reward + γ * max(Q(next_state, all actions))

Here γ is the discount factor (between 0 and 1), which weights future rewards relative to immediate ones. This recursive formula allows the agent to iteratively update its value estimates, balancing immediate rewards against long-term gains. For instance, playing a '2' might yield an immediate reward of +1 and facilitate future rewards by clearing the pile, resulting in a higher cumulative Q-value.
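The update target in the equation can be computed directly. A minimal sketch, assuming the usual convention that a terminal transition (the game is over) contributes only its immediate reward; the Q-values in the example are made up for illustration, and `q_target` is a hypothetical name.

```python
from typing import Sequence


def q_target(reward: float, next_q_values: Sequence[float], gamma: float, done: bool) -> float:
    """Q-learning target: reward + gamma * max over next actions, or just reward at game end."""
    if done or not next_q_values:
        return reward
    return reward + gamma * max(next_q_values)


# Illustrative numbers only: playing a '2' earns +1 now, and the cleared pile
# leads to a next state whose best action is (hypothetically) worth 2.0.
target = q_target(reward=1.0, next_q_values=[0.5, 2.0, 1.0], gamma=0.9, done=False)
print(round(target, 6))  # 2.8
```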

7. Deep Q-Network (DQN) Implementation

The `palace_dqn.py` script encapsulates the entire AI implementation using reinforcement learning principles. Below is a detailed breakdown of its components:

7.1. Environment Setup

The `CardGameEnv` class models the game environment, managing the state transitions, rule enforcement, and reward assignments. Key functionalities include:
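To show how state transitions, rule enforcement and reward assignment can fit together, here is a heavily simplified stand-in for `CardGameEnv`. It is not the script's class: it tracks only hands and the pile, ignores face-up/face-down cards, drawing and the Seven Rule, assumes a player with no legal play picks up the pile, and always passes the turn after a move. `MiniPalaceEnv` and its methods are hypothetical.

```python
from typing import List, Optional, Tuple

JOKER = 15                     # hypothetical rank value for the Joker
SPECIAL_RANKS = {2, 7, JOKER}


class MiniPalaceEnv:
    """Simplified two-player pile game in the spirit of Palace (hands and pile only)."""

    def __init__(self, hands: List[List[int]]) -> None:
        self.initial = [list(h) for h in hands]
        self.reset()

    def reset(self) -> None:
        self.hands = [list(h) for h in self.initial]
        self.pile: List[int] = []
        self.current = 0

    def pile_top(self) -> int:
        return self.pile[-1] if self.pile else 0  # 0 = empty pile, anything is playable

    def legal_actions(self) -> List[int]:
        top = self.pile_top()
        return [i for i, r in enumerate(self.hands[self.current])
                if r in SPECIAL_RANKS or r >= top]

    def step(self, action: Optional[int]) -> Tuple[float, bool]:
        """Apply one move for the current player and return (reward, done)."""
        hand = self.hands[self.current]
        if action is None or action not in self.legal_actions():
            hand.extend(self.pile)  # assumption: no legal play means picking up the pile
            self.pile = []
            reward = -1.0
        else:
            card = hand.pop(action)
            if card == 2:
                self.pile = []      # a '2' clears the pile, as described in the text
            else:
                self.pile.append(card)
            reward = 1.0
        done = not hand             # emptying the hand ends this simplified game
        if not done:
            self.current = 1 - self.current
        return reward, done
```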

7.2. Agent Design

The `DQNAgent` class represents the AI agent, responsible for selecting actions, learning from experiences, and optimizing its policy:
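DQN agents usually store past transitions in an experience-replay buffer and learn from random minibatches of them. The sketch below assumes that design for `DQNAgent`; `Transition` and `ReplayMemory` are hypothetical names built on `collections.deque` and `random.Random.sample`.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple, Sequence


class Transition(NamedTuple):
    state: Sequence[float]
    action: int
    reward: float
    next_state: Sequence[float]
    done: bool


class ReplayMemory:
    """Fixed-capacity experience buffer; the oldest transitions are dropped first."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.buffer: Deque[Transition] = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def remember(self, t: Transition) -> None:
        self.buffer.append(t)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement breaks up correlations between consecutive moves.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self) -> int:
        return len(self.buffer)
```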

7.3. Training Process

The main execution block orchestrates the training of two agents (Player 1 and Player 2) over a specified number of episodes. During each episode:
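A minimal sketch of such an episode loop, assuming hypothetical interfaces: the environment exposes `reset()`, `current`, `legal_actions()` and `step(action)` returning `(next_state, reward, done)`, and each agent exposes `epsilon`, `act()`, `remember()` and `replay()`. Each agent stores only the transitions for its own moves, and the game is assumed to end on the winner's move. `CountdownEnv` is a toy game included only so the loop runs on its own; it is not Palace.

```python
import random
from typing import List, Protocol, Sequence, Tuple


class Agent(Protocol):
    epsilon: float
    def act(self, state: Sequence[float], legal: List[int]) -> int: ...
    def remember(self, s, a, r, s2, done) -> None: ...
    def replay(self, batch_size: int) -> None: ...


def train(env, agents: List[Agent], episodes: int, batch_size: int = 32,
          epsilon_min: float = 0.01, epsilon_decay: float = 0.995) -> List[int]:
    """Alternate two agents through full games; return the winner index of each episode."""
    winners = []
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            player = env.current
            agent = agents[player]
            action = agent.act(state, env.legal_actions())
            next_state, reward, done = env.step(action)
            agent.remember(state, action, reward, next_state, done)
            state = next_state
        winners.append(player)  # the player who made the final move
        for agent in agents:
            agent.replay(batch_size)  # learn from stored experience
            agent.epsilon = max(epsilon_min, agent.epsilon * epsilon_decay)
    return winners


class CountdownEnv:
    """Toy stand-in (not Palace): players remove 1 or 2 tokens; taking the last one wins."""

    def __init__(self, tokens: int = 7) -> None:
        self.start = tokens

    def reset(self) -> List[float]:
        self.tokens, self.current = self.start, 0
        return [float(self.tokens)]

    def legal_actions(self) -> List[int]:
        return [a for a in (1, 2) if a <= self.tokens]

    def step(self, action: int) -> Tuple[List[float], float, bool]:
        self.tokens -= action
        done = self.tokens == 0
        if not done:
            self.current = 1 - self.current
        return [float(self.tokens)], (1.0 if done else 0.0), done


class RandomAgent:
    def __init__(self, seed: int) -> None:
        self.epsilon, self.rng, self.memory = 1.0, random.Random(seed), []

    def act(self, state, legal):
        return self.rng.choice(legal)

    def remember(self, s, a, r, s2, done):
        self.memory.append((s, a, r, s2, done))

    def replay(self, batch_size):
        pass  # a DQN agent would fit its network on a sampled minibatch here


winners = train(CountdownEnv(7), [RandomAgent(0), RandomAgent(1)], episodes=10)
```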

8. Game Play Example

An illustrative example demonstrates the AI's decision-making process during gameplay:

Turn 1:
- Top card: 6
- Computer's hand: [King, 7, 2]
- Computer evaluates:
  * King → +1 point now, but uses up a high-value card that could be more useful later
  * 7 → +1 point now, imposes constraints on the next player's move
  * 2 → +1 point now, clears the pile, potentially ending the turn
- Computer selects: 2 (strategic choice to clear the pile and potentially gain more future rewards)

Turn 2:
- Fresh pile, computer can play any card
- Computer plays: King (sheds a high card while the pile imposes no constraint; the opponent must now match or beat a King, or play a special card)

9. Training Process

The AI undergoes an extensive training regimen to hone its strategic capabilities:

  1. Initial Phase (First 100 Games): The agents engage in predominantly random actions, allowing them to explore the state and action spaces without bias.
  2. Intermediate Phase (500 Games): Agents begin to recognize and adopt fundamental strategies based on accumulated experiences and learned rewards.
  3. Advanced Phase (1000 Games): Agents exhibit proficient gameplay, leveraging sophisticated strategies and optimized decision-making processes developed through extensive training.
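The shift from mostly random to mostly learned play is typically driven by a decaying exploration rate ε. The schedule below assumes multiplicative decay per game with placeholder values (start 1.0, decay 0.995, floor 0.01); the actual schedule in `palace_dqn.py` is not given here, and `epsilon_at` is a hypothetical helper.

```python
def epsilon_at(episode: int, start: float = 1.0, minimum: float = 0.01,
               decay: float = 0.995) -> float:
    """Exploration rate after `episode` multiplicative decays, clipped at `minimum`."""
    return max(minimum, start * decay ** episode)


for ep in (0, 100, 500, 1000):
    print(ep, round(epsilon_at(ep), 3))
```

With these placeholder values ε is about 0.61 after 100 games, about 0.08 after 500, and at its 0.01 floor by 1000 games.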

Technical Overview of `palace_dqn.py`

The `palace_dqn.py` script is the core component that enables the AI agents to learn and play the Palace Card Game effectively. Below is a comprehensive examination of its structure and functionalities:

1. Libraries and Dependencies

Standard libraries and frameworks integral to the implementation include:

2. Constants and Utility Functions

The script defines several constants and helper functions to manage game logic:
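Typical helpers for a card game of this kind build and deal the deck. This sketch assumes a 52-card deck plus a configurable number of Jokers, integer ranks 2 to 14 with a hypothetical `JOKER = 15`, and a standard Palace deal of three face-down, three face-up and three hand cards per player; none of these details are confirmed by the text, and `make_deck` and `deal` are illustrative names.

```python
import random
from typing import Dict, List, Tuple

RANKS = list(range(2, 15))              # 2..10, J=11, Q=12, K=13, A=14 (hypothetical encoding)
JOKER = 15                              # hypothetical rank value for the Joker
ZONES = ("facedown", "faceup", "hand")  # dealing order assumed here
CARDS_PER_ZONE = 3                      # assumption: standard Palace deal


def make_deck(num_jokers: int = 2) -> List[int]:
    """Four copies of each rank (suits are irrelevant here) plus the requested Jokers."""
    return [r for r in RANKS for _ in range(4)] + [JOKER] * num_jokers


def deal(deck: List[int], num_players: int,
         rng: random.Random) -> Tuple[List[Dict[str, List[int]]], List[int]]:
    """Shuffle a copy of the deck, deal every zone, and return (players, remaining stock)."""
    cards = deck[:]
    if num_players * len(ZONES) * CARDS_PER_ZONE > len(cards):
        raise ValueError("not enough cards to deal")
    rng.shuffle(cards)
    players = []
    for _ in range(num_players):
        players.append({zone: [cards.pop() for _ in range(CARDS_PER_ZONE)] for zone in ZONES})
    return players, cards
```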

3. Environment Class: `CardGameEnv`

This class encapsulates the game environment, handling state management, action execution, and game progression:
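Game progression in Palace depends on which set of cards a player is currently playing from and on detecting when someone has run out of cards. A sketch, assuming the usual order (hand, then face-up, then face-down) and that the first player with no cards left wins; `active_zone` and `find_winner` are hypothetical helpers.

```python
from typing import Dict, List, Optional

ZONE_ORDER = ("hand", "faceup", "facedown")  # assumed standard Palace play order


def active_zone(player: Dict[str, List[int]]) -> Optional[str]:
    """First non-empty zone the player must play from, or None if no cards remain."""
    for zone in ZONE_ORDER:
        if player[zone]:
            return zone
    return None


def find_winner(players: List[Dict[str, List[int]]]) -> Optional[int]:
    """Index of the first player with no cards in any zone, else None."""
    for i, p in enumerate(players):
        if active_zone(p) is None:
            return i
    return None
```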

4. Agent Class: `DQNAgent`

The `DQNAgent` class defines the AI agent's behavior, encompassing action selection, memory management, and learning:
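One common way to implement the learning step of such an agent is to turn a replayed minibatch into supervised targets: keep the network's current predictions and overwrite only the entry for the action actually taken with its Bellman target. This is a sketch under that assumption, with `q_fn` standing in for the network's prediction function; it is not necessarily how `DQNAgent` does it, and `build_training_batch` is a hypothetical name.

```python
from typing import Callable, List, Sequence, Tuple

QFunction = Callable[[Sequence[float]], List[float]]
Transition = Tuple[Sequence[float], int, float, Sequence[float], bool]


def build_training_batch(batch: List[Transition], q_fn: QFunction,
                         gamma: float) -> List[Tuple[Sequence[float], List[float]]]:
    """Pair each state with its current Q-values, corrected only for the action taken."""
    examples = []
    for state, action, reward, next_state, done in batch:
        target = reward if done else reward + gamma * max(q_fn(next_state))
        q_values = list(q_fn(state))  # copy, so other actions keep their predictions
        q_values[action] = target
        examples.append((state, q_values))
    return examples
```

Because untouched entries equal the network's own predictions, their error is zero, so a mean-squared-error fit on these pairs only moves the Q-value of the action that was actually played.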

5. Main Execution Flow

The script's main section orchestrates the interaction between the environment and the agents:

  1. Environment and Agent Initialization: Sets up the game environment and instantiates two DQN agents representing the players.
  2. Training Loop: Iterates over a defined number of episodes, during which agents play games, collect experiences, and update their neural networks through replay.
  3. Testing Phase: Post-training, the agents engage in a deterministic game (with exploration minimized) to evaluate their learned strategies against each other.
  4. Outcome Reporting: Displays the results of the test game, including the winner, reasons for victory, and the final state of each player's cards.
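The testing and reporting steps could look like this sketch: a context manager that temporarily sets every agent's ε to 0 so play is greedy, and a formatter for the final result. `greedy`, `format_result` and the zone names are hypothetical.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List


@contextmanager
def greedy(agents: List[Any]) -> Iterator[None]:
    """Temporarily set every agent's epsilon to 0 (pure exploitation), then restore it."""
    saved = [a.epsilon for a in agents]
    try:
        for a in agents:
            a.epsilon = 0.0
        yield
    finally:
        for a, eps in zip(agents, saved):
            a.epsilon = eps


def format_result(winner: int, reason: str, zones: List[Dict[str, List[int]]]) -> str:
    """One line for the winner, then each player's remaining cards by zone."""
    lines = [f"Winner: Player {winner + 1} ({reason})"]
    for i, z in enumerate(zones):
        lines.append(f"Player {i + 1}: hand={z['hand']} "
                     f"faceup={z['faceup']} facedown={z['facedown']}")
    return "\n".join(lines)
```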

The design of `palace_dqn.py` is intended to let the AI agents progressively improve their gameplay through iterative learning, using neural networks to approximate optimal strategies within the rules of the Palace Card Game.