
SMO Optimization Capabilities

This note analyzes the Synergetic Meta-Orchestrator’s (SMO) current optimization capabilities, as implemented in its codebase (notably within utils/placement.py and utils/scaling.py). We describe how these optimization functions are invoked, the specific data they consume, and how their outputs drive SMO’s orchestration actions.

The SMO employs mathematical optimization, primarily using the CVXPY library, to make intelligent decisions regarding the deployment and ongoing management of Hyper Distributed Application Graphs (HDAGs). Its current optimization logic is focused on two key areas: service placement and replica scaling.

1. Service Placement Optimization

SMO uses two distinct placement algorithms: a simpler “naive” algorithm for initial deployment and a more complex CVXPY-based optimization for re-evaluating placement.

A. Initial Placement (utils.placement.calculate_naive_placement)

This algorithm determines the first placement of services when an HDAG is initially deployed.

  • Invocation: Called only once per HDAG, within the services.hdag.graph_service.deploy_graph function immediately after the Graph object is created in the database.
  • Data Flow (Inputs):
    • cluster_capacities: List of available CPU cores per cluster, sourced from SMO’s database (Cluster.available_cpu).
    • cluster_accelerations: List of GPU capabilities per cluster (boolean/int), sourced from SMO’s database (Cluster.acceleration).
    • cpu_limits: List of CPU core requirements for each service, translated from the HDAG descriptor’s intent.compute.cpu field using utils.intent_translation.tranlsate_cpu.
    • accelerations: List of GPU requirements for each service (boolean/int), derived from the HDAG descriptor’s intent.compute.gpu.enabled.
    • replicas: Hardcoded to [1, 1, ..., 1] for all services during initial deployment.
  • Logic: Implements a first-fit-like heuristic, attempting to place each service onto the first cluster that meets its resource (CPU, GPU) requirements without exceeding remaining capacity (see the sketch after this list).
  • Data Flow (Outputs):
    • Returns a placement matrix (2D list: [[0, 1, 0], [1, 0, 0], ...]).
    • This matrix is immediately stored in the graph.placement field (JSONB) in the database.
    • It’s converted (via utils.placement.convert_placement) into a service_placement dictionary (e.g., {'serviceA': 'cluster2', 'serviceB': 'cluster1'}).
    • This dictionary is used to populate the clustersAffinity and serviceImportClusters fields within the Helm values_overwrite data for each service before the initial helm install command is executed via services.hdag.graph_service.helm_install_artifact.
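
To make the heuristic concrete, here is a minimal, self-contained sketch of first-fit placement and the matrix-to-dictionary conversion. It assumes the input shapes described above; the real calculate_naive_placement and convert_placement may differ in details such as replica handling and tie-breaking.

```python
def naive_placement(cluster_capacities, cluster_accelerations,
                    cpu_limits, accelerations, replicas):
    """First-fit sketch: place each service on the first cluster that
    still has enough CPU and satisfies its GPU requirement."""
    num_clusters = len(cluster_capacities)
    remaining = list(cluster_capacities)            # CPU cores left per cluster
    placement = [[0] * num_clusters for _ in cpu_limits]

    for s, (cpu, gpu, reps) in enumerate(zip(cpu_limits, accelerations, replicas)):
        demand = cpu * reps
        for e in range(num_clusters):
            gpu_ok = cluster_accelerations[e] or not gpu
            if gpu_ok and remaining[e] >= demand:
                placement[s][e] = 1
                remaining[e] -= demand
                break
    return placement


def convert_placement(placement, services, clusters):
    """Turn the 0/1 placement matrix into {service_name: cluster_name}."""
    return {
        services[s]: clusters[row.index(1)]
        for s, row in enumerate(placement) if 1 in row
    }


# Example: two clusters (only the second has a GPU), three services.
matrix = naive_placement([8, 8], [False, True],
                         [2, 1, 4], [True, False, False], [1, 1, 1])
# -> [[0, 1], [1, 0], [1, 0]]
print(convert_placement(matrix, ['svcA', 'svcB', 'svcC'],
                        ['cluster1', 'cluster2']))
# -> {'svcA': 'cluster2', 'svcB': 'cluster1', 'svcC': 'cluster1'}
```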

B. Re-Placement Optimization (utils.placement.decide_placement)

This CVXPY-based optimization is used when explicitly triggered to re-evaluate the placement of an existing, deployed HDAG.

  • Invocation: Called within the services.hdag.graph_service.trigger_placement function, which is executed when a request hits the GET /graphs/<name>/placement API endpoint.
  • Data Flow (Inputs):
    • cluster_capacities, cluster_acceleration: Sourced from SMO’s database (Cluster model), reflecting the current known state.
    • cpu_limits, acceleration: Sourced from the persisted SMO database records for each service within the HDAG (Service.cpu, Service.gpu).
    • replicas: Fetched live from the Karmada control plane using utils.karmada_helper.KarmadaHelper.get_replicas to reflect the actual current replica count of each deployed service.
    • current_placement: The existing placement matrix, retrieved from the SMO database (graph.placement). This is used as the y variable in the objective function to penalize changes.
  • Logic (CVXPY Model):
    • Decision Variables: Boolean matrix x[s, e] (1 if service s is placed on cluster e).
    • Objective Function: Minimizes w_dep * cp.sum(x) (deployment cost/spread) + w_re * cp.sum(cp.multiply(y, (y - x))) (re-optimization cost/stability penalty). Since y is the fixed current placement, the second term counts exactly the services moved off their current cluster. Weights w_dep and w_re are both currently hardcoded to 1 (see the sketch after this list).
    • Constraints: Ensures each service is placed on exactly one cluster, respects cluster CPU capacity based on current live replica counts, respects GPU requirements, and potentially handles service dependencies (the d[i-1] constraint; its exact semantics depend on how d is constructed).
  • Data Flow (Outputs):
    • Returns a new optimized placement matrix.
    • This new matrix updates the graph.placement field in the database.
    • The code then compares the new placement to the old one for each service. If a service’s placement changes:
      • Its corresponding Helm values_overwrite data (stored in Service.values_overwrite) is updated with the new clustersAffinity and recalculated serviceImportClusters.
      • The helm upgrade command is executed for that specific service via services.hdag.graph_service.helm_install_artifact to apply the change.
    • The updated cluster assignments (cluster_placement) are used to restart the background scaling threads with the correct context.
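
The following is a condensed sketch of the CVXPY model described above, not the exact decide_placement implementation. It assumes the hardcoded weights w_dep = w_re = 1 and omits the dependency constraint, whose exact semantics are unclear from the code.

```python
import cvxpy as cp
import numpy as np

def decide_placement_sketch(cluster_capacities, cluster_acceleration,
                            cpu_limits, acceleration, replicas,
                            current_placement):
    num_services, num_clusters = len(cpu_limits), len(cluster_capacities)

    x = cp.Variable((num_services, num_clusters), boolean=True)
    y = np.array(current_placement)        # existing placement (constant)
    w_dep, w_re = 1, 1                     # currently hardcoded weights

    # y * (y - x) is 1 exactly where a service would leave the cluster
    # it currently occupies, so the second term penalizes moves.
    objective = cp.Minimize(w_dep * cp.sum(x)
                            + w_re * cp.sum(cp.multiply(y, y - x)))

    demand = np.array(cpu_limits) * np.array(replicas)   # live CPU demand
    constraints = [cp.sum(x, axis=1) == 1]               # one cluster per service
    for e in range(num_clusters):
        constraints.append(x[:, e] @ demand <= cluster_capacities[e])
    for s in range(num_services):
        if acceleration[s]:                # GPU services need a GPU cluster
            constraints.append(
                x[s, :] @ np.array(cluster_acceleration, dtype=float) >= 1)

    problem = cp.Problem(objective, constraints)
    problem.solve(solver=cp.GLPK_MI)       # any installed MIP-capable solver
    if problem.status != cp.OPTIMAL:
        return None
    return x.value.round().astype(int).tolist()
```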

2. Replica Scaling Optimization (utils.scaling.decide_replicas)

This CVXPY-based optimization runs continuously in background threads for deployed services (if scaling is enabled) to adjust replica counts based on load.

  • Invocation: Called repeatedly within the utils.scaling.scaling_loop function. This loop runs in a separate thread for each cluster managing services of a given graph, started by services.hdag.graph_service.spawn_scaling_processes after initial deployment or re-placement.
  • Data Flow (Inputs):
    • request_rates: List of current request rates per second for each managed service, fetched live from Prometheus using utils.prometheus_helper.PrometheusHelper.get_request_rate. (Note: image-compression-vo uses noise-reduction’s rate.)
    • previous_replicas: The list of replica counts determined in the previous iteration of this loop (or fetched initially from Karmada). Used in the transition cost calculation.
    • cpu_limits: List of CPU core limits per service, fetched once at the start of the scaling loop from Karmada using utils.karmada_helper.KarmadaHelper.get_cpu_limit.
    • acceleration: List of GPU requirements per service (boolean/int), passed in when the thread starts (originating from HDAG descriptor/Service model). Currently sourced from hardcoded values in spawn_scaling_processes.
    • alpha, beta: Coefficients for the linear service capacity model (capacity = alpha * replicas + beta), passed in when the thread starts. Currently sourced from hardcoded dictionaries in spawn_scaling_processes keyed by service name.
    • cluster_capacity, cluster_acceleration: The specific capacity and GPU capability of the single cluster this scaling loop is responsible for, passed in when the thread starts (originating from DB Cluster model).
    • maximum_replicas: List of maximum allowed replicas per service, passed in when the thread starts. Currently sourced from hardcoded values in spawn_scaling_processes.
  • Logic (CVXPY Model):
    • Decision Variables: Integer list r_current[s] (target replicas for service s on this cluster).
    • Objective Function: Minimizes w_util * cp.sum(...) (resource utilization cost, favoring fewer replicas/CPU) + w_trans * cp.sum(...) (transition cost, penalizing replica changes between iterations). Weights w_util and w_trans are both currently hardcoded to 0.4 (see the sketch after this list).
    • Constraints: Respects the capacity of the single cluster this loop manages, ensures deployed capacity meets or exceeds current live request rates (alpha[s] * r_current[s] + beta[s] >= request_rates[s]), enforces minimum (1) and maximum replica counts per service.
  • Data Flow (Outputs):
    • Returns new_replicas, a list of optimal integer replica counts for the services managed by this thread/cluster.
    • If a solution is found (problem.status == cp.OPTIMAL):
      • For each service, utils.karmada_helper.KarmadaHelper.scale_deployment is called to instruct Karmada to adjust the deployment to the new replica count.
      • This new_replicas list becomes the previous_replicas input for the next iteration of the loop.
    • If no optimal solution is found (e.g., the constraints cannot be satisfied and the function returns None):
      • SMO makes an HTTP GET request to its own /graphs/<name>/placement endpoint, triggering a full graph re-placement evaluation in the hope that relocating services resolves the scaling infeasibility. NB: this seems non-ideal; see the discussion below.
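
A condensed sketch of this scaling model, again assuming the hardcoded weights (w_util = w_trans = 0.4) and omitting the GPU-related inputs, which do not appear as constraints in the description above:

```python
import cvxpy as cp
import numpy as np

def decide_replicas_sketch(request_rates, previous_replicas, cpu_limits,
                           alpha, beta, cluster_capacity, maximum_replicas):
    num_services = len(request_rates)
    r = cp.Variable(num_services, integer=True)   # target replicas per service
    w_util, w_trans = 0.4, 0.4                    # currently hardcoded weights

    cpu = np.array(cpu_limits)
    objective = cp.Minimize(
        w_util * cp.sum(cp.multiply(cpu, r))                          # utilization
        + w_trans * cp.sum(cp.abs(r - np.array(previous_replicas))))  # transition

    constraints = [
        cp.sum(cp.multiply(cpu, r)) <= cluster_capacity,  # fits on this cluster
        r >= 1,                                           # minimum replicas
        r <= np.array(maximum_replicas),                  # maximum replicas
    ]
    for s in range(num_services):
        # Linear capacity model: alpha * r + beta must cover the live rate.
        # E.g. alpha=5 req/s per replica, beta=0, rate=12 req/s  =>  r >= 3.
        constraints.append(alpha[s] * r[s] + beta[s] >= request_rates[s])

    problem = cp.Problem(objective, constraints)
    problem.solve(solver=cp.GLPK_MI)    # any installed MIP-capable solver
    if problem.status != cp.OPTIMAL:
        return None                     # caller falls back to re-placement
    return [int(round(v)) for v in r.value]
```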

Notes

How to replace the REST API call from within the SMO itself?

Calling its own REST API from within the same process does not make sense from a software-design perspective: it introduces unnecessary overhead and potential reliability issues, and it obscures the code flow. A direct function call (after refactoring the core logic) is likely the most appropriate and efficient way to trigger the re-placement logic from within the scaling loop, provided thread safety and application context are handled correctly.

Better Alternatives:

  1. Direct Function Call (Refactored):
    • How: Refactor the core logic of trigger_placement into a separate, internal function (e.g., _perform_graph_re_placement(graph_name) within graph_service.py), as sketched after this list.
    • Both the /graphs/<name>/placement API route handler and the scaling_loop would call this internal function directly.
    • Considerations: Need to ensure the function called from the background thread handles Flask application context correctly if needed (e.g., for database access using with app.app_context():). State modifications must be thread-safe.
    • Benefit: Most efficient, clear code flow, simpler error handling.
  2. Internal Task Queue:
    • How: When scaling_loop detects failure, it pushes a “re-placement task” (with the graph name) onto an internal queue (a simple Python queue.Queue managed by another thread, or a more robust library like Celery or RQ if SMO needs more advanced background tasking). A dedicated worker processes items from the queue, calling the re-placement logic (also sketched after this list).
    • Benefit: Decouples the scaling loop from the immediate execution of the re-placement, allows for better control (e.g., rate-limiting re-placements), and potentially simplifies concurrency management. However, it adds the complexity of managing the queue system.
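
A hypothetical sketch of option 1. The names below (_perform_graph_re_placement, on_scaling_infeasible) illustrate the shape of the change rather than existing SMO code:

```python
# graph_service.py (hypothetical refactoring)

def _perform_graph_re_placement(graph_name):
    """Core re-placement logic, extracted from the API route handler:
    decide_placement, DB update, helm upgrade, restart scaling threads."""
    ...

def trigger_placement(graph_name):
    # The GET /graphs/<name>/placement route handler becomes a thin wrapper.
    return _perform_graph_re_placement(graph_name)

def on_scaling_infeasible(app, graph_name):
    # Called from the scaling loop's background thread instead of an HTTP
    # request; the Flask app context is pushed explicitly for DB access.
    with app.app_context():
        _perform_graph_re_placement(graph_name)
```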
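And a minimal sketch of option 2 using the standard library’s queue.Queue; again, all names are hypothetical:

```python
import queue
import threading

replacement_queue = queue.Queue()

def replacement_worker(app):
    """Dedicated worker thread: serializes re-placements so concurrent
    scaling loops cannot trigger them simultaneously."""
    while True:
        graph_name = replacement_queue.get()
        try:
            with app.app_context():
                _perform_graph_re_placement(graph_name)
        finally:
            replacement_queue.task_done()

# At startup:
#     threading.Thread(target=replacement_worker, args=(app,), daemon=True).start()
# In scaling_loop, on infeasibility:
#     replacement_queue.put(graph_name)
```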
