SMO↔︎Hop3 Improvement Plan¶

To effectively support the H3NI project’s goals of integrating Hop3 and introducing advanced orchestration capabilities (dynamic scaling, energy optimization, compute offloading, AI/ML workflow management), the following specific enhancements and additions to the SMO codebase need to be considered.

I. Core Requirement: Pluggable Optimization Strategies¶

This is the most fundamental set of changes needed to allow H3NI to introduce its specialized logic.

Define Strategy Interfaces for Placement & Scaling:
- Action (SMO): In src/utils/, create Python Abstract Base Classes (ABCs) or formal interfaces for:
  - PlacementStrategy: Would define methods like calculate_initial_placement(hdag_descriptor, available_clusters, service_requirements) and decide_re_placement(current_hdag_state, available_clusters, service_requirements, current_placement_matrix).
  - ScalingStrategy: Would define a method like decide_replicas(current_hdag_service_state, cluster_state, performance_metrics, service_profiles).
- Rationale: Provides a contract for H3NI (and others) to implement custom placement and scaling algorithms.
Refactor graph_service.py to Use Strategy Plugins:
- Action (SMO):
  - Modify deploy_graph to dynamically load or select a PlacementStrategy (e.g., based on a new field in the HDAG descriptor like placementStrategyName: "h3ni_energy_aware" or a global SMO config). The current calculate_naive_placement and decide_placement would become default implementations of this interface.
  - Modify spawn_scaling_processes and scaling_loop to dynamically load or select a ScalingStrategy. The current decide_replicas logic would become a default implementation.
- Rationale: Decouples SMO’s core workflow from specific optimization implementations, allowing H3NI to inject its logic.

II. Supporting H3NI’s Energy Optimization Goal¶

H3NI KPI: “Achieve at least 15% energy efficiency improvement.”

Extend SMO’s Cluster Model and Data Ingestion:
- Action (SMO): Add new fields to the src/models/cluster/cluster.py::Cluster model to store energy-related metadata, e.g.:
  - energy_source_type: Enum('renewable', 'grid_mix', 'fossil')
  - power_usage_effectiveness_pue: Float
  - current_carbon_intensity_gco2kwh: Float (if real-time data is available)
  - is_preferred_for_energy_saving: Boolean
- Action (SMO): Provide an API endpoint (e.g., PUT /clusters/{cluster_name}/energy_profile) or a mechanism for an external system (like Hop3/H3NI) to populate these fields in SMO’s database.
- Rationale: Makes energy characteristics of clusters available to H3NI’s placement strategies.
Ensure Energy Data is Passed to Placement Strategy:
- Action (SMO): Modify graph_service.py so that when a PlacementStrategy is invoked, the full Cluster object (or at least its relevant energy profile data) is passed as part of the available_clusters input.
- Rationale: H3NI’s custom EnergyAwarePlacementStrategy plugin will need this data to make its decisions.

III. Supporting H3NI’s Advanced Scaling & Workflow Management¶

H3NI Goals: “Predictive scaling using machine learning models,” “compute offloading, live migration,” “AI/ML workflow management.”
H3NI KPIs: “Elasticity response time of 5 seconds,” “Proactive resource allocation 90% accuracy.”

Externalize and Parameterize Scaling Model Coefficients (alpha, beta):
- Action (SMO):
  - Remove hardcoded ALPHA, BETA, MAXIMUM_REPLICAS dictionaries from graph_service.py.
  - Modify the src/models/hdag/service.py::Service model to include optional fields for these parameters (e.g., scaling_alpha: Float, scaling_beta: Float, max_replicas_override: Integer).
  - Allow these to be specified in the HDAG service descriptor. If not provided, SMO could fall back to defaults or a simpler scaling approach.
- Rationale: Allows H3NI/Hop3 to provide fine-tuned or learned performance model parameters for each service, crucial for accurate reactive and predictive scaling.
Flexible Input for Scaling Strategy (for Predictive Scaling):
- Action (SMO): The ScalingStrategy interface’s decide_replicas method should accept a performance_metrics argument that is flexible enough to include not just current rates but also predicted future rates.
- Action (H3NI): H3NI’s predictive scaling plugin would then be responsible for generating these predicted rates (e.g., from its ML models) and passing them.
- Rationale: Directly supports H3NI’s predictive scaling goal.
Extend HDAG Intent Model for New Actions (Offloading, Migration):
- Action (SMO):
  - Design a mechanism to extend the HDAG descriptor schema to include new “action intents” or “operational policies” (e.g., offload_policy: { trigger_metric: 'cost', threshold: 'X', target_characteristic: 'low_cost_cluster' }).
  - SMO’s HDAG parsing logic would need to recognize these new intent types.
- Action (H3NI): H3NI would define these new intents and implement corresponding logic within its custom PlacementStrategy or new MigrationStrategy plugins.
- Rationale: Enables H3NI to express triggers and goals for compute offloading and live migration in a way that SMO can understand and pass to the relevant H3NI strategy plugin.
API Endpoint for Triggering Specific H3NI Strategies:
- Action (SMO): Consider a new API endpoint, e.g., POST /graphs/{graph_name}/execute_strategy/{strategy_name}, which could allow Hop3/H3NI to explicitly invoke a custom strategy (like a “compute offload” strategy) with specific parameters.
- Rationale: Provides a direct control path for H3NI to initiate complex operations beyond standard placement/scaling.

IV. Improving Interoperability & Usability for H3NI¶

H3NI Goals: “Establish REST APIs and Python-based plugins for seamless integration,” “Build user-friendly interfaces for managing complex orchestration features,” “Ensure backward compatibility with Hop3’s deployment workflows.”

Well-Documented and Stable SMO APIs:
- Action (SMO): Continue maintaining and improving Swagger/OpenAPI documentation. Ensure API contracts are stable or versioned carefully.
- Rationale: Crucial for H3NI to build reliable client integrations.
Parameterization of Optimization Weights:
- Action (SMO): Allow the weights used in SMO’s default CVXPY objective functions (e.g., w_dep, w_re in placement; w_util, w_trans in scaling) to be configurable, perhaps via global SMO settings or even overridden per HDAG deployment request.
- Rationale: Gives H3NI/Hop3 some control over the behavior of SMO’s default optimizers if H3NI chooses to use them but wants to tune their priorities (e.g., more aggressively favor resource utilization over stability).
Enhanced Status Reporting from SMO:
- Action (SMO): Ensure SMO’s API responses for graph/service status include detailed information about:
  - The currently active placement/scaling strategy.
  - The latest optimization results/decisions.
  - Reasons if an optimization failed.
- Rationale: Allows Hop3 to provide richer feedback to its users about what SMO is doing.

V. General SMO Improvements Supporting H3NI Indirectly¶

Robust Error Handling & Retry Mechanisms (as detailed in “SMO - Potential Improvements”):
- Action (SMO): Implement suggestions from section II of “SMO - Potential Improvements.”
- Rationale: A more resilient SMO provides a more stable platform for H3NI to build upon. Failures in SMO’s interactions with Karmada/Prometheus should be handled gracefully and reported clearly to H3NI if it initiated the action.
Optional API Authentication in SMO:
- Action (SMO): Implement suggestion I.1 from “SMO - Potential Improvements.”
- Rationale: Essential for secure programmatic interaction between Hop3 and SMO in any non-trivial deployment.

Implementation Order & Priority for H3NI¶

From H3NI’s perspective, the highest priority SMO enhancements would likely be:

Pluggable Placement & Scaling Strategy Interfaces (I.1, I.2): This is the gateway for H3NI to inject most of its core differentiating logic.
Mechanism to Pass Custom Data to Strategies (II.1, II.2, III.2): H3NI’s strategies will need access to energy profiles, predicted metrics, etc.
Externalize/Parameterize Scaling Model Coefficients (III.1): Crucial for effective scaling of diverse Hop3 applications.
Optional API Authentication (V.2): For secure integration.

References / See also:

Page last modified: 2025-05-08 18:24:45