SMO↔︎Hop3 Improvement Plan¶
To effectively support the H3NI project’s goals of integrating Hop3 and introducing advanced orchestration capabilities (dynamic scaling, energy optimization, compute offloading, AI/ML workflow management), the following specific enhancements and additions to the SMO codebase need to be considered.
I. Core Requirement: Pluggable Optimization Strategies¶
This is the most fundamental set of changes needed to allow H3NI to introduce its specialized logic.
-
Define Strategy Interfaces for Placement & Scaling:
- Action (SMO): In
src/utils/
, create Python Abstract Base Classes (ABCs) or formal interfaces for:PlacementStrategy
: Would define methods likecalculate_initial_placement(hdag_descriptor, available_clusters, service_requirements)
anddecide_re_placement(current_hdag_state, available_clusters, service_requirements, current_placement_matrix)
.ScalingStrategy
: Would define a method likedecide_replicas(current_hdag_service_state, cluster_state, performance_metrics, service_profiles)
.
- Rationale: Provides a contract for H3NI (and others) to implement custom placement and scaling algorithms.
- Action (SMO): In
-
Refactor
graph_service.py
to Use Strategy Plugins:- Action (SMO):
- Modify
deploy_graph
to dynamically load or select aPlacementStrategy
(e.g., based on a new field in the HDAG descriptor likeplacementStrategyName: "h3ni_energy_aware"
or a global SMO config). The currentcalculate_naive_placement
anddecide_placement
would become default implementations of this interface. - Modify
spawn_scaling_processes
andscaling_loop
to dynamically load or select aScalingStrategy
. The currentdecide_replicas
logic would become a default implementation.
- Modify
- Rationale: Decouples SMO’s core workflow from specific optimization implementations, allowing H3NI to inject its logic.
- Action (SMO):
II. Supporting H3NI’s Energy Optimization Goal¶
- H3NI KPI: “Achieve at least 15% energy efficiency improvement.”
-
Extend SMO’s
Cluster
Model and Data Ingestion:- Action (SMO): Add new fields to the
src/models/cluster/cluster.py::Cluster
model to store energy-related metadata, e.g.:energy_source_type: Enum('renewable', 'grid_mix', 'fossil')
power_usage_effectiveness_pue: Float
current_carbon_intensity_gco2kwh: Float
(if real-time data is available)is_preferred_for_energy_saving: Boolean
- Action (SMO): Provide an API endpoint (e.g.,
PUT /clusters/{cluster_name}/energy_profile
) or a mechanism for an external system (like Hop3/H3NI) to populate these fields in SMO’s database. - Rationale: Makes energy characteristics of clusters available to H3NI’s placement strategies.
- Action (SMO): Add new fields to the
-
Ensure Energy Data is Passed to Placement Strategy:
- Action (SMO): Modify
graph_service.py
so that when aPlacementStrategy
is invoked, the fullCluster
object (or at least its relevant energy profile data) is passed as part of theavailable_clusters
input. - Rationale: H3NI’s custom
EnergyAwarePlacementStrategy
plugin will need this data to make its decisions.
- Action (SMO): Modify
III. Supporting H3NI’s Advanced Scaling & Workflow Management¶
- H3NI Goals: “Predictive scaling using machine learning models,” “compute offloading, live migration,” “AI/ML workflow management.”
- H3NI KPIs: “Elasticity response time of 5 seconds,” “Proactive resource allocation 90% accuracy.”
-
Externalize and Parameterize Scaling Model Coefficients (
alpha
,beta
):- Action (SMO):
- Remove hardcoded
ALPHA
,BETA
,MAXIMUM_REPLICAS
dictionaries fromgraph_service.py
. - Modify the
src/models/hdag/service.py::Service
model to include optional fields for these parameters (e.g.,scaling_alpha: Float
,scaling_beta: Float
,max_replicas_override: Integer
). - Allow these to be specified in the HDAG service descriptor. If not provided, SMO could fall back to defaults or a simpler scaling approach.
- Remove hardcoded
- Rationale: Allows H3NI/Hop3 to provide fine-tuned or learned performance model parameters for each service, crucial for accurate reactive and predictive scaling.
- Action (SMO):
-
Flexible Input for Scaling Strategy (for Predictive Scaling):
- Action (SMO): The
ScalingStrategy
interface’sdecide_replicas
method should accept aperformance_metrics
argument that is flexible enough to include not just current rates but also predicted future rates. - Action (H3NI): H3NI’s predictive scaling plugin would then be responsible for generating these predicted rates (e.g., from its ML models) and passing them.
- Rationale: Directly supports H3NI’s predictive scaling goal.
- Action (SMO): The
-
Extend HDAG Intent Model for New Actions (Offloading, Migration):
- Action (SMO):
- Design a mechanism to extend the HDAG descriptor schema to include new “action intents” or “operational policies” (e.g.,
offload_policy: { trigger_metric: 'cost', threshold: 'X', target_characteristic: 'low_cost_cluster' }
). - SMO’s HDAG parsing logic would need to recognize these new intent types.
- Design a mechanism to extend the HDAG descriptor schema to include new “action intents” or “operational policies” (e.g.,
- Action (H3NI): H3NI would define these new intents and implement corresponding logic within its custom
PlacementStrategy
or newMigrationStrategy
plugins. - Rationale: Enables H3NI to express triggers and goals for compute offloading and live migration in a way that SMO can understand and pass to the relevant H3NI strategy plugin.
- Action (SMO):
-
API Endpoint for Triggering Specific H3NI Strategies:
- Action (SMO): Consider a new API endpoint, e.g.,
POST /graphs/{graph_name}/execute_strategy/{strategy_name}
, which could allow Hop3/H3NI to explicitly invoke a custom strategy (like a “compute offload” strategy) with specific parameters. - Rationale: Provides a direct control path for H3NI to initiate complex operations beyond standard placement/scaling.
- Action (SMO): Consider a new API endpoint, e.g.,
IV. Improving Interoperability & Usability for H3NI¶
- H3NI Goals: “Establish REST APIs and Python-based plugins for seamless integration,” “Build user-friendly interfaces for managing complex orchestration features,” “Ensure backward compatibility with Hop3’s deployment workflows.”
-
Well-Documented and Stable SMO APIs:
- Action (SMO): Continue maintaining and improving Swagger/OpenAPI documentation. Ensure API contracts are stable or versioned carefully.
- Rationale: Crucial for H3NI to build reliable client integrations.
-
Parameterization of Optimization Weights:
- Action (SMO): Allow the weights used in SMO’s default CVXPY objective functions (e.g.,
w_dep
,w_re
in placement;w_util
,w_trans
in scaling) to be configurable, perhaps via global SMO settings or even overridden per HDAG deployment request. - Rationale: Gives H3NI/Hop3 some control over the behavior of SMO’s default optimizers if H3NI chooses to use them but wants to tune their priorities (e.g., more aggressively favor resource utilization over stability).
- Action (SMO): Allow the weights used in SMO’s default CVXPY objective functions (e.g.,
-
Enhanced Status Reporting from SMO:
- Action (SMO): Ensure SMO’s API responses for graph/service status include detailed information about:
- The currently active placement/scaling strategy.
- The latest optimization results/decisions.
- Reasons if an optimization failed.
- Rationale: Allows Hop3 to provide richer feedback to its users about what SMO is doing.
- Action (SMO): Ensure SMO’s API responses for graph/service status include detailed information about:
V. General SMO Improvements Supporting H3NI Indirectly¶
- Robust Error Handling & Retry Mechanisms (as detailed in “SMO - Potential Improvements”):
- Action (SMO): Implement suggestions from section II of “SMO - Potential Improvements.”
- Rationale: A more resilient SMO provides a more stable platform for H3NI to build upon. Failures in SMO’s interactions with Karmada/Prometheus should be handled gracefully and reported clearly to H3NI if it initiated the action.
- Optional API Authentication in SMO:
- Action (SMO): Implement suggestion I.1 from “SMO - Potential Improvements.”
- Rationale: Essential for secure programmatic interaction between Hop3 and SMO in any non-trivial deployment.
Implementation Order & Priority for H3NI¶
From H3NI’s perspective, the highest priority SMO enhancements would likely be:
- Pluggable Placement & Scaling Strategy Interfaces (I.1, I.2): This is the gateway for H3NI to inject most of its core differentiating logic.
- Mechanism to Pass Custom Data to Strategies (II.1, II.2, III.2): H3NI’s strategies will need access to energy profiles, predicted metrics, etc.
- Externalize/Parameterize Scaling Model Coefficients (III.1): Crucial for effective scaling of diverse Hop3 applications.
- Optional API Authentication (V.2): For secure integration.
References / See also:
Page last modified: 2025-05-08 11:32:02