# SMO Code Walkthrough
This note provides an in-depth walkthrough of the Synergetic Meta-Orchestrator (SMO) Python codebase. We’ll explore its architecture, the nuances of its core functionalities, and the specifics of its interaction with the cloud-native ecosystem, building upon the initial overview.
## I. Core Technologies & Architecture: Design Choices and Rationale
The SMO’s effectiveness stems from its thoughtful integration of various technologies:
- Flask (Web Framework): Provides a lightweight and flexible foundation for building the REST API. Its simplicity is well-suited for microservice-style applications like SMO.
- Flasgger (API Documentation): Automates the generation of Swagger/OpenAPI specifications from code comments and route definitions (`@swag_from` decorators in `routes/` files). This makes the API discoverable and usable by clients and developers.
- PostgreSQL & SQLAlchemy (Database & ORM):
  - PostgreSQL offers robust relational database capabilities, including support for JSONB data types, which are heavily used by SMO (e.g., for `graph_descriptor`, `placement`, `values_overwrite`). JSONB allows for efficient querying and storage of semi-structured data.
  - SQLAlchemy abstracts database interactions, allowing developers to work with Python objects instead of raw SQL, simplifying data persistence and schema management (defined in `models/`).
- `python-dotenv` (Configuration): Facilitates environment-agnostic configuration by loading key-value pairs from a `.env` file. This is standard practice for separating configuration from code, as seen in `config.py`, where database URLs, kubeconfig paths, etc., are sourced from environment variables.
- Kubernetes Python Client (K8s Interaction): This official client library is the bridge to the Kubernetes world. SMO uses it to:
  - Query Karmada's custom resources (e.g., `clusters.karmada.io` in `KarmadaHelper`) for cluster status and resource information (a usage sketch follows this list).
  - Manage Kubernetes native resources like Deployments (e.g., scaling them in `KarmadaHelper`) through Karmada.
  - Interact with Submariner custom resources (`clusters.submariner.io` in `SubmarinerHelper`) to get inter-cluster networking details.
  - Manage `PrometheusRule` custom resources (in `PrometheusHelper`) for dynamic alert configuration.
- CVXPY (Optimization Engine): This is a cornerstone of SMO’s intelligence.
  - It allows SMO to formally define its placement and scaling problems as mathematical optimization problems (specifically Mixed-Integer Linear Programs, or MILPs, given the boolean and integer variables).
  - This enables SMO to find provably optimal (within the model's constraints) solutions rather than just relying on heuristics, leading to potentially better resource utilization and performance. This is evident in `utils/placement.decide_placement` and `utils/scaling.decide_replicas`.
- `requests` (HTTP Communication): A standard library for making HTTP calls to external APIs such as Prometheus (for metrics and reload), Grafana (to publish dashboards), and the NFVCL API (for cluster lifecycle management).
- `threading` (Concurrency): Used in `graph_service.spawn_scaling_processes` to run the `scaling_loop` for each managed cluster in a separate background thread. This allows SMO to perform continuous, non-blocking scaling operations for multiple HDAGs concurrently.
- `subprocess` for CLI Tools (Helm & `hdarctl`):
  - SMO directly invokes the `helm` CLI (e.g., in `graph_service.helm_install_artifact`) to leverage Helm's mature packaging and deployment capabilities for Kubernetes applications. This avoids reimplementing Helm's logic within SMO (a minimal invocation sketch follows this list).
  - Similarly, `hdarctl` is used via `subprocess` (e.g., in `graph_service.get_descriptor_from_artifact`) to handle OCI artifact pulling and unpacking.
  - Implication: This creates a dependency on these CLI tools being present in the SMO container's `PATH` and correctly configured. Error handling for these subprocess calls (as seen in `errors/error_handlers.py`) becomes important.
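To make the Kubernetes client usage concrete, here is a hypothetical sketch of listing Karmada `Cluster` objects with the official Python client. The group/version follow the `cluster.karmada.io/v1alpha1` CRD discussed later; the field access and printed output are illustrative, not SMO's exact code.

```python
# Hypothetical sketch: listing Karmada Cluster CRs via the official
# Kubernetes Python client. Field access below is illustrative.
from kubernetes import client, config

# In SMO this would point at the KARMADA_KUBECONFIG path from config.py.
config.load_kube_config()
api = client.CustomObjectsApi()

# Cluster is a cluster-scoped CRD, so the cluster-wide list call is used.
clusters = api.list_cluster_custom_object(
    group="cluster.karmada.io", version="v1alpha1", plural="clusters")
for item in clusters["items"]:
    summary = item.get("status", {}).get("resourceSummary", {})
    print(item["metadata"]["name"], summary.get("allocatable", {}))
```

And a minimal sketch of shelling out to the `helm` CLI via `subprocess` with error capture; the function name, arguments, and flags are assumptions, not `helm_install_artifact`'s actual signature:

```python
# Minimal sketch of invoking the helm CLI with error capture.
import subprocess

def helm_install(release: str, chart_ref: str, values_file: str) -> str:
    """Run `helm upgrade --install` and return stdout; raise on failure."""
    cmd = ["helm", "upgrade", "--install", release, chart_ref,
           "--values", values_file]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Surfacing stderr is the kind of handling errors/error_handlers.py needs.
        raise RuntimeError(f"helm failed: {result.stderr.strip()}")
    return result.stdout
```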
## II. Key Code Modules & Functionality: Detailed Exploration
### 1. Application Setup & Configuration (`app.py`, `config.py`)
- `create_app()` in `app.py`:
  - The call to `fetch_clusters(...)` during application startup is a proactive measure. It ensures that SMO has an up-to-date view of the available clusters (from Karmada and Submariner) in its own database before it starts processing deployment requests. This allows for quicker initial placement decisions and a consistent internal state.
  - Registration of blueprints (`cluster`, `graph`, `os_k8s`, `vim`) modularizes the API structure, making it easier to manage and extend (see the factory sketch after this list).
- `config.py`:
  - The `KARMADA_KUBECONFIG` and `SUBMARINER_KUBECONFIG` variables (defaulting to `/home/python/.kube/...`) highlight the expectation that SMO runs in an environment (likely a container) where these kubeconfig files are accessible, typically via volume mounts.
  - `INSECURE_REGISTRY`: Essential for development/testing with local OCI registries that don't use HTTPS.
  - `SCALING_ENABLED` and `SCALING_INTERVAL`: Provide runtime control over the automated scaling feature, allowing it to be disabled or its frequency adjusted.
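A condensed sketch of what such a Flask application factory typically looks like; the import paths and the `fetch_clusters` signature are assumptions based on the module names above, not SMO's verbatim code:

```python
# Illustrative Flask application factory in the style described above.
from flask import Flask

def create_app() -> Flask:
    app = Flask(__name__)
    app.config.from_object("config")  # values sourced from .env via python-dotenv

    from routes.cluster import cluster
    from routes.hdag.graph import graph
    app.register_blueprint(cluster)   # modular API structure per blueprint
    app.register_blueprint(graph)

    with app.app_context():
        from services.cluster.cluster_service import fetch_clusters
        fetch_clusters()              # proactive sync of cluster state at startup
    return app
```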
### 2. Database Models (`models/`): Persistence Layer
The SMO’s relational database schema is fundamental to its operation:
- `Cluster` (`models/cluster/cluster.py`):
  - Attributes like `available_cpu`, `available_ram`, `availability`, and `acceleration` are crucial inputs for the placement and scaling algorithms.
  - Storing `pod_cidr` and `service_cidr` (from Submariner) is vital for understanding the network topology.
  - The `grafana` field directly links a cluster to its dynamically generated monitoring dashboard.
- `Graph` (`models/hdag/graph.py`; a model sketch follows this list):
  - `graph_descriptor` (JSONB): Stores the entire user-provided HDAG definition. This allows SMO to refer back to the original intent and structure.
  - `placement` (JSONB): Persists the current calculated placement of services onto clusters. This is used by `decide_placement` as the `current_placement` (`y` variable) to minimize re-optimization costs.
  - `services = db.relationship('Service', back_populates='graph', cascade='all,delete')`: This SQLAlchemy relationship with `cascade='all,delete'` means that when a `Graph` record is deleted, all its associated `Service` records are also automatically deleted from the database, ensuring data integrity.
- `Service` (`models/hdag/service.py`):
  - `cluster_affinity`: The primary cluster a service is assigned to.
  - `artifact_ref`, `artifact_type`, `artifact_implementer`: Store OCI artifact details.
  - `values_overwrite` (JSONB): Stores the Helm values overrides, including dynamically injected placement information like `clustersAffinity` and `serviceImportClusters`. This allows for fine-grained configuration of each service instance.
  - `alert` (JSONB): Contains the Prometheus alert rule definition that triggers the conditional deployment of this service.
- NFVCL Models (`models/nfvcl/*.py`):
  - `BM_K8S_cluster.smo_id` and `OS_K8S_cluster.smo_id`: The use of `@event.listens_for(..., 'before_insert')` to auto-generate unique IDs (e.g., `f"OS_K8S_{target.pop_area}_{random_string}"`) is a clean way to ensure primary key uniqueness with a meaningful prefix (see the listener sketch below).
  - These models enable SMO to track clusters provisioned via NFVCL, linking SMO's internal representation to the NFVCL's blueprint IDs (`nfvcl_id`).
### 3. API Endpoints (`routes/`): Interacting with the SMO
- `POST /project/<project>/graphs` (`routes/hdag/graph.py`):
  - This deployment endpoint showcases flexibility (see the route sketch after this list). If `request_data` contains an `artifact` key, `get_descriptor_from_artifact` is called, which in turn uses `hdarctl pull --untar` to fetch and parse the descriptor from an OCI artifact.
  - Otherwise, it expects the request body itself to be the HDAG descriptor (after `yaml.safe_load`). This dual input mechanism caters to different deployment workflows.
- `GET /graphs/<name>/placement` (`routes/hdag/graph.py`):
  - Allows a user or an automated system to explicitly request that SMO re-evaluate and potentially change the placement of an already deployed graph. This is useful if cluster conditions change (e.g., a new cluster is added, or an existing cluster's resources dwindle) or if the optimization objective itself needs to be reconsidered.
- `POST /alerts` (`routes/hdag/graph.py`):
  - This endpoint is designed to be a webhook target for Prometheus Alertmanager. When an alert (previously configured by SMO) fires, Alertmanager sends a notification here.
  - The `deploy_conditional_service` function then extracts the `service` label from the alert to identify which SMO-managed service needs to be deployed (see the webhook sketch after this list).
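The dual-input deployment route might be wired up roughly as follows; the blueprint name, helper signatures, and response shapes are assumptions based on the walkthrough:

```python
# Illustrative Flask routes for the endpoints above; helper names come from
# the walkthrough, their exact signatures are assumptions.
import yaml
from flask import Blueprint, request

from services.hdag.graph_service import (
    deploy_graph, get_descriptor_from_artifact, deploy_conditional_service)

graph = Blueprint('graph', __name__)

@graph.route('/project/<project>/graphs', methods=['POST'])
def create_graph(project):
    request_data = request.get_json(silent=True) or {}
    if 'artifact' in request_data:
        # OCI path: hdarctl pull --untar fetches and unpacks the descriptor.
        descriptor = get_descriptor_from_artifact(request_data['artifact'])
    else:
        # Direct path: the request body itself is the HDAG descriptor.
        descriptor = yaml.safe_load(request.data)
    deploy_graph(project, descriptor)
    return {'status': 'deploying'}, 202
```

Continuing the sketch, an Alertmanager webhook handler: Alertmanager POSTs a JSON payload whose `alerts` entries carry `labels`, and the `service` label identifies the pending service. How SMO's handler unpacks this exactly may differ:

```python
@graph.route('/alerts', methods=['POST'])
def alerts():
    payload = request.get_json()
    for alert in payload.get('alerts', []):
        service_name = alert.get('labels', {}).get('service')
        if service_name:
            deploy_conditional_service(service_name)
    return {'status': 'ok'}, 200
```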
### 4. Business Logic (`services/`): The Engine Room
- `services/hdag/graph_service.py`:
  - `deploy_graph` sequence:
    1. Persist the initial `Graph` state.
    2. Initial placement: `calculate_naive_placement` is used. This is a simpler, greedy algorithm, likely chosen for speed during initial deployment, before more detailed metrics are available for the CVXPY optimizer.
    3. `create_service_imports`: This step is vital for multi-cluster networking. It analyzes the `connectionPoints` in the HDAG descriptor to determine which services need to communicate. The `serviceImportClusters` list generated here is then passed to Helm, likely configuring Submariner ServiceExports or similar mechanisms.
    4. For each service:
       - Conditional deployment: If triggered by an event, `create_alert` prepares a Prometheus rule, and `PrometheusHelper.update_alert_rules` injects it into the `PrometheusRule` CR.
       - Grafana dashboards are programmatically generated for the service.
       - Helm values are prepared, crucially including `clustersAffinity` (from placement) and `serviceImportClusters`.
       - `helm_install_artifact` deploys the service if it is not conditional.
    5. A Grafana dashboard for the whole graph is created.
    6. If `SCALING_ENABLED`, background scaling threads are spawned.
  - `trigger_placement` flow:
    1. Stops any active scaling threads for the graph to avoid conflicts.
    2. Fetches current replica counts from Karmada (`KarmadaHelper.get_replicas`).
    3. Fetches current cluster capacities from SMO's database.
    4. Calls `decide_placement` (the CVXPY optimization) using the current state (replicas, capacities, existing placement) as input. The objective function here balances minimizing the number of active clusters/deployments (`w_dep * cp.sum(x)`) with minimizing changes from the current state (`w_re * cp.sum(cp.multiply(y, (y - x)))`).
    5. If the placement changes for any service, its Helm `values_overwrite` is updated, and `helm_install_artifact` is called with the `upgrade` command.
    6. Restarts scaling threads with the new placement context.
  - `scaling_loop` and `decide_replicas` (in `utils/scaling.py`); a simplified CVXPY sketch follows this list:
    - The `y = ax + b` model (`alpha[s] * r_current[s] + beta[s] >= request_rates[s]`) is a linear approximation of a service's capacity (requests it can handle) based on the number of replicas. `alpha` (slope) and `beta` (intercept) are service-specific and currently hardcoded (e.g., `ALPHA = {'image-compression-vo': 33.33, ...}`). This implies a need for prior performance profiling of services to determine these coefficients.
    - The dependency of `image-compression-vo`'s request rate on `noise-reduction`'s rate suggests a processing pipeline where the output of one service feeds the input of another.
    - If `decide_replicas` (the CVXPY optimization) fails to find an optimal solution (e.g., due to conflicting constraints like insufficient total capacity for the demand), it returns `None`. In this scenario, `scaling_loop` triggers a full graph re-placement (`requests.get(f'http://localhost:8000/graphs/{graph_name}/placement')`), hoping that changing cluster assignments might resolve the scaling infeasibility.
    - The objective function for `decide_replicas` minimizes a weighted sum of CPU utilization (`w_util`) and the number of replica changes (`w_trans`), aiming for both efficiency and stability.
  - Conditional Deployment (`deploy_conditional_service`): When the `/alerts` endpoint receives a POST from Alertmanager, this function is called. It identifies the `service` name from the alert's labels and, if the service exists and is in a pending state, calls `helm_install_artifact` to deploy it.
- `services/cluster/cluster_service.py` (`fetch_clusters`):
  - SMO maintains its own view of cluster resources in its database. This is beneficial because:
    - It can enrich Karmada/Submariner data with SMO-specific information (like Grafana dashboard URLs).
    - It provides a snapshot for SMO's algorithms, potentially reducing direct API calls to Karmada during every decision-making process.
    - It allows SMO to function even if there are temporary connectivity issues with the Karmada control plane, operating on its last known state.
- `services/nfvcl/*.py`: These services demonstrate SMO acting as a client to another orchestration system (NFVCL). They translate SMO's internal requests (e.g., "create an OS K8s cluster") into the specific API calls expected at the `NFVCL_BASE_URL`.
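To make the scaling optimization concrete, here is a simplified sketch of a `decide_replicas`-style MILP. The capacity model and objective weights mirror the walkthrough; the single aggregate CPU constraint is an illustrative reduction, not SMO's exact formulation:

```python
# Simplified replica-scaling MILP in the spirit of utils/scaling.py.
# Requires a MIP-capable solver such as GLPK_MI (via cvxopt).
import cvxpy as cp
import numpy as np

def decide_replicas(request_rates, r_current, alpha, beta, cpu_limits,
                    cluster_capacity, w_util=1.0, w_trans=0.5):
    n = len(request_rates)
    cpu_limits = np.asarray(cpu_limits, dtype=float)
    r = cp.Variable(n, integer=True)

    constraints = [r >= 1]
    for s in range(n):
        # Linear capacity model: replicas must cover the observed request rate.
        constraints.append(alpha[s] * r[s] + beta[s] >= request_rates[s])
    # Total CPU demanded by all replicas must fit the available capacity.
    constraints.append(cpu_limits @ r <= cluster_capacity)

    objective = cp.Minimize(
        w_util * (cpu_limits @ r)                              # resource usage
        + w_trans * cp.sum(cp.abs(r - np.asarray(r_current)))  # replica churn
    )
    cp.Problem(objective, constraints).solve(solver=cp.GLPK_MI)
    # Mirror the behavior described above: None when no optimal solution exists.
    return None if r.value is None else np.rint(r.value).astype(int)
```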
### 5. Utilities (`utils/`): The Building Blocks
- `placement.py` & `scaling.py` (CVXPY Optimization); a placement sketch follows this list:
  - The objective functions are key:
    - `decide_placement`: `w_dep * cp.sum(x) + w_re * cp.sum(cp.multiply(y, (y - x)))`. `cp.sum(x)` can be interpreted as a proxy for the number of active service-to-cluster assignments (deployment cost). `cp.sum(cp.multiply(y, (y - x)))` is a bit more complex: if `y` (current placement) is 1 and `x` (new placement) becomes 0 for a service-cluster pair, it adds to the cost, penalizing removal; if both are 1, the term is zero. This aims to find a new placement that is "good" while not deviating too much from the current one if possible.
    - `decide_replicas`: `w_util * cp.sum(...) + w_trans * cp.sum(...)`. This clearly balances minimizing resource usage (proportional to replicas times CPU limits) against minimizing the churn in replica counts.
  - These are Mixed-Integer Programs because `x` (placement) is boolean and `r_current` (replicas) is integer. Solvers like GLPK_MI are designed for these.
- `karmada_helper.py`:
  - Leverages Karmada's `Cluster` CRD (`cluster.karmada.io/v1alpha1`) to get an aggregated view of member cluster resources (`resourceSummary`).
  - Uses standard Kubernetes `AppsV1Api` calls (but directed at Karmada) to manage `Deployment` scales. This shows how Karmada provides a unified API front end for multi-cluster resources.
- `prometheus_helper.py`:
  - The direct modification of `PrometheusRule` CRs (e.g., `kube-prometheus-stack-0`) is a powerful integration. It means SMO can dynamically create and remove alerts that are native to the Prometheus ecosystem. This requires appropriate RBAC permissions for the SMO service account in the `monitoring` namespace. The subsequent HTTP POST to `/-/reload` ensures Prometheus picks up these changes (see the sketch after this list).
- `grafana_helper.py` & `grafana_template.py`:
  - The level of detail in `grafana_template.py` (defining panels, targets with specific PromQL queries, and templating variables for dashboards) shows a commitment to providing rich, out-of-the-box observability for applications and clusters managed by SMO. This significantly enhances the user experience.
- `intent_translation.py`: The mappings (e.g., `CPU_MAPPING = {'light': 0.5, ...}`) are a simple but effective first step in making resource requests more abstract and user-friendly.
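A minimal sketch of a `decide_placement`-style MILP using the objective quoted above; the constraint set (each service on exactly one cluster, CPU capacity only) is an illustrative reduction, not SMO's full model:

```python
# Simplified placement MILP in the spirit of utils/placement.py, using the
# objective w_dep * cp.sum(x) + w_re * cp.sum(cp.multiply(y, (y - x))).
import cvxpy as cp
import numpy as np

def decide_placement(cpu_demand, cluster_cpu, current_placement,
                     w_dep=1.0, w_re=1.0):
    y = np.asarray(current_placement)       # current placement (constant 0/1)
    num_services, num_clusters = y.shape
    x = cp.Variable((num_services, num_clusters), boolean=True)

    constraints = [cp.sum(x, axis=1) == 1]  # each service on exactly one cluster
    for c in range(num_clusters):
        # Services placed on cluster c must fit its available CPU.
        constraints.append(np.asarray(cpu_demand) @ x[:, c] <= cluster_cpu[c])

    # Deployment cost plus a penalty for removing existing assignments.
    objective = cp.Minimize(w_dep * cp.sum(x)
                            + w_re * cp.sum(cp.multiply(y, (y - x))))
    cp.Problem(objective, constraints).solve(solver=cp.GLPK_MI)
    return None if x.value is None else np.rint(x.value).astype(int)
```

And a hypothetical sketch of the `PrometheusRule` manipulation plus reload described for `prometheus_helper.py`; the object layout follows the `monitoring.coreos.com/v1` CRD, while the rule contents and the Prometheus URL are assumptions:

```python
# Hypothetical sketch: append an alert rule to a PrometheusRule CR and ask
# Prometheus to reload. Requires RBAC on prometheusrules in "monitoring"
# and --web.enable-lifecycle on Prometheus.
import requests
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

rule = api.get_namespaced_custom_object(
    group="monitoring.coreos.com", version="v1", namespace="monitoring",
    plural="prometheusrules", name="kube-prometheus-stack-0")

rule["spec"]["groups"][0]["rules"].append({
    "alert": "DeployNoiseReduction",           # illustrative rule definition
    "expr": "rate(http_requests_total[1m]) > 100",
    "labels": {"service": "noise-reduction"},  # label read back by /alerts
})

api.replace_namespaced_custom_object(
    group="monitoring.coreos.com", version="v1", namespace="monitoring",
    plural="prometheusrules", name="kube-prometheus-stack-0", body=rule)

requests.post("http://prometheus.monitoring:9090/-/reload")  # assumed URL
```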
## III. Key Insights from the Code
- Optimization-Driven Decisions: The use of CVXPY for core placement and scaling logic is central. The SMO isn’t just applying rules; it’s finding mathematically “best” solutions according to its defined cost models and constraints. This allows it to adapt to changing conditions (workload, cluster availability) more intelligently.
- Deep Observability Integration as a First-Class Citizen: The SMO’s proactive creation of Prometheus alert rules and highly customized Grafana dashboards is not an afterthought. It’s woven into the deployment process, ensuring that users immediately get visibility into their HDAGs and the underlying infrastructure.
- Karmada as the Multi-Cluster Abstraction Layer: The SMO offloads the complexity of multi-cluster resource distribution and management to Karmada. SMO focuses on the strategic decisions (what goes where, how many replicas), and Karmada handles the tactical execution on member clusters.
- Stateful Orchestration for Advanced Logic: The SMO needs its own database because:
  - It stores the original HDAG intent (`graph_descriptor`), which might not be directly represented in Karmada/Kubernetes.
  - It tracks its own calculated `placement` to inform re-optimization.
  - It stores service-specific configurations like `alert` rules and `values_overwrite` that are SMO-level concerns.
  - It maintains links to Grafana dashboards and NFVCL-provisioned resources.

  This internal state allows the SMO to perform more complex lifecycle operations and maintain context beyond what a purely stateless orchestrator could.
- Explicit Intent Translation Workflow: The path from an abstract HDAG descriptor to a running application involves several translation steps:
  - Parsing the descriptor (from OCI or direct input).
  - Translating abstract resource terms (`light`, `small`) to concrete values (0.5 CPU, 500MiB memory).
  - Making the initial placement decision.
  - Calculating service import needs for cross-cluster communication.
  - Generating Helm `values_overwrite` with placement and import details.
  - Making ongoing scaling decisions based on real-time metrics and capacity models.
- Reactive Capabilities via Conditional Deployment: The Prometheus alert integration allows SMO to react to events in the environment by deploying specific services, enabling event-driven architectures.
- Handling External Dependencies: SMO's reliance on the `helm` and `hdarctl` CLIs means their availability and version compatibility are important operational considerations. The use of `subprocess` also requires careful error handling (e.g., capturing `stderr` on failure).
- Potential for Advanced Scaling Models: While the current scaling model (`y = ax + b`) is linear and uses hardcoded coefficients, the framework (CVXPY, Prometheus metrics) could support more complex, non-linear, or even machine-learning-derived scaling models in the future if the `ALPHA` and `BETA` parameters could be learned or dynamically adjusted.