Portfolio Balancing
Allocate investment across stocks to minimize risk while achieving a target return.
Allocate investment across stocks to minimize risk while achieving a target return.
Multi-reasoner template: rules-based compliance, covariance clustering, and bi-objective Markowitz optimization that uses solver shadow prices to trace the risk-return frontier, with a crisis-regime stress test.
Browse files
Browse files
What this template is for
Investors and portfolio managers often need to allocate capital across multiple assets while balancing expected return against risk. This template implements a classic Markowitz mean-variance model that chooses non-negative allocations to minimize portfolio variance subject to a minimum expected return target.
This template uses RelationalAI’s prescriptive reasoning (optimization) capabilities to compute an optimal allocation under constraints, and to run a small scenario analysis that illustrates the risk/return trade-off.
Prescriptive reasoning helps you:
- Quantify trade-offs between return targets and risk.
- Enforce constraints like budgets and no-short-selling.
- Explore scenarios by varying the minimum expected return.
Who this is for
- You want an end-to-end example of prescriptive reasoning (optimization) with quadratic objectives.
- You’re comfortable with basic Python and optimization concepts (risk/return, covariance).
What you’ll build
- A semantic model for stocks, expected returns, and pairwise covariance.
- A quadratic program that chooses non-negative allocations.
- A minimum return constraint and a variance-minimization objective.
- A scenario loop over different minimum return targets with a summary table.
What’s included
- Model + solve script:
portfolio_balancing.py - Sample data:
data/returns.csv,data/covariance.csv - Outputs: per-scenario solver status/objective, allocation table, and a scenario summary
Prerequisites
Access
- A Snowflake account that has the RAI Native App installed.
- A Snowflake user with permissions to access the RAI Native App.
Tools
- Python >= 3.10
Quickstart
Follow these steps to run the template with the included sample data.
-
Download the ZIP file for this template and extract it:
Terminal window curl -O https://private.relational.ai/templates/zips/v0.13/portfolio_balancing.zipunzip portfolio_balancing.zipcd portfolio_balancing -
Create and activate a virtual environment
Terminal window python -m venv .venvsource .venv/bin/activatepython -m pip install -U pip -
Install dependencies
Terminal window python -m pip install . -
Configure Snowflake connection and RAI profile
Terminal window rai init -
Run the template
Terminal window python portfolio_balancing.py -
Expected output
The script solves three scenarios for the minimum expected return target.
Running scenario: min_return = 10Status: OPTIMAL, Objective: ...Portfolio allocation:name value...==================================================Scenario Analysis Summary==================================================10: OPTIMAL, obj=...20: OPTIMAL, obj=...30: OPTIMAL, obj=...
Template structure
.├─ README.md├─ pyproject.toml├─ portfolio_balancing.py # main runner / entrypoint└─ data/ # sample input data ├─ returns.csv └─ covariance.csvStart here: portfolio_balancing.py
Sample data
Data files are in data/.
returns.csv
Defines one expected return value per stock.
| Column | Meaning |
|---|---|
index | Stock identifier |
returns | Expected return (decimal, e.g., 0.04 = 4%) |
covariance.csv
Defines pairwise covariance values between stock pairs.
| Column | Meaning |
|---|---|
i | First stock index |
j | Second stock index |
covar | Covariance between stocks i and j |
Model overview
The semantic model uses a single concept (Stock) and a pairwise covariance property (Stock.covar). The decision variable is a continuous allocation per stock.
Stock
Represents an investable asset.
| Property | Type | Identifying? | Notes |
|---|---|---|---|
index | int | Yes | Loaded from data/returns.csv |
returns | float | No | Expected return |
covar | float | No | Pairwise covariance with another Stock |
quantity | float | No | Decision variable (continuous, non-negative) |
How it works
This section walks through the highlights in portfolio_balancing.py.
Import libraries and configure inputs
First, the script imports the Semantics and optimization APIs, configures the data directory, and defines the key parameters:
from pathlib import Path
import pandasfrom pandas import read_csv
from relationalai.semantics import Float, Model, data, require, select, sum, wherefrom relationalai.semantics.reasoners.optimization import Solver, SolverModel
# --------------------------------------------------# Configure inputs# --------------------------------------------------
DATA_DIR = Path(__file__).parent / "data"
# Disable pandas inference of string types. This ensures that string columns# in the CSVs are loaded as object dtype. This is only required when using# relationalai versions prior to v1.0.pandas.options.future.infer_string = False
# Budget and minimum return parameters.BUDGET = 1000MIN_RETURN = 20Define concepts and load CSV data
Next, it creates a Model, defines the Stock concept, and loads both CSVs. The covariance values are defined by joining stock indices using where(...).define(...):
# --------------------------------------------------# Define semantic model & load data# --------------------------------------------------
# Create a Semantics model container.model = Model("portfolio", config=globals().get("config", None), use_lqp=False)
# Stock concept: available investments with expected returns.Stock = model.Concept("Stock")Stock.returns = model.Property("{Stock} has {returns:float}")
# Load expected return data from CSV.data(read_csv(DATA_DIR / "returns.csv")).into(Stock, keys=["index"])
# Stock.covar property: covariance matrix between stock pairs.Stock.covar = model.Property("{Stock} and {stock2:Stock} have {covar:float}")Stock2 = Stock.ref()
# Load covariance data from CSV.covar_csv = read_csv(DATA_DIR / "covariance.csv")pairs = data(covar_csv)where( Stock.index == pairs.i, Stock2.index == pairs.j).define( Stock.covar(Stock, Stock2, pairs.covar))Define decision variables, constraints, and objective
Then it creates a decision variable Stock.x_quantity and registers constraints and the quadratic variance objective inside build_formulation(...):
# --------------------------------------------------# Model the decision problem# --------------------------------------------------
# Stock.x_quantity decision variable: amount allocated to each stock.Stock.x_quantity = model.Property("{Stock} quantity is {x:float}")
c = Float.ref()
# Scenario parameter. This is updated inside the scenario loop.min_return = MIN_RETURN
# Budget is fixed across scenarios.budget = BUDGET
def build_formulation(s): """Register variables, constraints, and objective on the solver model.""" # Decision variable: quantity of each stock. s.solve_for(Stock.x_quantity, name=["qty", Stock.index])
# Constraint: no short selling. bounds = require(Stock.x_quantity >= 0) s.satisfy(bounds)
# Constraint: budget limit. budget_constraint = require(sum(Stock.x_quantity) <= budget) s.satisfy(budget_constraint)
# Constraint: minimum return target (scenario parameter). return_constraint = require(sum(Stock.returns * Stock.x_quantity) >= min_return) s.satisfy(return_constraint)
# Objective: minimize portfolio risk (variance) risk = sum(c * Stock.x_quantity * Stock2.quantity).where(Stock.covar(Stock2, c)) s.minimize(risk)Solve and print results
Finally, the script loops over multiple values of min_return, creates a fresh SolverModel for each scenario, and prints both the allocation and a summary:
# --------------------------------------------------# Solve with Scenario Analysis (Numeric Parameter)# --------------------------------------------------
SCENARIO_PARAM = "min_return"SCENARIO_VALUES = [10, 20, 30]
scenario_results = []
for scenario_value in SCENARIO_VALUES: print(f"\nRunning scenario: {SCENARIO_PARAM} = {scenario_value}")
# Set scenario parameter value. min_return = scenario_value
# Create a fresh SolverModel for each scenario. s = SolverModel(model, "cont") build_formulation(s)
solver = Solver("highs") s.solve(solver, time_limit_sec=60)
scenario_results.append({ "scenario": scenario_value, "status": str(s.termination_status), "objective": s.objective_value, }) print(f" Status: {s.termination_status}, Objective: {s.objective_value}")
# Print portfolio allocation from solver results. var_df = s.variable_values().to_df() qty_df = var_df[ var_df["name"].str.startswith("qty") & (var_df["float"] > 0.001) ].rename(columns={"float": "value"}) print(f"\n Portfolio allocation:") print(qty_df.to_string(index=False))
# --------------------------------------------------# Solve and check solution# --------------------------------------------------
# Print a scenario summary table.print("\n" + "=" * 50)print("Scenario Analysis Summary")print("=" * 50)for result in scenario_results: print(f" {result['scenario']}: {result['status']}, obj={result['objective']}")Troubleshooting
I get ModuleNotFoundError when running the script
- Confirm you created and activated the virtual environment from the Quickstart.
- Reinstall dependencies with
python -m pip install .. - Verify you are running
python portfolio_balancing.pyfrom theportfolio_balancing/folder.
The script fails while reading a CSV from data/
- Confirm
data/returns.csvanddata/covariance.csvexist. - Verify headers match the expected columns (
index,returns,i,j,covar). - Check for missing values and non-numeric entries in return/covariance columns.
I see an unexpected termination status (not OPTIMAL)
- Try re-running; if you hit a time limit, consider increasing
time_limit_sec. - If you changed scenario parameters, confirm the minimum return target is feasible given the budget.
What this template is for
Investors and portfolio managers often need to allocate capital across multiple assets while balancing expected return against risk. This template implements a classic Markowitz mean-variance model that chooses non-negative allocations to minimize portfolio variance subject to a minimum expected return target.
This template uses RelationalAI’s prescriptive reasoning (optimization) capabilities to compute an optimal allocation under constraints, and to run a small scenario analysis that illustrates the risk/return trade-off.
Prescriptive reasoning helps you:
- Quantify trade-offs between return targets and risk.
- Enforce constraints like budgets and no-short-selling.
- Explore scenarios by varying the minimum expected return.
Who this is for
- You want an end-to-end example of prescriptive reasoning (optimization) with quadratic objectives.
- You’re comfortable with basic Python and optimization concepts (risk/return, covariance).
What you’ll build
- A semantic model for stocks, expected returns, and pairwise covariance.
- A quadratic program that chooses non-negative allocations.
- A minimum return constraint and a variance-minimization objective.
- A scenario loop over different minimum return targets with a summary table.
What’s included
- Model + solve script:
portfolio_balancing.py - Sample data:
data/returns.csv,data/covariance.csv - Outputs: per-scenario solver status/objective, allocation table, and a scenario summary
Prerequisites
Access
- A Snowflake account that has the RAI Native App installed.
- A Snowflake user with permissions to access the RAI Native App.
Tools
- Python >= 3.10
Quickstart
Follow these steps to run the template with the included sample data.
-
Download the ZIP file for this template and extract it:
Terminal window curl -O https://private.relational.ai/templates/zips/v0.14/portfolio_balancing.zipunzip portfolio_balancing.zipcd portfolio_balancing -
Create and activate a virtual environment
Terminal window python -m venv .venvsource .venv/bin/activatepython -m pip install -U pip -
Install dependencies
Terminal window python -m pip install . -
Configure Snowflake connection and RAI profile
Terminal window rai init -
Run the template
Terminal window python portfolio_balancing.py -
Expected output
The script solves three scenarios for the minimum expected return target.
Running scenario: min_return = 10Status: OPTIMAL, Objective: ...Portfolio allocation:name value...==================================================Scenario Analysis Summary==================================================10: OPTIMAL, obj=...20: OPTIMAL, obj=...30: OPTIMAL, obj=...
Template structure
.├─ README.md├─ pyproject.toml├─ portfolio_balancing.py # main runner / entrypoint└─ data/ # sample input data ├─ returns.csv └─ covariance.csvStart here: portfolio_balancing.py
Sample data
Data files are in data/.
returns.csv
Defines one expected return value per stock.
| Column | Meaning |
|---|---|
index | Stock identifier |
returns | Expected return (decimal, e.g., 0.04 = 4%) |
covariance.csv
Defines pairwise covariance values between stock pairs.
| Column | Meaning |
|---|---|
i | First stock index |
j | Second stock index |
covar | Covariance between stocks i and j |
Model overview
The semantic model uses a single concept (Stock) and a pairwise covariance property (Stock.covar). The decision variable is a continuous allocation per stock.
Stock
Represents an investable asset.
| Property | Type | Identifying? | Notes |
|---|---|---|---|
index | int | Yes | Loaded from data/returns.csv |
returns | float | No | Expected return |
covar | float | No | Pairwise covariance with another Stock |
x_quantity | float | No | Decision variable (continuous, non-negative) |
How it works
This section walks through the highlights in portfolio_balancing.py.
Import libraries and configure inputs
First, the script imports the Semantics and optimization APIs, configures the data directory, and defines the key parameters:
from pathlib import Path
import pandasfrom pandas import read_csv
from relationalai.semantics import Float, Model, Relationship, data, require, select, sum, wherefrom relationalai.semantics.reasoners.optimization import Solver, SolverModel
# --------------------------------------------------# Configure inputs# --------------------------------------------------
DATA_DIR = Path(__file__).parent / "data"
# Disable pandas inference of string types. This ensures that string columns# in the CSVs are loaded as object dtype. This is only required when using# relationalai versions prior to v1.0.pandas.options.future.infer_string = False
# Budget and minimum return parameters.BUDGET = 1000MIN_RETURN = 20Define concepts and load CSV data
Next, it creates a Model, defines the Stock concept, and loads both CSVs. The covariance values are defined by joining stock indices using where(...).define(...):
# --------------------------------------------------# Define semantic model & load data# --------------------------------------------------
# Create a Semantics model container.model = Model("portfolio", config=globals().get("config", None))
# Stock concept: available investments with expected returns.Stock = model.Concept("Stock")Stock.returns = model.Property("{Stock} has {returns:float}")
# Load expected return data from CSV.data(read_csv(DATA_DIR / "returns.csv")).into(Stock, keys=["index"])
# Stock.covar property: covariance matrix between stock pairs.Stock.covar = model.Relationship("{Stock} and {stock2:Stock} have {covar:float}")OtherStock = Stock.ref()
# Load covariance data from CSV.covar_csv = read_csv(DATA_DIR / "covariance.csv")pairs = data(covar_csv)where( Stock.index == pairs.i, OtherStock.index == pairs.j).define( Stock.covar(Stock, OtherStock, pairs.covar))Define decision variables, constraints, and objective
Then it creates a decision variable Stock.x_quantity and registers constraints and the quadratic variance objective inside build_formulation(...):
# --------------------------------------------------# Model the decision problem# --------------------------------------------------
# Stock.x_quantity decision variable: amount allocated to each stock.Stock.x_quantity = model.Property("{Stock} quantity is {x:float}")
covar_val = Float.ref()
# Scenario parameter. This is updated inside the scenario loop.min_return = MIN_RETURN
# Budget is fixed across scenarios.budget = BUDGET
def build_formulation(s): """Register variables, constraints, and objective on the solver model.""" # Decision variable: quantity of each stock. s.solve_for(Stock.x_quantity, name=["qty", Stock.index])
# Constraint: no short selling. bounds = require(Stock.x_quantity >= 0) s.satisfy(bounds)
# Constraint: budget limit. budget_constraint = require(sum(Stock.x_quantity) <= budget) s.satisfy(budget_constraint)
# Constraint: minimum return target (scenario parameter). return_constraint = require(sum(Stock.returns * Stock.x_quantity) >= min_return) s.satisfy(return_constraint)
# Objective: minimize portfolio risk (variance) risk = sum(covar_val * Stock.x_quantity * OtherStock.x_quantity).where(Stock.covar(OtherStock, covar_val)) s.minimize(risk)Solve and print results
Finally, the script loops over multiple values of min_return, creates a fresh SolverModel for each scenario, and prints both the allocation and a summary:
# --------------------------------------------------# Solve with Scenario Analysis (Numeric Parameter)# --------------------------------------------------
SCENARIO_PARAM = "min_return"SCENARIO_VALUES = [10, 20, 30]
scenario_results = []
for scenario_value in SCENARIO_VALUES: print(f"\nRunning scenario: {SCENARIO_PARAM} = {scenario_value}")
# Set scenario parameter value. min_return = scenario_value
# Create a fresh SolverModel for each scenario. s = SolverModel(model, "cont") build_formulation(s)
solver = Solver("highs") s.solve(solver, time_limit_sec=60)
scenario_results.append({ "scenario": scenario_value, "status": str(s.termination_status), "objective": s.objective_value, }) print(f" Status: {s.termination_status}, Objective: {s.objective_value}")
# Print portfolio allocation from solver results. var_df = s.variable_values().to_df() qty_df = var_df[ var_df["name"].str.startswith("qty") & (var_df["value"] > 0.001) ] print(f"\n Portfolio allocation:") print(qty_df.to_string(index=False))
# --------------------------------------------------# Solve and check solution# --------------------------------------------------
# Print a scenario summary table.print("\n" + "=" * 50)print("Scenario Analysis Summary")print("=" * 50)for result in scenario_results: print(f" {result['scenario']}: {result['status']}, obj={result['objective']}")Troubleshooting
I get ModuleNotFoundError when running the script
- Confirm you created and activated the virtual environment from the Quickstart.
- Reinstall dependencies with
python -m pip install .. - Verify you are running
python portfolio_balancing.pyfrom theportfolio_balancing/folder.
The script fails while reading a CSV from data/
- Confirm
data/returns.csvanddata/covariance.csvexist. - Verify headers match the expected columns (
index,returns,i,j,covar). - Check for missing values and non-numeric entries in return/covariance columns.
I see an unexpected termination status (not OPTIMAL)
- Try re-running; if you hit a time limit, consider increasing
time_limit_sec. - If you changed scenario parameters, confirm the minimum return target is feasible given the budget.
What this template is for
Portfolio managers don’t want to pay twice for the same exposure — if two funds track nearly the same benchmark, owning both is one bet with worse bookkeeping. This template chains four reasoning stages on a single shared ontology to build compliant, risk-optimized portfolios across an 8-stock universe and stress-test them under a crisis regime.
It uses RelationalAI’s rules, graph, and prescriptive reasoners in a chained workflow:
- Rules scan the current book for compliance violations — overconcentrated holdings (> 15% of balance), sector concentration (> 30%), and high-risk traders — as derived Relationships.
- Graph builds a correlation graph from the covariance matrix, runs Louvain clustering, and picks the highest-Sharpe stock per cluster as the cluster’s representative. 8 stocks collapse to 5 distinct bets; near-duplicates are dropped from the investable universe rather than capped within it.
- Prescriptive optimization solves a bi-objective Markowitz QP on the representative-only universe under position and sector caps, using the solver’s shadow prices (constraint duals) to trace the efficient frontier efficiently across a
ScenarioConcept that combines three budgets and two regimes. - Crisis stress test is the same
solve_epsiloncall — no separate model — butScenario.regimepicks a PSD-preserving shrinkage covariance, so base and crisis frontiers come out of one pipeline.
Each stage writes derived properties the next reads directly: Rules define the thresholds Stage 3 enforces as constraints, Stage 2’s Stock.is_representative shapes the decision space, and the stress test reads Stock.regime_covar keyed by Scenario.regime. See “How it works” for the full data flow.
Why this problem matters
Portfolio managers don’t want to pay twice for the same exposure. If two funds track nearly the same benchmark, allocating
The four-stage approach addresses each gap. Stage 1 surfaces existing violations in the current book (diagnostic). Stage 2 clusters by return covariance and picks the highest-Sharpe representative per cluster, collapsing redundant bets. Stage 3 optimizes over the representative-only universe under position and sector limits. Stage 4 re-solves under a PSD-preserving crisis covariance to stress the resulting portfolio.
Key design patterns demonstrated
- Shared compliance thresholds —
SECTOR_LIMITis defined once and enforced in both stages.POSITION_LIMIT(Stage 1 per-stock compliance) andREP_POSITION_LIMIT(Stage 3 per-representative cap) are deliberately different: a representative carries its cluster’s combined exposure, so the construction-side cap is higher than the holdings-side compliance cap - Graph results feed optimization — Louvain cluster ids and per-cluster argmax (highest Sharpe) both persist on
Stock, and the optimizer’sStock.is_non_representative()constraint forces non-reps to zero (complement defined positively because the prescriptive rewriter doesn’t acceptmodel.not_()in a solver.where()) - Collapse, don’t cap — the graph stage reduces the investable universe to distinct bets rather than allowing all N stocks and capping within redundant groups
- Scenario Concept for parameter sweeps —
Scenarioentities combine budget (1,000, $2,000) and regime (base, crisis) so each epsilon solve handles all six combinations in one call - Shadow-price-guided frontier — each
solve(sensitivity=True)returns the return-constraint’s dual (shadow price), which IS the frontier’s local slope d(variance)/d(return). Three drivers (grid, adaptive, dichotomic) use that dual to place sample points; at equal solve budget the dual-guided drivers approximate the frontier far more tightly than blind even spacing - Epsilon constraint method —
solve_epsilon(eps_rate)minimizes variance subject to a return-target floor, producing one Pareto point per call without manually fixing return values - PSD-preserving stress covariance — correlation shrinkage toward all-ones keeps the QP convex at every point, unlike naive off-diagonal scaling
- Quadratic programming via HiGHS — the risk objective is quadratic (
x' * Cov * x); HiGHS solves the convex QP to a global optimum and, withsensitivity=True, returns the duals the frontier search relies on - Anchor solves establish feasible range — Anchor 1 (minimize risk) and Anchor 2 (maximize return) bracket the return range before the frontier search
Who this is for
- Quantitative analysts and portfolio managers exploring mean-variance optimization
- Data scientists learning quadratic programming with RelationalAI
- Finance students studying the Markowitz efficient frontier
- Anyone interested in risk-return trade-off analysis with scenario comparisons
What you’ll build
- A rules-based compliance pipeline using RAI derived properties and Relationships to flag overconcentrated holdings, sector concentration violations, and high-risk traders
- A correlation graph over stocks with Louvain community detection, plus per-cluster representative selection by highest Sharpe
- A quadratic programming model that minimizes portfolio variance subject to position and sector limits on a representative-only universe (non-reps forced to zero)
- Budget and no-short-selling constraints across multiple (budget, regime) scenarios
- Shadow-price-guided frontier tracing: three drivers (grid, adaptive, dichotomic) that use solver duals to sample the efficient frontier, compared head-to-head at equal solve budget
- Anchor solves to establish the feasible return range
- Pareto analysis with exact dual marginals (shadow prices) and knee detection
- A crisis-regime stress test using PSD-preserving correlation shrinkage to compare base vs crisis frontiers side-by-side
What’s included
portfolio_balancing.py— Main script with all four stages: rules-based compliance, covariance clustering (Louvain), bi-objective QP with shadow-price-guided frontier tracing, and crisis-regime stress testdata/returns.csv— Stock universe: index, ticker, sector, expected returns (8 stocks)data/covar.csv— Covariance matrix entries (i, j, covariance value)data/users.csv— User profiles with risk scoresdata/accounts.csv— Account balancesdata/holdings.csv— Current holdings per account and stockdata/transactions.csv— Transaction history with flagged-transaction indicatorspyproject.toml— Python package configuration with dependencies
Prerequisites
Access
- A Snowflake account that has the RAI Native App installed.
- A Snowflake user with permissions to access the RAI Native App.
Tools
- Python >= 3.10
- RelationalAI Python SDK (
relationalai) == 1.9.0
Quickstart
-
Download ZIP:
Terminal window curl -O https://docs.relational.ai/templates/zips/v1/portfolio_balancing.zipunzip portfolio_balancing.zipcd portfolio_balancing -
Create venv:
Terminal window python -m venv .venvsource .venv/bin/activatepython -m pip install --upgrade pip -
Install:
Terminal window python -m pip install . -
Configure:
Terminal window rai init -
Run:
Terminal window python portfolio_balancing.py -
Expected output (sample — full output covers all four stages):
======================================================================STAGE 1: COMPLIANCE ANALYSIS (rules)======================================================================--- Rule 1: Overconcentrated Holdings (position > 15% of balance) ---holding_id=1, ticker=AAPL, account_id=1, value=18000.00, balance=100000.00, pct=18.0%...--- Rule 2: Sector Concentration (sector > 30% of balance) ---account_id=1, sector=Technology, sector_value=34000.00, pct=34.0%...--- Rule 3: High Risk Traders (risk_score > 0.8 AND >5 flagged txns) ---user_id=1, name=Alice Chen, risk_score=0.85...======================================================================STAGE 2: GRAPH -- Covariance Clustering (Louvain)======================================================================Correlation graph: 4 edges (|correlation| >= 0.3)Louvain communities: 5 cluster(s)Cluster 1 (size 3): AAPL (Technology), MSFT (Technology), GOOGL (Technology)Cluster 2 (size 2): JNJ (Healthcare), PFE (Healthcare)Cluster 3 (size 1): XOM (Energy)Cluster 4 (size 1): PG (Consumer Staples)Cluster 5 (size 1): JPM (Financials)Avg correlation: intra-cluster = +0.683, inter-cluster = +0.131Cluster representatives (5 of 8 stocks, picked by highest Sharpe):Cluster 1: GOOGL (Technology) -- Sharpe = 0.605Cluster 2: PFE (Healthcare) -- Sharpe = 0.530Cluster 3: XOM (Energy) -- Sharpe = 0.588Cluster 4: PG (Consumer Staples) -- Sharpe = 0.444Cluster 5: JPM (Financials) -- Sharpe = 0.500======================================================================STAGE 3: BI-OBJECTIVE OPTIMIZATION(position + sector limits on representative universe; base & crisis regimes)======================================================================ANCHOR SOLVE 1: Minimize risk (no return constraint)Status: OPTIMALbase_500: return = 32.4336, risk = 1160.3926base_1000: return = 64.8673, risk = 4641.5705base_2000: return = 129.7346, risk = 18566.2819crisis_500: return = 31.6873, risk = 1913.5995crisis_1000: return = 63.3745, risk = 7654.3981crisis_2000: return = 126.7490, risk = 30617.5925ANCHOR SOLVE 2: Maximize return (swap objective)Status: OPTIMALbase_500: return = 42.0000base_1000: return = 84.0000base_2000: return = 168.0000crisis_500: return = 42.0000crisis_1000: return = 84.0000crisis_2000: return = 168.0000Reference scenario 'base_1000': frontier spans expected return [64.8673, 84.0000]======================================================================SENSITIVITY-GUIDED FRONTIER (reference 'base_1000', 6-solve budget per method)======================================================================running grid driver ...running adaptive driver ...running dichotomic driver ...Frontier approximation quality (same solve budget, lower gap = better):method solves max chord-gap--------------------------------------grid 6 557.9250adaptive 6 415.1730dichotomic 6 202.2972 <- tightestShadow price = frontier slope (exact dual vs finite-difference secant):(dual = extra variance incurred per unit of additional required return)return variance dual (lambda) secant------------------------------------------------------64.8673 4641.5705 0.00 --71.2734 5181.9733 134.83 84.3675.9396 5946.0980 192.68 163.7580.4605 6944.2401 250.64 220.7983.1779 7809.1748 650.79 318.2984.0000 8528.0000 1098.00 874.39======================================================================STAGE 4: CRISIS REGIME STRESS TEST(PSD-preserving correlation shrinkage, alpha = 0.7)======================================================================EFFICIENT FRONTIER: Risk vs Return (per scenario, exact dual marginals)base_1000 (budget=1000, regime=base):# Label Return Risk Marginal Knee--------------------------------------------------------1 min_risk 64.87 4641.5705 0.002 p1 71.27 5181.9733 134.83 <--3 p2 75.94 5946.0980 192.684 p3 80.46 6944.2401 250.645 p4 83.18 7809.1748 650.796 p5 84.00 8528.0000 1098.00(similar tables for base_500, base_2000, crisis_500, crisis_1000, crisis_2000)Volatility (sqrt risk) -- base vs crisis at each frontier point:Budget 1000:Label vol_base vol_crisis gap gap_%-------------------------------------------------------min_risk 68.1291 87.4894 +19.3603 +28.4%p1 71.9859 93.2657 +21.2798 +29.6%p2 77.1109 98.5270 +21.4161 +27.8%p3 83.3321 104.3961 +21.0640 +25.3%p4 88.3695 109.1925 +20.8230 +23.6%p5 92.3472 112.3478 +20.0006 +21.7%(similar tables for Budget 500 and Budget 2000, identical gap_% pattern)Crisis volatility sits ~22-30% above base at every frontier point and the gap peaks in the middle of the frontier (p1 at +29.6%), not at the concentrated end (p5 at +21.7%). That inversion is the payoff of the representative-only universe: at the concentrated end the optimizer is picking the highest-Sharpe distinct bet per cluster, which incidentally sits in sectors with lower crisis correlations (Energy, Consumer Staples). Without the representative collapse, the concentrated end would stack near-duplicates and see the crisis gap grow, not shrink.
Template structure
.├── README.md├── pyproject.toml├── portfolio_balancing.py└── data/ ├── returns.csv ├── covar.csv ├── users.csv ├── accounts.csv ├── holdings.csv └── transactions.csvHow it works
This section walks through the highlights in portfolio_balancing.py.
Reasoner overview
| Stage | Reasoner | Reads from ontology | Writes to ontology | Role |
|---|---|---|---|---|
| 1 | Rules | Holding, Account, User, Transaction, Stock | Holding.is_overconcentrated, Holding.is_sector_concentrated, User.is_high_risk_trader | 4 overconcentrated holdings (AAPL 18%, MSFT 16%, JNJ 16%, PFE 16.2%). 2 sector concentrations (Technology 34%, Healthcare 32.2%). 2 high-risk traders (Alice Chen 0.85, Eve Taylor 0.92). |
| 2 | Graph (Louvain) | Stock.covar (diagonal for variance), derived Stock.correlation filtered at threshold 0.3 | Stock.variance, Stock.volatility, Stock.correlation, Stock.cluster, Stock.sharpe, Stock.cluster_max_sharpe, Stock.is_representative | 4 edges retained after thresholding. Louvain yields 5 clusters; 5 representatives picked by highest Sharpe (one per cluster). Collapses 8 stocks to 5 distinct bets. |
| 3 | Prescriptive (QP) | Stock.returns, Stock.regime_covar, Stock.is_representative, Scenario.budget, Scenario.regime | Stock.x_quantity indexed by Scenario (non-reps forced to 0) | Min-risk and max-return anchors bracket the frontier. solve(sensitivity=True) returns the constraint dual (shadow price) at each point; three drivers (grid/adaptive/dichotomic) use it to place 6 samples, dichotomic giving the tightest approximation (max chord-gap 202 vs grid 558). Knee detected at p1 from the exact duals. |
| 4 | Prescriptive (stress) | Stock.regime_covar under “crisis” regime | (shares Stock.x_quantity with Stage 3) | Crisis volatility ~22-30% higher than base at every frontier point; gap peaks mid-frontier (p1 at +29.6%) and narrows toward the concentrated end (p5 at +21.7%). The representative-only universe keeps the concentrated end from stacking near-duplicate bets that would otherwise amplify crisis vol. |
All four stages share a single RAI model. Compliance thresholds are defined once at the top of the script. Stage 1 uses POSITION_LIMIT = 0.15 and SECTOR_LIMIT = 0.30 to flag existing violations as derived Relationships. Stage 3 re-uses SECTOR_LIMIT but applies REP_POSITION_LIMIT = 0.30 to the decision variable: after representative collapse each cluster has exactly one carrier, so its cap is legitimately higher than a per-stock compliance cap.
How the reasoners chain
Each stage writes derived properties the next reads directly. Stage 1’s thresholds (POSITION_LIMIT, SECTOR_LIMIT) become Stage 3 constraints. Stage 2’s Stock.is_representative and Stock.is_non_representative shape Stage 3’s decision space (non-reps forced to zero). Stage 4 uses the same solve_epsilon call as Stage 3 — the Regime concept keyed into Stock.regime_covar makes base vs crisis a scenario view on the same solve, not a separate model. The Reasoner overview table above names each property that crosses a stage boundary.
Multi-scenario Pareto frontier in one pipeline
Scenario combines three budgets and two regimes — six tuples. Each solve_epsilon(eps_rate) call returns one optimal allocation per tuple, so a single solve prices all six scenarios at once. The three frontier drivers share a solve cache, so running all of them costs roughly one budget’s worth of unique solves rather than 3x. Two consequences:
- Base and crisis are comparable at equal budget and equal return target: the vol gap is a pure regime effect, not a re-fitting artifact.
- Adding a fourth regime or a fifth budget is a data edit in
scenario_data, not a code change insolve_epsilon. Scenarios are data.
Stage 1: Rules-based compliance analysis
The first stage defines compliance flags as RAI derived properties and Relationships. The model loads portfolio data (users, accounts, holdings, transactions) alongside the stock universe, then evaluates three rules using two configurable thresholds:
POSITION_LIMIT = 0.15 # max fraction of budget per stockSECTOR_LIMIT = 0.30 # max fraction of budget per sectorRule 1 — Overconcentrated holdings: a holding whose value exceeds POSITION_LIMIT of the account balance. The holding value is a derived property:
Holding.value = model.Property(f"{Holding} has value {Float:holding_value}")model.define(Holding.value(Holding.quantity * Holding.purchase_price))
Holding.is_overconcentrated = model.Relationship(f"{Holding} is overconcentrated")AccountR1 = Account.ref()model.where( Holding.account(AccountR1), Holding.value > POSITION_LIMIT * AccountR1.balance,).define(Holding.is_overconcentrated())Rule 2 — Sector concentration: total holdings in a sector exceeding SECTOR_LIMIT of the account balance. Uses aggregation to sum holding values per (account, sector):
sector_exposure = sum(HoldingSC.value).where( HoldingSC.account(AccountSC), HoldingSC.stock(StockSC), StockSC.sector_ref(SectorSC),).per(AccountSC, SectorSC)
model.where( Holding.account(AccountSC), Holding.stock(StockR2), StockR2.sector_ref(SectorSC), sector_exposure > SECTOR_LIMIT * AccountSC.balance,).define(Holding.is_sector_concentrated())Rule 3 — High-risk traders: users with risk_score > 0.8 and more than 5 flagged transactions. Flagged transaction count is computed via aggregation:
flagged_count = sum(TransactionHR.is_flagged_val).where( TransactionHR.user(User),).per(User)
model.where( User.risk_score > 0.8, flagged_count > 5,).define(User.is_high_risk_trader())Stage 2: Graph — covariance clustering
Volatility and correlation are derived in PyRel from the base covariance, so the ontology is the single source of truth for every similarity metric. Stock.variance picks the covariance diagonal, Stock.volatility applies sqrt(variance) via relationalai.semantics.std.math.sqrt, and Stock.correlation(i, j) = covar(i, j) / (vol_i * vol_j):
Stock.volatility = model.Property(f"{Stock} has {Float:stock_volatility}")model.define(Stock.volatility(sqrt(Stock.variance)))
Stock.correlation = model.Property( f"{Stock} and {Stock} have correlation {Float:stock_correlation}")PairedStockCorr = Stock.ref()cov_ij_ref = Float.ref()model.where( Stock.covar(PairedStockCorr, cov_ij_ref),).define( Stock.correlation( PairedStockCorr, cov_ij_ref / (Stock.volatility * PairedStockCorr.volatility), ))The Graph reasoner builds an undirected graph with Stock as the node concept. Edges are filtered in PyRel directly against the derived correlation property — no upstream edge list required:
corr_graph = Graph( model, directed=False, weighted=False, node_concept=Stock, aggregator="sum",)
stock_i_ref = Stock.ref()stock_j_ref = Stock.ref()corr_ref = Float.ref()model.define(corr_graph.Edge.new(src=stock_i_ref, dst=stock_j_ref)).where( stock_i_ref.correlation(stock_j_ref, corr_ref), stock_i_ref.index < stock_j_ref.index, math_abs(corr_ref) >= CORR_THRESHOLD,)Louvain community detection runs directly on the graph and returns (node, cluster_id) pairs. The cluster id is persisted as a Stock property so downstream stages can consume it:
community = corr_graph.louvain()cluster_label = Integer.ref("cluster_label")Stock.cluster = model.Property(f"{Stock} in cluster {Integer:cluster_id}")stock_clust_ref = Stock.ref()model.define(stock_clust_ref.cluster(cluster_label)).where( community(stock_clust_ref, cluster_label))The script reports cluster sizes and intra- vs inter-cluster average correlation as a sanity check that the clustering separates co-moving stocks from independent ones.
After clustering, Stage 2 picks one representative per cluster — the stock with the highest Sharpe ratio — using per-group argmax in PyRel. Only the representatives will be eligible for allocation in Stage 3:
Stock.sharpe = model.Property(f"{Stock} has Sharpe {Float:stock_sharpe}")model.define(Stock.sharpe(Stock.returns / Stock.volatility))
peer_for_max = Stock.ref()Stock.cluster_max_sharpe = model.Property( f"{Stock} has cluster max Sharpe {Float:cluster_max_sharpe}")model.define( Stock.cluster_max_sharpe( aggs.max(peer_for_max.sharpe) .where(peer_for_max.cluster == Stock.cluster) .per(Stock) ))
Stock.is_representative = model.Relationship(f"{Stock} is cluster representative")model.where(Stock.sharpe == Stock.cluster_max_sharpe).define( Stock.is_representative())Stage 3: Bi-objective optimization
Scenario concept and decision variables
The Stock concept (defined earlier for all stages) carries ticker, sector, expected returns, and the base covariance matrix. Stage 2 added Stock.variance, Stock.volatility, Stock.correlation, Stock.cluster, Stock.sharpe, Stock.cluster_max_sharpe, Stock.is_representative, and Stock.is_non_representative on top. Stage 3 consumes the representative flag via its compliance constraints, and adds budget-and-regime scenarios, regime-conditioned covariance, and decision variables.
Scenarios combine budget and regime so each epsilon solve handles all six (budget, regime) combinations simultaneously:
Regime = model.Concept("Regime", identify_by={"regime_name": String})model.define(Regime.new(regime_name="base"))model.define(Regime.new(regime_name="crisis"))
Scenario = model.Concept("Scenario", identify_by={"name": String})Scenario.budget = model.Property(f"{Scenario} has {Float:budget}")Scenario.regime = model.Property(f"{Scenario} in {Regime}")scenario_data = model.data( [ ("base_500", 500, "base"), ("base_1000", 1000, "base"), ("base_2000", 2000, "base"), ("crisis_500", 500, "crisis"), ("crisis_1000", 1000, "crisis"), ("crisis_2000", 2000, "crisis"), ], columns=["name", "budget", "regime"],)model.define( s := Scenario.new(name=scenario_data["name"]), s.budget(scenario_data["budget"]),)# Link Scenario to Regime by matching the regime name from the data.scenario_link_ref = Scenario.ref()regime_link_ref = Regime.ref()model.where( scenario_link_ref.name == scenario_data["name"], regime_link_ref.regime_name == scenario_data["regime"],).define(scenario_link_ref.regime(regime_link_ref))Define decision variables, constraints, and objective
Each stock gets a continuous quantity variable indexed by Scenario (multi-argument Property).
Stock.x_quantity = model.Property(f"{Stock} in {Scenario} has {Float:quantity}")x_qty = Float.ref()Two concentration limits plus a representative-only filter are added via _add_compliance_constraints. Position and sector caps behave as before; the Stock.is_non_representative() relation forces every non-representative stock to zero allocation, which is how the graph stage’s redundancy removal shows up at solve time. The complement is defined positively because the prescriptive rewriter can’t accept model.not_(...) inside a solver constraint:
def _add_compliance_constraints(problem): # Position limit: each representative <= REP_POSITION_LIMIT * budget. problem.satisfy(model.where( Stock.x_quantity(Scenario, x_qty), ).require(x_qty <= REP_POSITION_LIMIT * Scenario.budget))
# Sector limit: total allocation to stocks in same sector <= SECTOR_LIMIT * budget. sector_alloc = sum(x_qty).where( Stock.x_quantity(Scenario, x_qty), Stock.sector == s_sector_ref.sector, ).per(Scenario, s_sector_ref.sector) problem.satisfy(model.where( Stock.x_quantity(Scenario, x_qty), ).require(sector_alloc <= SECTOR_LIMIT * Scenario.budget))
# Representative-only: non-representative stocks forced to zero. problem.satisfy(model.where( Stock.x_quantity(Scenario, x_qty), Stock.is_non_representative(), ).require(x_qty == 0))The risk objective is quadratic in the decision variables and uses the regime-conditioned covariance: each Scenario picks its matching regime’s covariance, so base and crisis scenarios solve against different covariances in the same call.
problem.minimize( sum(regime_cov_val * x_qty * x_qty_paired) .where( Stock.regime_covar(PairedStock, Scenario.regime, regime_cov_val), Stock.x_quantity(Scenario, x_qty), PairedStock.x_quantity(Scenario, x_qty_paired), ))Solve anchors, then trace the frontier with shadow prices
Two anchor solves establish the feasible return range. Anchor 1 minimizes risk with no return constraint; Anchor 2 maximizes return. The span is measured on a single reference scenario (base_1000).
result1 = solve_epsilon(eps_rate=None) # min-risk anchorEach interior solve minimizes variance subject to a return-target floor and requests sensitivity information, so HiGHS returns the return constraint’s dual — the shadow price. By the envelope theorem that dual is exactly the frontier’s local slope d(variance)/d(return), so one solve yields both a Pareto point and the slope there, with no finite differencing.
problem.solve("highs", time_limit_sec=60, sensitivity=True)# shadow_price = dual of the return-floor constraint = d(variance)/d(return)Sign convention: minimizing variance subject to return >= target, the dual is non-negative with units of variance per unit return, and it rises monotonically along the frontier (variance gets more expensive as you demand more return).
Three drivers spend the same solve budget differently and are compared head-to-head:
- grid — evenly spaced return targets, blind to the frontier’s shape (the control).
- adaptive — sizes each step by the current shadow price so points land evenly in variance space.
- dichotomic — repeatedly splits the interval with the largest chord-vs-tangent gap, sampling where the two endpoints’ shadow prices predict they meet (the NISE scheme).
Quality is scored by max chord-gap: the largest variance error of linearly interpolating between solved points. At equal 6-solve budget the dual-guided drivers win decisively (dichotomic 202 vs grid 558), because the duals tell the search where the frontier curves most.
Pareto analysis output
The script prints the three-driver quality comparison, the shadow-price-vs-secant table (each exact dual next to the finite-difference slope it brackets), the efficient frontier per (budget, regime) scenario, and programmatic knee detection where the exact dual jumps most. The dichotomic frontier is materialized as the FrontierPoint Concept, with integrity constraints asserting that neither return nor risk decreases along it — a relational statement of Pareto-efficiency.
Stage 4: Crisis regime stress test
Crisis covariance is derived in PyRel via PSD-preserving correlation shrinkage, keyed by a Regime concept. The shrinkage formula rho_crisis = alpha * rho + (1 - alpha) * J re-expressed in covariance units becomes cov_crisis(i, j) = alpha * cov(i, j) + (1 - alpha) * vol_i * vol_j — a convex combination of PSD matrices, so PSD is preserved by construction:
Stock.regime_covar = model.Property( f"{Stock} and {Stock} in {Regime} have {Float:regime_covar}")
# Base regime: covariance unchanged.model.where( Stock.covar(PairedStockBase, base_cov_ref), base_regime_ref.regime_name == "base",).define(Stock.regime_covar(PairedStockBase, base_regime_ref, base_cov_ref))
# Crisis regime: convex combination of base covariance and vol_i * vol_j.model.where( Stock.covar(PairedStockCrisis, crisis_cov_ref), crisis_regime_ref.regime_name == "crisis",).define( Stock.regime_covar( PairedStockCrisis, crisis_regime_ref, CRISIS_ALPHA * crisis_cov_ref + (1 - CRISIS_ALPHA) * Stock.volatility * PairedStockCrisis.volatility, ))Both regimes live on the same Stock.regime_covar property, keyed by the Regime concept, so Stage 3’s objective can select the right covariance per scenario without branching:
After the Stage 3 frontier is traced, Stage 4 emits a side-by-side comparison of base and crisis volatility (sqrt(risk)) at each frontier point, grouped by budget. Crisis volatility is consistently ~22-30% higher than base. The gap peaks in the middle of the frontier (p1 at +29.6%) and narrows toward the concentrated end (p5 at +21.7%). That shape is the payoff of the representative-only universe: at the concentrated end the optimizer is picking the highest-Sharpe distinct bet per cluster (Energy and Consumer Staples in this dataset), which happen to have lower crisis correlations than the middle of the frontier. Without the representative collapse, the concentrated end would stack near-duplicates and the crisis gap would grow instead of shrink.
Customize this template
- Adjust compliance thresholds:
POSITION_LIMIT(default 0.15) applies in Stage 1 compliance rules (per-stock holdings).REP_POSITION_LIMIT(default 0.30) applies in Stage 3 optimization (per-representative allocation, which carries its cluster’s combined exposure).SECTOR_LIMIT(default 0.30) applies to both. Note thatREP_POSITION_LIMITmust satisfyREP_POSITION_LIMIT * num_representatives >= 1.0or the fully-invested constraint becomes infeasible. - Tune the correlation graph: Raise or lower
CORR_THRESHOLD(default 0.3) to control graph sparsity. Higher thresholds produce fewer edges and more singleton clusters; lower thresholds produce a denser graph and fewer, larger clusters. - Change the representative picking rule: Stage 2 picks the highest-Sharpe stock per cluster. To pick differently, change the
Stock.cluster_max_sharpederivation — e.g., replaceStock.sharpewithStock.returns(highest return),-Stock.volatility(lowest vol), or a weighted blend. Singletons are always their own representative regardless of rule. - Adjust crisis severity: Lower
CRISIS_ALPHA(default 0.7) shrinks correlations harder toward all-ones (more severe crisis).alpha = 1.0is no crisis (base);alpha = 0.0is maximum crisis (all correlations = 1). Values between 0.5 and 0.9 give interesting comparisons while keeping the QP well-conditioned. - Add more stocks: Extend
returns.csvandcovar.csvwith additional assets and their covariance entries. - Add compliance rules: Define additional Relationships in the rules stage (e.g., minimum holding period, transaction velocity limits).
- Allow short selling: Remove the non-negativity constraint to allow negative holdings.
- Adjust frontier resolution: Increase
N_SOLVESfor a finer-grained frontier. Because the three drivers share a solve cache, the total number of unique solves stays close toN_SOLVESrather than 3x. - Maximize return for given risk: Flip the formulation to maximize expected return subject to a risk budget.
- Transaction costs: Add a linear or quadratic penalty term for rebalancing from an existing portfolio.
Troubleshooting
Problem is infeasible
A frontier solve becomes infeasible when its return-floor target exceeds what the position and sector limits allow at that budget; the error message names the return level that failed. Check that the Anchor 2 max-return output looks right for your data, then relax REP_POSITION_LIMIT or SECTOR_LIMIT, or raise the budget values in the scenario data. Reducing N_SOLVES only coarsens the frontier grid — it does not fix infeasibility.
rai init fails or connection errors
Ensure your Snowflake credentials are configured correctly and that the RAI Native App is installed on your account. Run rai init again and verify the connection settings.
ModuleNotFoundError for relationalai
Make sure you activated the virtual environment and ran python -m pip install . from the template directory. The pyproject.toml declares the required dependencies.
Solver reports non-convex or numerical issues
Ensure the covariance matrix is symmetric and positive semi-definite. Check that covar.csv contains entries for all (i, j) pairs and that covar(i,j) == covar(j,i). HiGHS solves convex QPs to a global optimum and returns shadow prices (duals) when sensitivity=True.