
How to efficiently build an asset tree?


Solved by John Brezovec


I'm building an asset tree using Python in Seeq Data Lab which will contain thousands of signals, conditions, and such. The path is always 3 levels deep and looks something like Product Category->Product ID->Quality Recipe->Quality Signal (or Condition). The connector which defines the signals and conditions for my datasource encodes the prescribed hierarchy path as the "Description" property on each Signal/Condition. My logic looks something like this:

  1. Get the existing hierarchy tree using spy.assets.Tree(...) 
  2. Use spy.search(...) to get all of the signals and conditions that I'm interested in
  3. Use information from the asset tree and search results to determine:
    1. What branches do not yet exist in the tree
    2. Which signals+conditions do not yet exist in the tree.  
    3. All of the branches to which the new signals will be added
  4. Build the new branches in the tree
  5. Add the new signals+conditions to the tree as follows:
    1. For each of the branches where new signals will go:
      1. Extract a new dataframe from the search results dataframe.  This dataframe will have only the new signals that belong to the current branch.
      2. Call tree.insert(...), passing the branch as the parent and the dataframe containing the new signals that belong to the branch as the children.

This is the most efficient way I've been able to figure out to update the existing tree. The problem is that the script still takes forever. Most of the time is spent in the calls to tree.insert. Is there a better and more efficient way to call this function?

Edited by Ben Hines

  • Seeq Team
  • Solution

Got it! When working with large trees with spy.assets.Tree, you want to call insert as few times as possible. The way to do that is to insert using DataFrames. The first workflow that I would try is:

  1. Get the existing hierarchy tree using spy.assets.Tree(...)
  2. Use spy.search(...) to get all of the signals and conditions that you're interested in
  3. Add two columns to the spy.search results DataFrame: 'Path', which is where in the tree you want to place the item, and 'Friendly Name', which is what you want the item to be called in the tree.
  4. Insert the entire DataFrame into your tree. Don't worry about whether some of the items already exist (they'll just get overwritten).
  5. When pushing the tree, specify a metadata_state_file. This file enables 'incremental pushing', meaning SPy will only push items that were not previously pushed. This should dramatically decrease how long it takes to repeatedly push large trees with small changes.

An example of this on example data (anyone should be able to run it on their Seeq instance):

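import pandas as pd
from seeq import spy

# Create (or pull, if it already exists) the tree in the given workbook
tree = spy.assets.Tree('Insert with DataFrame', workbook='Example of DataFrame Insert')

# Find every 'Area ?_Temperature' signal in the Example Data datasource
tags_to_insert = spy.search({'Name': 'Area ?_Temperature', 'Datasource Name': 'Example Data'})

# 'Path' is where the item goes in the tree, e.g. 'Area A' for 'Area A_Temperature'
tags_to_insert['Path'] = tags_to_insert['Name'].str.extract(r'(Area \w+)_\w+', expand=False)
# 'Friendly Name' is what the item is called inside the tree
tags_to_insert['Friendly Name'] = 'Temperature'

# One insert call for the whole DataFrame, then an incremental push
tree.insert(tags_to_insert)
tree.push(metadata_state_file='insert_with_dataframe.pkl')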
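
For the case in the original question, where the hierarchy is encoded in the "Description" property, step 3 might look roughly like the sketch below. It is plain pandas and only an illustration: the sample rows, the helper name add_tree_columns, and the assumption that the description uses '->' between levels are all made up for the example, so adjust them to whatever your connector actually writes.

```python
import pandas as pd

# Stand-in for the DataFrame returned by spy.search(...).
# The names and descriptions here are invented for illustration.
search_results = pd.DataFrame({
    'Name': ['QS-001', 'QS-002', 'QC-003'],
    'Description': [
        'Widgets->W-100->Recipe A',
        'Widgets->W-100->Recipe A',
        'Gadgets->G-200->Recipe B',
    ],
})


def add_tree_columns(df, source_sep='->', tree_sep=' >> '):
    """Hypothetical helper: derive 'Path' and 'Friendly Name' columns.

    Assumes 'Description' holds the branch as levels joined by source_sep.
    """
    out = df.copy()
    # Re-join the levels with the ' >> ' separator used for paths in this thread
    out['Path'] = out['Description'].apply(
        lambda d: tree_sep.join(part.strip() for part in d.split(source_sep))
    )
    # Keep the original item name as its name in the tree
    out['Friendly Name'] = out['Name']
    return out


items_to_insert = add_tree_columns(search_results)
print(items_to_insert[['Path', 'Friendly Name']])
```

The resulting DataFrame can then be passed to a single tree.insert(...) call, the same way tags_to_insert is in the example above.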


Thanks for the information.  I have a follow-up question...  Let's say that my tree is initially empty and I want the search results to be inserted, say, 3 levels deep.  I tried this:

tags_to_insert['Path'] = 'Level 1 >> Level 2 >> Level 3'
tree.insert(tags_to_insert)
tree.visualize()

The resulting tree looks like this: 
Insert with DataFrame

  • My Tree
    • Level 3
      • Signal
      • Signal
      • ...

I was expecting something more like:

  • My Tree
    • Level 1
      • Level 2
        • Level 3
          • Signal
          • Signal
          • ...

Can you let me know why this is? 


I thought the issue might be that the complete path needs to exist in the tree.   So, I tried this quick experiment:

# Ensure the path "Level 1 >> Level 2 >> Level 3" exists in the tree:
tree.insert(parent=tree.name, children=['Level 1'])
tree.insert(parent='Level 1', children=['Level 2'])
tree.insert(parent='Level 1 >> Level 2', children=['Level 3'])

# Set the path on the signals in the search results and add to the tree
tags_to_insert['Path'] = 'Level 1 >> Level 2 >> Level 3'
tree.insert(tags_to_insert)
tree.visualize()

This now looks like: 

My Tree
|-- Level 1
|   |-- Level 2
|       |-- Level 3
|-- Level 3
    |-- Signal 1
    |-- Signal 2
    |-- Signal 3
    |-- ...

Notice that the signals are ending up under a top-level "Level 3" rather than the nested one.


  • Seeq Team

The path doesn't need to already exist in the tree. What's happening here is that SPy is truncating the path to the highest common asset in the Path. Since all of the items are being inserted at Level 1 >> Level 2 >> Level 3, Level 3 is the highest common asset, so Level 1 and Level 2 are stripped out before inserting.

In practice with your actual tags this shouldn't be an issue, unless you're intending to have levels at the beginning of your tree that only contain a single asset as a child.
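
To make the truncation concrete, here is a small plain-Python illustration of the behavior as described above. It is not SPy's code and the function name strip_common_ancestors is made up; it just assumes that the leading levels shared by every row are dropped except for the last shared one.

```python
def strip_common_ancestors(paths, sep=' >> '):
    """Illustrative only: drop leading levels shared by every path,
    keeping the last shared level (the 'highest common asset')."""
    split_paths = [p.split(sep) for p in paths]

    # Count how many leading levels are identical across all paths
    common = 0
    for level in zip(*split_paths):
        if len(set(level)) != 1:
            break
        common += 1

    # Keep the last common level, strip everything above it
    drop = max(common - 1, 0)
    return [sep.join(parts[drop:]) for parts in split_paths]


# Every row shares the full path, so only the last level survives
same = strip_common_ancestors(['Level 1 >> Level 2 >> Level 3'] * 3)

# Rows that differ at the first level share nothing, so nothing is stripped
different = strip_common_ancestors([
    'Widgets >> W-100 >> Recipe A',
    'Gadgets >> G-200 >> Recipe B',
])
```

With paths that already differ at the first level, as in the Product Category->Product ID->Quality Recipe hierarchy from the question, the sketch leaves every path untouched, which matches the note above that real tags should not be affected.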
