python - Efficient way to find the paths with highest average edge value between two points in a pandas dataframe? -


pardon seemingly confusing phrasing of question. here i'd do.

given dataframe df

fruit1     fruit2      weight orange     apple       0.2 orange     grape       0.4 orange     pineapple   0.6 orange     banana      0.8 apple      grape       0.9 apple      pineapple   0.3 apple      banana      0.2 grape      pineapple   0.1 pineapple  banana      0.8 

and constraint highest allowed path length, l

i wish return dataframe has highest average path (i.e. summation of edges between points/path length maximum), edge represented weight column, given not exceed length l.

to illustrate example of mean highest average path:

let's have 4 points a, b, c & d. interested in finding highest average path between & d.

the highest average path max( (a->d)/1, (a->b + b->d)/2, (a->c + c->d)/2, (a->b + b->c + c-> d)/3, (a->c + c->b + b-> d)/3 ) in case of l=3

for l=2, max( (a->d)/1, (a->b + b->d)/2, (a->c + c->d)/2 )

in case of df, l=2 like

fruit1     fruit2      weight   maxavgpath(l=2) orange     apple       0.2       [orange, grape, apple]   orange     grape       0.4       [orange, apple, grape] orange     pineapple   0.6       [orange, banana, pineapple] orange     banana      0.8       [orange, banana] apple      grape       0.9       [apple, grape] apple      pineapple   0.3       [apple, grape, pineapple] apple      banana      0.2       [apple, pineapple, banana]  grape      pineapple   0.1       [grape, orange, pineapple] grape      banana      0.1       [grape, apple, banana] pineapple  banana      0.8       [pineapple, banana] 

note: edge set contains enough rows represent entire network. instance, while (orange, apple) exists, (apple, orange) not because weight (apple, orange) same. however, if including helps in creating solution, feel free use them , leave them out in final output. returned paths of mirror pairs should equal returned paths of original pairs.

thanks clarification, asking path highest cost/(number of edges) between each pair of nodes in graph paths restricted upper limit of connecting edges. longest path problem np-hard efficient solution possible restrictions (see https://en.wikipedia.org/wiki/longest_path_problem). think limit of edge connections artificially enforces longest path of length l bring down exponential in l.

the networkx module can take care of finding our simple paths depth-first search , can rank paths manually summing weights , sorting. can each pair of nodes , add new series dataframe.

import pandas import networkx  # create graph dataframe g = networkx.from_pandas_dataframe(path_frame, 'fruit1', 'fruit2', 'weight')  # find longest path between source , target length l def maxpath_cond(g, source, target, edge_attr, l=none):     #use networkx simple paths function uses depth first search     paths = networkx.simple_paths.all_simple_paths(g,source, target, l)     # calculate , sort costs of path     costs = [(pathcost(g, pth, edge_attr), pth) pth in paths]     return sorted(costs, key=lambda x:x[0], reverse=true)  def pathcost(g,path, edge_attr):     lp = len(path)-1     return sum(g[path[n]][path[n+1]][edge_attr] n in range(lp))/lp #iterate through dataframe , create new series made of long paths mxs = [] n in range(len(path_frame)):     src, targ = path_frame.loc[n]['fruit1'], path_frame.loc[n]['fruit2']     mxl = maxpath_cond(g, src, targ, 'weight', 2)[0]     mxs.append( mxl[1])  print(path_frame.assign(maxavgpath=mxs)) 

which outputs:

      fruit1     fruit2  weight                   maxavgpath 0     orange      apple     0.2       [orange, grape, apple] 1     orange      grape     0.4       [orange, apple, grape] 2     orange  pineapple     0.6  [orange, banana, pineapple] 3     orange     banana     0.8             [orange, banana] 4      apple      grape     0.9               [apple, grape] 5      apple  pineapple     0.3    [apple, grape, pineapple] 6      apple     banana     0.2   [apple, pineapple, banana] 7      grape  pineapple     0.1    [grape, apple, pineapple] 8  pineapple     banana     0.8          [pineapple, banana] 

Comments

Popular posts from this blog

ZeroMQ on Windows, with Qt Creator -

unity3d - Unity SceneManager.LoadScene quits application -

python - Error while using APScheduler: 'NoneType' object has no attribute 'now' -