python - Efficient way to find the paths with highest average edge value between two points in a pandas dataframe? -
pardon seemingly confusing phrasing of question. here i'd do.
given dataframe df
fruit1 fruit2 weight orange apple 0.2 orange grape 0.4 orange pineapple 0.6 orange banana 0.8 apple grape 0.9 apple pineapple 0.3 apple banana 0.2 grape pineapple 0.1 pineapple banana 0.8
and constraint highest allowed path length, l
i wish return dataframe has highest average path (i.e. summation of edges between points/path length maximum), edge represented weight column, given not exceed length l.
to illustrate example of mean highest average path:
let's have 4 points a, b, c & d. interested in finding highest average path between & d.
the highest average path max( (a->d)/1, (a->b + b->d)/2, (a->c + c->d)/2, (a->b + b->c + c-> d)/3, (a->c + c->b + b-> d)/3 ) in case of l=3
for l=2, max( (a->d)/1, (a->b + b->d)/2, (a->c + c->d)/2 )
in case of df, l=2 like
fruit1 fruit2 weight maxavgpath(l=2) orange apple 0.2 [orange, grape, apple] orange grape 0.4 [orange, apple, grape] orange pineapple 0.6 [orange, banana, pineapple] orange banana 0.8 [orange, banana] apple grape 0.9 [apple, grape] apple pineapple 0.3 [apple, grape, pineapple] apple banana 0.2 [apple, pineapple, banana] grape pineapple 0.1 [grape, orange, pineapple] grape banana 0.1 [grape, apple, banana] pineapple banana 0.8 [pineapple, banana]
note: edge set contains enough rows represent entire network. instance, while (orange, apple) exists, (apple, orange) not because weight (apple, orange) same. however, if including helps in creating solution, feel free use them , leave them out in final output. returned paths of mirror pairs should equal returned paths of original pairs.
thanks clarification, asking path highest cost/(number of edges) between each pair of nodes in graph paths restricted upper limit of connecting edges. longest path problem np-hard efficient solution possible restrictions (see https://en.wikipedia.org/wiki/longest_path_problem). think limit of edge connections artificially enforces longest path of length l bring down exponential in l.
the networkx module can take care of finding our simple paths depth-first search , can rank paths manually summing weights , sorting. can each pair of nodes , add new series dataframe.
import pandas import networkx # create graph dataframe g = networkx.from_pandas_dataframe(path_frame, 'fruit1', 'fruit2', 'weight') # find longest path between source , target length l def maxpath_cond(g, source, target, edge_attr, l=none): #use networkx simple paths function uses depth first search paths = networkx.simple_paths.all_simple_paths(g,source, target, l) # calculate , sort costs of path costs = [(pathcost(g, pth, edge_attr), pth) pth in paths] return sorted(costs, key=lambda x:x[0], reverse=true) def pathcost(g,path, edge_attr): lp = len(path)-1 return sum(g[path[n]][path[n+1]][edge_attr] n in range(lp))/lp #iterate through dataframe , create new series made of long paths mxs = [] n in range(len(path_frame)): src, targ = path_frame.loc[n]['fruit1'], path_frame.loc[n]['fruit2'] mxl = maxpath_cond(g, src, targ, 'weight', 2)[0] mxs.append( mxl[1]) print(path_frame.assign(maxavgpath=mxs))
which outputs:
fruit1 fruit2 weight maxavgpath 0 orange apple 0.2 [orange, grape, apple] 1 orange grape 0.4 [orange, apple, grape] 2 orange pineapple 0.6 [orange, banana, pineapple] 3 orange banana 0.8 [orange, banana] 4 apple grape 0.9 [apple, grape] 5 apple pineapple 0.3 [apple, grape, pineapple] 6 apple banana 0.2 [apple, pineapple, banana] 7 grape pineapple 0.1 [grape, apple, pineapple] 8 pineapple banana 0.8 [pineapple, banana]
Comments
Post a Comment