python - Find dicts from lists the have the closest value in a faster/better way -
i tried searching response not find 1 though i'm sure must have been asked before. must not searching correct phrases.
my problem have 2 large lists of dicts , trying match dicts in list dict in list b has closest value particular key, in case timestamp
. timestamps dicts may or may not same, , want proceed acting on dict list if has match list b timestamp value within 15 of it's timestamp. dictionaries not identical in structure both contain timestamp key value pair.
first tried similar this:
for itema in lista: closestitemb = min(listb, key=lambda x :abs(x["timestamp"])-int(itema["timestamp")) if(abs(itema['timestamp'] - closestitemb['timestamp']) < 15: #write info both dicts csv file
this extremely slow large lists. realized lists both ordered timestamp should possible speed significantly
my thought process on first loops through search entire list b closest , next time through search through small slice beyond last matching listb index. in 99% of cases next item form list matches 1 of next few items list b. don't , in case searched end of list b again looking closest match, go searching small slices again until next miss.
for itema in lista: closestitemb = min(listb[lastfoundindex:lastfoundindex+3, key=lambda x :abs(x["timestamp"])-int(itema["timestamp")) if(abs(itema['timestamp'] - closestitemb['timestamp']) < 15: lastfoundindex = listb.index(closestitemb) #write info both dicts csv file else: closestitemb = min(listb[lastfoundindex:len(listb)-1, key=lambda x :abs(x["timestamp"])-int(itema["timestamp")) if(abs(itema['timestamp'] - closestitemb['timestamp']) < 15: lastfoundindex = listb.index(closestitemb) #write info both dicts csv file
this faster first iteration, not as have expected. interestingly gets slower , slower @ finding match runs. i', guessing might way list slicing works i'm not entirely sure happening under hood there.
as may able tell python isn't best. thought of better way write code, don't know how write in pythonic way.
what search through list b until sign of difference in timestamp list , list b flips, @ point 1 of last 2 items checked has closest list a. next item in lista can same thing, starting index in list b found match. code replace following line:
closestitemb = min(listb[lastfoundindex:lastfoundindex+3, key=lambda x :abs(x["timestamp"])-int(itema["timestamp"))
but not sure how write it.
or there may entirely other way of solving problem(i find there when comes python)
any appreciated
how following code? it's using 2 lists "timestamp" numbers in them instead of dicts, using dicts complicate matters - algorithm stay same.
the idea here have 2 pointers , b (ia , ib), , see if values @ ia , ib close enough make match. if aren't, if difference positive, means value in further ahead 1 in b , ib has play catchup. if difference negative, it's other way around , ia has play catchup.
a = [1, 4, 35, 40, 56, 70, 90, 110 ] b = [3, 20, 39, 57, 62, 84, 100, 150] ia = 0 ib = 0 while ia < len(a) , ib < len(b): delta = a[ia] - b[ib] if abs(delta) <= 15: print("found match @ ia={} ({}) , ib={} ({})".format(ia, a[ia], ib, b[ib])) # both items matched, continue next ones ia += 1 ib += 1 elif delta > 15: # we're far behind in b list, try catch ib += 1 elif delta < -15: # far behind in list, try catch ia += 1
note wasn't sure how handle cases 2 values 1 list might match 1 in second - example, both 1 , 4 a
list match 3 b
list, algorithm presented takes value out of race once matched partner other list. change changing happens ia
, ib
once match found.
the following code finds possible matches (i think), still single iteration (but doesn't add matches candidate list find best one:
a = [1, 4, 35, 40, 56, 70, 90, 110 ] b = [3, 20, 39, 57, 62, 84, 100, 150] ia = 0 ib = 0 while ia < len(a) , ib < len(b): delta = a[ia] - b[ib] if abs(delta) <= 15: print("found match @ ia={} ({}) , ib={} ({})".format(ia, a[ia], ib, b[ib])) if delta < 0: # there might better match yet timestamp @ ib ia += 1 elif delta > 0: # there might better match yet timestamp in ia ib += 1 else: # perfect match, won't better. move along in both lists ia += 1 ib += 1 elif delta > 15: # we're far behind in b list, try catch ib += 1 elif delta < -15: # far behind in list, try catch ia += 1
now if needed find best (closest) match, code might this:
a = [1, 4, 35, 40, 56, 70, 90, 110 ] b = [3, 20, 39, 57, 62, 84, 100, 150] ia = 0 ib = 0 best_at = -1 best_diff = 10000 while ia < len(a) , ib < len(b): delta = a[ia] - b[ib] if abs(delta) <= 15: print("found match @ ia={} ({}) , ib={} ({})".format(ia, a[ia], ib, b[ib])) if abs(delta) < best_diff: best_at = ib best_diff = abs(delta) if delta < 0: if best_diff < 10000: print("best match {} {} @ ib={}".format(a[ia], b[best_at], best_at)) best_diff = 10000 ia += 1 elif delta > 0: ib += 1 else: # perfect match print("best match {} {} @ ib={}".format(a[ia], b[best_at], best_at)) best_diff = 10000 ia += 1 ib += 1 elif delta > 15: ib += 1 elif delta < -15: if best_diff < 10000: print("best match {} {} @ ib={}".format(a[ia], b[best_at], best_at)) best_diff = 10000 ia += 1
this still runs in linear time. time complexity o(n+m), n length of list a
, m length of list b
, , can see that's case because in every iteration through while loop, either ia
or ib
gets advanced 1.
i don't think can better o(n+m) if want find closest match every timestamp in list a.
Comments
Post a Comment