python - Find dicts from lists the have the closest value in a faster/better way -


i tried searching response not find 1 though i'm sure must have been asked before. must not searching correct phrases.

my problem have 2 large lists of dicts , trying match dicts in list dict in list b has closest value particular key, in case timestamp. timestamps dicts may or may not same, , want proceed acting on dict list if has match list b timestamp value within 15 of it's timestamp. dictionaries not identical in structure both contain timestamp key value pair.

first tried similar this:

for itema in lista:     closestitemb = min(listb, key=lambda x :abs(x["timestamp"])-int(itema["timestamp"))     if(abs(itema['timestamp'] - closestitemb['timestamp']) < 15:         #write info both dicts csv file 

this extremely slow large lists. realized lists both ordered timestamp should possible speed significantly

my thought process on first loops through search entire list b closest , next time through search through small slice beyond last matching listb index. in 99% of cases next item form list matches 1 of next few items list b. don't , in case searched end of list b again looking closest match, go searching small slices again until next miss.

for itema in lista:     closestitemb = min(listb[lastfoundindex:lastfoundindex+3, key=lambda x :abs(x["timestamp"])-int(itema["timestamp"))     if(abs(itema['timestamp'] - closestitemb['timestamp']) < 15:         lastfoundindex = listb.index(closestitemb)         #write info both dicts csv file     else:         closestitemb = min(listb[lastfoundindex:len(listb)-1, key=lambda x :abs(x["timestamp"])-int(itema["timestamp"))         if(abs(itema['timestamp'] - closestitemb['timestamp']) < 15:             lastfoundindex = listb.index(closestitemb)             #write info both dicts csv file 

this faster first iteration, not as have expected. interestingly gets slower , slower @ finding match runs. i', guessing might way list slicing works i'm not entirely sure happening under hood there.

as may able tell python isn't best. thought of better way write code, don't know how write in pythonic way.

what search through list b until sign of difference in timestamp list , list b flips, @ point 1 of last 2 items checked has closest list a. next item in lista can same thing, starting index in list b found match. code replace following line:

closestitemb = min(listb[lastfoundindex:lastfoundindex+3, key=lambda x :abs(x["timestamp"])-int(itema["timestamp")) 

but not sure how write it.

or there may entirely other way of solving problem(i find there when comes python)

any appreciated

how following code? it's using 2 lists "timestamp" numbers in them instead of dicts, using dicts complicate matters - algorithm stay same.

the idea here have 2 pointers , b (ia , ib), , see if values @ ia , ib close enough make match. if aren't, if difference positive, means value in further ahead 1 in b , ib has play catchup. if difference negative, it's other way around , ia has play catchup.

a = [1, 4, 35, 40, 56, 70, 90, 110 ] b = [3, 20, 39, 57, 62, 84, 100, 150]  ia = 0 ib = 0 while ia < len(a) , ib < len(b):     delta = a[ia] - b[ib]     if abs(delta) <= 15:         print("found match @ ia={} ({}) , ib={} ({})".format(ia, a[ia], ib, b[ib]))         # both items matched, continue next ones          ia += 1         ib += 1     elif delta > 15:         # we're far behind in b list, try catch         ib += 1     elif delta < -15:         # far behind in list, try catch          ia += 1 

note wasn't sure how handle cases 2 values 1 list might match 1 in second - example, both 1 , 4 a list match 3 b list, algorithm presented takes value out of race once matched partner other list. change changing happens ia , ib once match found.

the following code finds possible matches (i think), still single iteration (but doesn't add matches candidate list find best one:

a = [1, 4, 35, 40, 56, 70, 90, 110 ] b = [3, 20, 39, 57, 62, 84, 100, 150]  ia = 0 ib = 0 while ia < len(a) , ib < len(b):     delta = a[ia] - b[ib]     if abs(delta) <= 15:         print("found match @ ia={} ({}) , ib={} ({})".format(ia, a[ia], ib, b[ib]))         if delta < 0:            # there might better match yet timestamp @ ib            ia += 1         elif delta > 0:            # there might better match yet timestamp in ia            ib += 1         else:            # perfect match, won't better. move along in both lists            ia += 1            ib += 1     elif delta > 15:         # we're far behind in b list, try catch         ib += 1     elif delta < -15:         # far behind in list, try catch          ia += 1 

now if needed find best (closest) match, code might this:

a = [1, 4, 35, 40, 56, 70, 90, 110 ] b = [3, 20, 39, 57, 62, 84, 100, 150]  ia = 0 ib = 0 best_at = -1 best_diff = 10000 while ia < len(a) , ib < len(b):     delta = a[ia] - b[ib]     if abs(delta) <= 15:                 print("found match @ ia={} ({}) , ib={} ({})".format(ia,  a[ia], ib, b[ib]))         if abs(delta) < best_diff:             best_at = ib             best_diff = abs(delta)                            if delta < 0:             if best_diff < 10000:                 print("best match {} {} @ ib={}".format(a[ia], b[best_at], best_at))                 best_diff = 10000             ia += 1                     elif delta > 0:             ib += 1         else:             # perfect match             print("best match {} {} @ ib={}".format(a[ia], b[best_at], best_at))             best_diff = 10000             ia += 1             ib += 1      elif delta > 15:         ib += 1     elif delta < -15:         if best_diff < 10000:             print("best match {} {} @ ib={}".format(a[ia], b[best_at], best_at))             best_diff = 10000         ia += 1 

this still runs in linear time. time complexity o(n+m), n length of list a , m length of list b, , can see that's case because in every iteration through while loop, either ia or ib gets advanced 1.

i don't think can better o(n+m) if want find closest match every timestamp in list a.


Comments

Popular posts from this blog

ZeroMQ on Windows, with Qt Creator -

unity3d - Unity SceneManager.LoadScene quits application -

python - Error while using APScheduler: 'NoneType' object has no attribute 'now' -