algorithm - How to optimise storage size and performance when generating a large list of sequences in python? -


problem

i generating possible sequences of form, given integer n:

  • the sequence has length n
  • the sequence must contain numbers n, n-1, n-2, ... , n-k ≥ 1 k < n. numbers can repeated.

for example, n = 3, possible sequences are:

1, 2, 3 1, 3, 2 2, 1, 3 2, 3, 1 3, 1, 2 3, 2, 1 2, 2, 3 2, 3, 2 3, 2, 2 2, 3, 3 3, 2, 3 3, 3, 2 3, 3, 3 

in other words, sequence must contain n , numbers counting down n without jumps, in no particular order , repetitions allowed.

given n, number of such sequences given ordered bell numbers or fubini numbers, grow extremely fast.

here code using generate sequences:

from sympy.utilities.iterables import multiset_permutations  def generate_sequences(n):     sequences = []     unpermuted_seq in unpermuted_sequences(n,n):         permutation in multiset_permutations(unpermuted_seq):             sequences.append(permutation)     return sequences  def unpermuted_sequences(number,remaining_slots): # generates list of possible unpermuted sequences      if remaining_slots == 0:         yield []         return     repetitions in range(1, remaining_slots + 1):         sequence in unpermuted_sequences(number - 1, remaining_slots - repetitions):             yield sequence + repetitions*[number] 

questions

the code posted above works intended. 2 main concerns following:

  1. storage: particular application, once n chosen, need store sequences. need go through list , remove sequences not satisfy particular condition. however, small n (i.e. n > 8), lot of memory required (order of gbs).

  2. generation time: code takes long time generate sequences, small n.

how can generate sequences in way optimises storage , generation time?

i store values binary. numbers 16 use nibble (4 bits - using bit shifting) store each value. n=8 'only' need 545835 * 4 bytes = ± 2mb -- n=10 ± 500mb.

for faster processing , writing file, can use memory mapping (calculate required size front, create file of size, , open memory mapping).

this way every value can written directly mapping (i.e. file, if memory), eliminate slower sequences.append(permutation). consider writing sequences need, because if want remove them later you'll need shift other data.

you add small header file values: n, k, number of sequences, in binary.


Comments

Popular posts from this blog

ZeroMQ on Windows, with Qt Creator -

unity3d - Unity SceneManager.LoadScene quits application -

python - Error while using APScheduler: 'NoneType' object has no attribute 'now' -