algorithm - How to optimise storage size and performance when generating a large list of sequences in python? -
problem
i generating possible sequences of form, given integer n:
- the sequence has length
n
- the sequence must contain numbers
n
,n-1
,n-2
,...
,n-k ≥ 1
k < n
. numbers can repeated.
for example, n = 3
, possible sequences are:
1, 2, 3 1, 3, 2 2, 1, 3 2, 3, 1 3, 1, 2 3, 2, 1 2, 2, 3 2, 3, 2 3, 2, 2 2, 3, 3 3, 2, 3 3, 3, 2 3, 3, 3
in other words, sequence must contain n
, numbers counting down n
without jumps, in no particular order , repetitions allowed.
given n
, number of such sequences given ordered bell numbers or fubini numbers, grow extremely fast.
here code using generate sequences:
from sympy.utilities.iterables import multiset_permutations def generate_sequences(n): sequences = [] unpermuted_seq in unpermuted_sequences(n,n): permutation in multiset_permutations(unpermuted_seq): sequences.append(permutation) return sequences def unpermuted_sequences(number,remaining_slots): # generates list of possible unpermuted sequences if remaining_slots == 0: yield [] return repetitions in range(1, remaining_slots + 1): sequence in unpermuted_sequences(number - 1, remaining_slots - repetitions): yield sequence + repetitions*[number]
questions
the code posted above works intended. 2 main concerns following:
storage: particular application, once
n
chosen, need store sequences. need go through list , remove sequences not satisfy particular condition. however, smalln
(i.e.n > 8
), lot of memory required (order of gbs).generation time: code takes long time generate sequences, small
n
.
how can generate sequences in way optimises storage , generation time?
i store values binary. numbers 16 use nibble (4 bits - using bit shifting) store each value. n=8
'only' need 545835 * 4 bytes = ± 2mb -- n=10
± 500mb.
for faster processing , writing file, can use memory mapping (calculate required size front, create file of size, , open memory mapping).
this way every value can written directly mapping (i.e. file, if memory), eliminate slower sequences.append(permutation)
. consider writing sequences need, because if want remove them later you'll need shift other data.
you add small header file values: n
, k
, number of sequences
, in binary.
Comments
Post a Comment