pandas - UnboundLocalError: "local variable referenced before assignment"
    import pandas as pd, numpy as np

    d = [{'cabin': 'f g13'}, {'cabin': 'a32 a45'}, {'cabin': 'f23 f36'}, {'cabin': 'b24'}, {'cabin': np.nan}]
    df = pd.DataFrame(d)

    def deck_list(row):
        # NaN is the only value that is not equal to itself
        if row['cabin'] != row['cabin']:
            cabinid = 'none'
        else:
            cabinsubstr = row['cabin'].split(' ')
            for i in cabinsubstr:
                if i.find('f ') != -1:
                    cabinid = i[0][0]
                    break
                if i.find('f ') == 0:
                    cabinid = i[1][0]
                    break
        return cabinid

    df['deck_id'] = df.apply(deck_list, axis=1)
Am I missing something? I've written code akin to this plenty of times, and I've never gotten this error. Maybe it's something stupid?
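A likely cause, reading the function above: split(' ') never returns a piece that contains a space, so i.find('f ') is always -1, neither if branch runs, and cabinid is never assigned before the return for any non-missing cabin. The sketch below reproduces that in plain Python and shows one possible fix. The names deck_list_buggy and deck_list_fixed are hypothetical, and the rule used in the fixed version (a lone leading 'f' is a prefix, so the deck letter comes from the next piece) is an assumption about the intended behaviour, not something stated in the question.

```python
def deck_list_buggy(cabin):
    # Mirrors the question's loop: 'f ' (with a trailing space) can never
    # occur inside a piece produced by split(' ').
    for part in cabin.split(' '):
        if part.find('f ') != -1:
            cabinid = part[0]
            break
    # cabinid was never assigned, so this line raises UnboundLocalError
    return cabinid


def deck_list_fixed(cabin):
    # NaN is the only value that is not equal to itself
    if cabin != cabin:
        return None
    parts = cabin.split(' ')
    # Assumed rule: a lone leading 'f' is a prefix, the deck is the next piece
    if parts[0] == 'f' and len(parts) > 1:
        return parts[1][0]
    # Otherwise the deck is the first letter of the first piece
    return parts[0][0]


try:
    deck_list_buggy('b24')
except UnboundLocalError as exc:
    print('reproduced:', type(exc).__name__)

cabins = ['f g13', 'a32 a45', 'f23 f36', 'b24', float('nan')]
print([deck_list_fixed(c) for c in cabins])
```

Because every path through deck_list_fixed ends in a return, there is no variable left that could be read before assignment. With pandas it could be applied as df['cabin'].apply(deck_list_fixed).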
Another way to write this, using vectorized string methods, is:
    import pandas as pd
    import numpy as np
    nan = np.nan

    df = pd.DataFrame([{'cabin': 'f g13'}, {'cabin': 'a32 a45'}, {'cabin': 'f23 f36'},
                       {'cabin': 'b24'}, {'cabin': nan}])

    # Split each cabin string into columns 0, 1, ...
    cabin_parts = df['cabin'].str.split(' ', expand=True)
    # np.select uses the first condition that is True for each row
    conditions = [pd.isnull(df['cabin']),
                  df['cabin'].str.startswith('f').astype(bool),
                  ~df['cabin'].str.contains('f').astype(bool)]
    choices = [None, cabin_parts[1].str[0], cabin_parts[0].str[0]]
    df['deck_id'] = np.select(conditions, choices)
which yields
         cabin deck_id
    0    f g13       g
    1  a32 a45       a
    2  f23 f36       f
    3      b24       b
    4      NaN    None
Alternatively, if I understand the cabin --> deck_id naming pattern correctly, perhaps
    df['deck_id'] = df['cabin'].str.extract(r'(\D\d*)?\s*(\D\d+)', expand=True)[1].str[0]
would suffice, since
    In [86]: df['cabin'].str.extract(r'(\D\d*)?\s*(\D\d+)', expand=True)
    Out[86]:
         0    1
    0    f  g13
    1  a32  a45
    2  f23  f36
    3  NaN  b24
    4  NaN  NaN
The regex pattern (\D\d*)?\s*(\D\d+) has the following meaning:

    (\D\d*)?    first capturing group: 0-or-1 of (a nondigit followed by 0-or-more digits)
    \s*         0-or-more whitespace characters
    (\D\d+)     second capturing group: (a nondigit followed by 1-or-more digits)
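To see the two capturing groups at work outside pandas, here is a small sketch using Python's standard re module on the same sample strings. The helper name deck_letter is hypothetical and only there for illustration. It relies on re.search, which reports the first match of the pattern in the same way str.extract does.

```python
import re

# Same pattern as above: optional (nondigit + digits), whitespace, (nondigit + 1+ digits)
pattern = re.compile(r'(\D\d*)?\s*(\D\d+)')


def deck_letter(cabin):
    """Return the first character of the second capturing group, or None."""
    match = pattern.search(cabin)
    if match is None:
        return None
    return match.group(2)[0]


for cabin in ['f g13', 'a32 a45', 'f23 f36', 'b24']:
    # groups() shows what each capturing group matched (None if it was skipped)
    print(repr(cabin), pattern.search(cabin).groups(), deck_letter(cabin))
```

For 'b24' the optional first group is skipped, so the whole string lands in the second group. That is why the pandas output shows NaN in column 0 for that row.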