java - How to make a regular expression allow optional prefix and suffix extraction
As the title describes, the regular expression should extract information from a given string, given a prefix (optional) and a suffix (optional).
So:
prefix_group_1_suffix
returns group_1
when prefix is 'prefix_' and suffix is '_suffix'
prefix_group_1
returns group_1
when prefix is 'prefix_' and suffix is null
<-- my code can't handle this situation
group_1_suffix
returns group_1
when prefix is null and suffix is '_suffix'
group_1
returns group_1
when prefix is null and suffix is null
<-- my code can't handle this situation
Here is my code. I found it doesn't work when the suffix is empty:
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String itemname = "";
    String prefix = "test_";
    String suffix = "";
    String itemstring = prefix + "item_1" + suffix;
    // Quote the prefix/suffix so they are matched literally; leave them out if empty.
    String prefix_quote = "".equals(prefix) ? "" : Pattern.quote(prefix);
    String suffix_quote = "".equals(suffix) ? "" : Pattern.quote(suffix);
    String regex = prefix_quote + "(.*?)" + suffix_quote;
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(itemstring);
    while (matcher.find()) {
        itemname = matcher.group(1); // only the first match is used
        break;
    }
    System.out.println("itemstring '" + itemstring + "'");
    System.out.println("prefix quote '" + prefix_quote + "'");
    System.out.println("suffix quote '" + suffix_quote + "'");
    System.out.println("regex '" + regex + "'");
    System.out.println("itemname '" + itemname + "'");
And here is the output:
    itemstring 'test_item_1'
    prefix quote '\Qtest_\E'
    suffix quote ''
    regex '\Qtest_\E(.*?)'
    itemname ''
But the above code works for the other 2 conditions.
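The reason why your code fails lies in the lazy quantifier .*?. Its priority is to match as little as possible, preferably the empty string, so that is what it does when nothing after it in the pattern forces it to expand. Therefore you need to anchor the regex to the start/end of the string and to the possible prefix/suffix.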
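The effect of the lazy quantifier can be reproduced in isolation. The following is a minimal sketch, not code from the question: the class name `LazyQuantifierDemo` and the helper `firstGroup` are hypothetical names chosen for illustration. It runs the same quoted prefix with and without an end-of-string anchor after the lazy group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyQuantifierDemo {
    // Hypothetical helper: returns group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "test_item_1";
        String quotedPrefix = Pattern.quote("test_");

        // Nothing follows the lazy group, so it settles for the empty string.
        System.out.println("'" + firstGroup(quotedPrefix + "(.*?)", input) + "'");  // prints ''

        // The $ anchor forces the lazy group to expand up to the end of the string.
        System.out.println("'" + firstGroup(quotedPrefix + "(.*?)$", input) + "'"); // prints 'item_1'
    }
}
```

This also suggests why the cases with a non-empty suffix already work: the quoted suffix after the group plays the same role as the `$` here.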
To anchor the match to the prefix/suffix, you can use lookaround assertions:
    String prefix = "test_";
    String suffix = "";
    String itemstring = prefix + "item_1" + suffix;
    // An empty prefix/suffix is replaced by the start/end-of-string anchor.
    String prefix_quote = "".equals(prefix) ? "^" : Pattern.quote(prefix);
    String suffix_quote = "".equals(suffix) ? "$" : Pattern.quote(suffix);
    String regex = "(?<=" + prefix_quote + ")(.*?)(?=$|" + suffix_quote + ")";
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(itemstring);

This results in the regex

    (?<=\Qtest_\E)(.*?)(?=$|$)

Explanation:

    (?<=         # Assert that it's possible to match before the current position
     \Qtest_\E   # the prefix (or ^, the start of the string, if the prefix is empty)
    )            # End of lookbehind
    (.*?)        # Match and capture as few characters as possible
    (?=$|$)      # Assert that it's possible to match after the current position
                 # either the end of the string or the suffix (which is replaced by
                 # the end of the string if it is empty. Of course this could be
                 # optimized when constructing the regex, but this is a
                 # quick-and-dirty solution).

The lookbehind must not offer ^ as an alternative when a prefix is given. Otherwise the match would already succeed at position 0 and the captured group would include the prefix.
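As a possible alternative to lookarounds, the whole string can be matched at once. The sketch below is an assumption-laden illustration rather than the method described above: `AffixExtractor` and `extract` are hypothetical names, and a null prefix or suffix is treated as empty. It relies on `Pattern.quote("")` producing `\Q\E`, which matches the empty string, so no special case is needed for a missing prefix or suffix, and on `Matcher.matches()` requiring the entire input to match.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AffixExtractor {
    // Hypothetical helper: strips an optional literal prefix and suffix from input.
    // Returns Optional.empty() if input does not start/end with the given prefix/suffix.
    static Optional<String> extract(String input, String prefix, String suffix) {
        String p = prefix == null ? "" : prefix; // assumption: null means "no prefix"
        String s = suffix == null ? "" : suffix; // assumption: null means "no suffix"
        // Quoting makes prefix and suffix literal; an empty string quotes to \Q\E.
        Pattern pattern = Pattern.compile(Pattern.quote(p) + "(.*)" + Pattern.quote(s));
        Matcher matcher = pattern.matcher(input);
        // matches() anchors the pattern at both ends of the input.
        return matcher.matches() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // The four situations from the question; each prints Optional[group_1].
        System.out.println(extract("prefix_group_1_suffix", "prefix_", "_suffix"));
        System.out.println(extract("prefix_group_1", "prefix_", null));
        System.out.println(extract("group_1_suffix", null, "_suffix"));
        System.out.println(extract("group_1", null, null));
    }
}
```

Because `.` does not match line terminators by default, this sketch only handles single-line input unless `Pattern.DOTALL` is added.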