Proper regex to tokenize a sentence with leading dashes


Here is the regex I'm using as a tokenizer: [^a-zA-Z\'-]+

However, if I apply it to a sentence like this: -this is a test. -yes, it's a test for self-consciousness, the result is ['-this', 'is', 'a', 'test', '-yes', "it's", 'a', 'test', 'for', 'self-consciousness']. There is a leading - ahead of this and yes. Is there a way to eliminate the leading -? Maybe with a modification to the regex I'm using?

You'd need to qualify the dash in the middle.

Since you are using negatives to split, you have to allow the wrong dashes (those not sitting between two word characters) to be matched as separators.

(?:[^a-zA-Z'-]|(?<![a-zA-Z'])-|-(?![a-zA-Z']))+

https://regex101.com/r/ql7lwq/1

 (?:
      [^a-zA-Z'-]          # not one of these
   |                       # or,
      (?<! [a-zA-Z'] )     # allow the dash if it is not preceded by one of the others
      -
   |                       # or,
      -                    # allow the dash if it is not followed by one of the others
      (?! [a-zA-Z'] )
 )+
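As a rough illustration, here is how the pattern might be used from Python, assuming the tokenizer is built on re.split and the input is the sample sentence from the question. The tokenize helper is a hypothetical name, not something from the original post. Because a separator can now match at the very start of the string, re.split may return empty strings, so the sketch filters them out.

```python
import re

# Separator pattern from the answer: anything that is not a letter,
# apostrophe or dash, plus any dash that is not between two word characters.
SEPARATOR = re.compile(r"(?:[^a-zA-Z'-]|(?<![a-zA-Z'])-|-(?![a-zA-Z']))+")


def tokenize(text):
    """Split text on SEPARATOR and drop the empty strings that
    re.split produces when a separator sits at either end."""
    return [token for token in SEPARATOR.split(text) if token]


if __name__ == "__main__":
    sample = "-this is a test. -yes, it's a test for self-consciousness"
    print(tokenize(sample))
```

Leading dashes are treated as separators, while the dash inside self-consciousness and the apostrophe in it's are kept because they sit between letters.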
