Jargon demo

Jargon is a Go package with tokenizers and lemmatizers. Source & docs.


Jargon picks out known terms (lemmas) from technical text, for example:

We are looking for experienced Rails developers, with experience in NodeJS and Obj C.

The result is consistent, canonical terminology, which allows for better downstream analysis such as NLP. Jargon uses tags and synonyms from StackOverflow, and is insensitive to spaces, hyphens, dots and case.

Source data might use react, React.js, React  JS or REACTJS; all of these are converted to the same canonical string.
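
To make the idea of insensitivity to spaces, hyphens, dots and case concrete, here is a minimal sketch using only the Go standard library. It is not Jargon's implementation or API: the `normalize` and `lemma` functions and the `canonical` table are hypothetical names invented for illustration, and the table holds one hand-picked entry rather than real StackOverflow data.

```go
package main

import (
	"fmt"
	"strings"
)

// canonical is a hypothetical lookup table keyed by normalized form.
// The entries are illustrative only; Jargon draws its data from
// StackOverflow tags and synonyms.
var canonical = map[string]string{
	"react":   "reactjs",
	"reactjs": "reactjs",
}

// normalize lowercases s and drops spaces, hyphens and dots, so that
// spelling variants of one term collapse to the same key.
func normalize(s string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(s) {
		switch r {
		case ' ', '-', '.':
			continue // skip separators entirely
		}
		b.WriteRune(r)
	}
	return b.String()
}

// lemma returns the canonical term for s and reports whether one was found.
func lemma(s string) (string, bool) {
	c, ok := canonical[normalize(s)]
	return c, ok
}

func main() {
	for _, s := range []string{"react", "React.js", "React  JS", "REACTJS"} {
		c, ok := lemma(s)
		fmt.Printf("%q -> %q (found: %v)\n", s, c, ok)
	}
}
```

All four variants normalize to a key present in the table, so each one maps to the same string. A real lemmatizer also has to decide how many consecutive tokens to join before looking them up, which this sketch leaves out.
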
We can lemmatize HTML, but only the text nodes.

<!-- Comments are left verbatim even if terms like Ruby are within them. True for tags and attributes as well. -->

<p class="rails">Hi! Let's talk Rails and SQL.</p>
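
The "text nodes only" rule can be sketched as follows. This is an assumption-laden stand-in, not how Jargon parses HTML: it uses the standard library's `encoding/xml` decoder, which only copes with well-formed markup, and `textNodes` is a hypothetical helper name. It shows which parts of the input would be candidates for lemmatization and which would be passed over.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// textNodes returns the non-blank character data found in markup.
// Comments, tag names and attributes arrive as other token types
// and are skipped, so terms inside them are never considered.
func textNodes(markup string) ([]string, error) {
	dec := xml.NewDecoder(strings.NewReader(markup))
	var out []string
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		if cd, ok := tok.(xml.CharData); ok {
			// Convert to string right away: the token's bytes are
			// only valid until the next call to Token.
			if s := strings.TrimSpace(string(cd)); s != "" {
				out = append(out, s)
			}
		}
	}
}

func main() {
	const page = `<!-- Comments mentioning Ruby are skipped. -->
<p class="rails">Hi! Let's talk Rails and SQL.</p>`

	nodes, err := textNodes(page)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, n := range nodes {
		fmt.Println(n)
	}
}
```

Only the sentence inside the `<p>` element is returned. The word Ruby in the comment and the `rails` attribute value never reach the lemmatizer. Real-world HTML is often not well-formed XML, so a dedicated HTML tokenizer would be the safer choice in practice.
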
The parsing rules work well for comma-separated files:

Jane Doe,"c sharp, ecma script",6
Foo Bar,"aspnet mvc R NodeJS", 7.5

The parsing rules work well for JSON:

"name": "Microsoft Access",
"name": "X Code"

Published by Matt Sherman on May 18, 2018