Disclaimer 1: This is an indicative list of how AI is used in Contify. For a detailed description, refer to these blog posts:
Non-Technical: Behind The Scenes Of a Market Intelligence Platform
Technical: Machine Learning Problems: The Easy Parts
Disclaimer 2: Subjective problems such as understanding information are never completely solved by AI technologies. We include human intervention wherever AI cannot achieve the desired results, so that our users get an intelligence system that they can use.
- Is it a company website or a news website?
The initial information processing differs between the two kinds of sources.
- What page of the website should be monitored for updates?
Companies post strategic updates in different kinds of sections: press releases, announcements, news, media, or similar pages.
- Are website integrations working properly, or should they be reviewed?
Based on changes in a website's publishing patterns, Contify identifies whether the sourcing from that website has a problem.
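One way to picture the publishing-pattern check is a simple drop detector over daily article counts. The sketch below is only an illustration: the function name `looks_broken`, the three-day window, and the z-score threshold are assumptions, not a description of Contify's actual monitoring.

```python
from statistics import mean, pstdev
from typing import Sequence


def looks_broken(daily_counts: Sequence[int],
                 recent_days: int = 3,
                 z_threshold: float = 2.0) -> bool:
    """Flag a source whose recent output fell far below its own history.

    daily_counts holds articles collected per day, oldest first.
    The window size and threshold are illustrative assumptions.
    """
    if len(daily_counts) <= recent_days:
        return False  # not enough history to judge

    history = daily_counts[:-recent_days]
    recent = daily_counts[-recent_days:]
    baseline = mean(history)
    spread = pstdev(history)

    if spread == 0:
        # A perfectly regular source: any drop below the baseline is suspicious.
        return mean(recent) < baseline

    # How many standard deviations below the baseline is the recent average?
    return (baseline - mean(recent)) / spread > z_threshold
```

A source that usually publishes about five articles a day and then goes silent for three days would be flagged for review, while normal day-to-day variation would not.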
- What's the published date of the article?
It seems like a trivial problem, but it is difficult to accurately identify the date in an article because there are no standard tags or date formats.
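To illustrate why the lack of a standard format matters, here is a minimal sketch that tries several date formats in turn. The list `CANDIDATE_FORMATS` and the function `parse_published_date` are hypothetical names chosen for this example, and a production system would need many more formats plus logic to locate the date on the page in the first place.

```python
from datetime import date, datetime
from typing import Optional

# Illustrative formats only; real pages use far more variations.
CANDIDATE_FORMATS = [
    "%Y-%m-%d",    # 2019-01-01
    "%B %d, %Y",   # January 1, 2019
    "%b. %d, %Y",  # Jan. 1, 2019
    "%d %B %Y",    # 1 January 2019
]


def parse_published_date(raw: str) -> Optional[date]:
    """Return the first successful parse, or None if no format matches."""
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # try the next candidate format
    return None
```

Even this small list leaves open questions: a string such as 01/02/2019 can mean January 2 or February 1 depending on the publisher's locale, which is one reason the problem is harder than it looks.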
- What part of the webpage has the headline and what part has the text of the article?
A web page has hundreds of HTML tags. Any of those tags can have the text of the article.
- Removing irrelevant and non-business information
This covers not just non-business content such as sports, entertainment, and politics, but also irrelevant information such as CSR and internal events.
- Grouping of similar and duplicate information
Group similar information published using different words on different sites at different times.
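A minimal sketch of the grouping idea, assuming word-overlap (Jaccard) similarity between headlines and a greedy assignment to the first matching group. The helper names and the 0.5 threshold are assumptions for illustration. This approach only catches lightly reworded duplicates; stories told in genuinely different words would need a semantic similarity method instead.

```python
import re
from typing import List, Set


def word_set(text: str) -> Set[str]:
    """Lowercase the text and keep its distinct alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of tokens the two sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_similar(headlines: List[str], threshold: float = 0.5) -> List[List[str]]:
    """Greedily place each headline in the first group it resembles."""
    groups: List[List[str]] = []
    representatives: List[Set[str]] = []  # token set of each group's first headline
    for headline in headlines:
        words = word_set(headline)
        for i, rep in enumerate(representatives):
            if jaccard(words, rep) >= threshold:
                groups[i].append(headline)
                break
        else:
            # No existing group is close enough, so start a new one.
            groups.append([headline])
            representatives.append(words)
    return groups
```

With this sketch, "Acme acquires Widget Corp for $1 billion" and "Acme acquires Widget Corp in $1 billion deal" land in the same group, while an unrelated headline starts its own.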
- Tagging of entities (nouns such as companies, locations, persons, etc.)
Disambiguation: Is it "Apple" the fruit, or "Apple" the company?
Aboutness: Is it about the company or just a passing mention of the company name?
- Tagging of other metadata
Topics, industries, or custom categories, such as those based on our customers' products and services. Topics include M&A, Management Change, Business Expansion, etc.
- Removal of unnecessary text from the updates, e.g.
** Remove datelines such as SAN FRANCISCO, Jan. 1, 2019
** Remove introductory or marketing text, e.g. “DXC Technology, the world’s leading independent, end-to-end IT services company, announced” is changed to “DXC Technology announced”
** Remove stock tickers, e.g. the ticker (Nasdaq: TBBK) is removed from “The Bancorp, Inc. (Nasdaq: TBBK) today announced..”
- Adding contextual information for easy reading:
** Replace “Today” with the date of the update
** Replace updates from paid sources with updates from freely accessible sources.
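Some of the text clean-up rules above (datelines, stock tickers, and “Today”) can be approximated with regular expressions. The following is a rough sketch under stated assumptions: the patterns, the function name `tidy_update`, and the short list of exchanges are illustrative, and removing introductory marketing text is not covered because it needs more than pattern matching.

```python
import re
from datetime import date

# Dateline at the start of the text, e.g. "SAN FRANCISCO, Jan. 1, 2019 -- "
DATELINE = re.compile(
    r"^[A-Z][A-Z .]+,\s+[A-Z][a-z]{2,8}\.?\s+\d{1,2},\s+\d{4}\s*(?:--|-)?\s*"
)
# Parenthesised ticker, e.g. " (Nasdaq: TBBK)"; the exchange list is an assumption.
TICKER = re.compile(r"\s*\((?:Nasdaq|NASDAQ|NYSE)\s*:\s*[A-Z.]{1,6}\)")
# The word "today" in any capitalisation.
TODAY = re.compile(r"\btoday\b", re.IGNORECASE)


def tidy_update(text: str, published: date) -> str:
    """Strip the dateline and ticker, then spell out the date for 'today'."""
    text = DATELINE.sub("", text)
    text = TICKER.sub("", text)
    stamp = f"on {published.strftime('%B')} {published.day}, {published.year}"
    text = TODAY.sub(stamp, text)
    return text.strip()
```

For example, given a publication date of January 1, 2019, the text "SAN FRANCISCO, Jan. 1, 2019 -- The Bancorp, Inc. (Nasdaq: TBBK) today announced results." becomes "The Bancorp, Inc. on January 1, 2019 announced results." Rules like these are brittle (a sentence that begins with "Today" would start with a lowercase "on", for instance), which is where the human review mentioned in Disclaimer 2 comes in.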