How Does Contify Use AI Technologies?

Disclaimer 1: This is an indicative list of how AI is used in Contify. For a detailed description, refer to these blogs:

Non-Technical: Behind The Scenes Of a Market Intelligence Platform

Technical: Machine Learning Problems: The Easy Parts

Disclaimer 2: Subjective problems, such as understanding information, are never completely solved by AI technologies. We add human intervention wherever AI cannot achieve the desired results, so that our users get an intelligence system they can rely on.

Sourcing

  • Differentiating between a company website and a news website

    The two kinds of sources need different initial processing. Contify uses the source's meta description, along with custom rules, to tell a company website from a news website.

  • Identifying pages to monitor on a website for updates

    Companies post strategic updates in different sections of their websites, such as press releases, announcements, news, media, or similar pages. Contify identifies these pages so they can be monitored for new updates.

  • Determining whether website integrations are working correctly or should be reviewed

    Contify detects whether sourcing from a website has a problem based on changes in the website's publishing pattern.

  • Identifying the published date and author of an article

    It seems like a trivial problem, but accurately identifying the date and author of an article can be quite challenging, because there are no standard tags for authors and no standard date format. Both have to be identified source by source.

  • Determining which part of the web page has the headline and which part has the text of the article

    A web page has hundreds of HTML tags, and any of them can contain the article text. Contify examines all of these tags, identifies the one that holds the text, and retrieves it.
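The text above does not say how Contify measures a change in publishing patterns. As an illustration only, here is a minimal sketch that assumes a simple rule: flag a website integration for review when the number of articles collected in the last two weeks falls well below the rate seen over the previous 90 days. The function name needs_review, both window sizes, and the 0.3 threshold are hypothetical.

```python
from datetime import date, timedelta
from typing import Sequence


def needs_review(
    publish_dates: Sequence[date],
    today: date,
    recent_days: int = 14,
    history_days: int = 90,
    min_ratio: float = 0.3,
) -> bool:
    """Return True if a source's recent publishing rate has fallen far below its usual rate.

    publish_dates: dates of the articles collected from one source.
    recent_days, history_days and min_ratio are hypothetical settings for illustration.
    """
    recent_start = today - timedelta(days=recent_days)
    history_start = recent_start - timedelta(days=history_days)

    # Count articles in the recent window (recent_start, today] ...
    recent = sum(1 for d in publish_dates if recent_start < d <= today)
    # ... and in the baseline window (history_start, recent_start] just before it.
    history = sum(1 for d in publish_dates if history_start < d <= recent_start)

    if history == 0:
        # No baseline yet, so there is no pattern to compare against.
        return False

    recent_rate = recent / recent_days
    history_rate = history / history_days
    # Flag the integration when it now yields much less than it used to,
    # for instance if a site redesign means articles are no longer being found.
    return recent_rate < min_ratio * history_rate
```

Comparing rates rather than raw counts keeps the rule meaningful when the two windows have different lengths; a real system would also have to allow for sources that publish rarely or seasonally.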

Processing

  • Removing irrelevant and non-business information

    Pre-trained ML models identify and remove non-business information such as Sports, Entertainment, and Politics.

  • Named Entity Recognition (proper nouns such as companies, locations, and people)
    ** Disambiguation: Is it "Apple" the fruit, or "Apple" the company?
    ** Aboutness: Is the update about the company, or is it just a passing mention of the company name?

  • Classification: Tagging of business metadata such as topics and industries

    The metadata covers Topics (such as M&A, Management Change, and Business Expansion), Industries, and Custom Categories, such as those based on our customers' products and services.

    The classification is based on three layers: Rules, Machine Learning, and Human Curation. The human-curated data is used to continuously update the ML models.

  • Grouping of similar and duplicate information

    Duplicate and similar updates published by different sources are grouped together. Similarity is judged not only on the words used but also on the company involved.

  • Removing unnecessary text from updates, for example:
    ** Removes datelines such as “SAN FRANCISCO, Jan. 1, 2019”
    ** Removes introductory or marketing text, e.g. “DXC Technology, the world’s leading independent, end-to-end IT services company, announced” becomes “DXC Technology announced”
    ** Removes stock tickers, e.g. “(Nasdaq: TBBK)” is removed from “The Bancorp, Inc. (Nasdaq: TBBK) today announced...”

  • Adding contextual information for easier reading:
    ** Changes “Today” to the actual date of the update.
    ** Replaces updates from paid sources with updates from freely accessible sources.
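Several of the clean-up steps listed above can be approximated with hand-written rules. The following is a minimal sketch, assuming regular expressions are enough for the examples given; Contify's actual rules are not described here, and the patterns and the clean_update function are hypothetical. It strips a leading dateline, a marketing appositive placed before "announced", and parenthesised stock tickers, then replaces "today" with the update's date.

```python
import re
from datetime import date

# Hypothetical, hand-written patterns for illustration only.

# A leading dateline: upper-case place name, then "Mon. D, YYYY", then an optional
# wire tag such as /PRNewswire/ and an optional "--" or "-" separator.
DATELINE = re.compile(
    r"^[A-Z][A-Z .'-]+,\s+[A-Z][a-z]{2,8}\.?\s+\d{1,2},\s+\d{4}\s*(?:/[^/]+/)?\s*(?:--|-)?\s*"
)

# A marketing appositive between a company name and "announced",
# e.g. ", the world's leading ... company, ".
APPOSITIVE = re.compile(
    r",\s+the\s+[^.]*?\b(?:company|provider|firm|leader)\b,\s+(?=(?:today\s+)?announced\b)"
)

# A parenthesised stock ticker such as "(Nasdaq: TBBK)" or "(NYSE: IBM)".
TICKER = re.compile(r"\s*\((?:NASDAQ|Nasdaq|NYSE(?: American)?|OTC[A-Z]*)\s*:\s*[A-Z][A-Z.]*\)")

# The relative word "today" or "Today"; the first letter is captured to keep its case.
TODAY = re.compile(r"\b([Tt])oday\b")


def clean_update(text: str, published: date) -> str:
    """Apply the clean-up rules in order and return the shortened text."""
    text = DATELINE.sub("", text, count=1)
    text = APPOSITIVE.sub(" ", text)
    text = TICKER.sub("", text)
    stamp = f"{published:%B} {published.day}, {published.year}"

    def _date_for_today(match: re.Match) -> str:
        # "Today" at the start of a sentence becomes "On <date>", otherwise "on <date>".
        prefix = "On" if match.group(1) == "T" else "on"
        return f"{prefix} {stamp}"

    text = TODAY.sub(_date_for_today, text)
    return text.strip()
```

For example, clean_update("SAN FRANCISCO, Jan. 1, 2019 -- The Bancorp, Inc. (Nasdaq: TBBK) today announced results.", date(2019, 1, 1)) returns "The Bancorp, Inc. on January 1, 2019 announced results." Patterns like these only cover the formats they were written for.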

Analysis

  • Extraction of Facts, Auto-Summarization, and Quotes

    Extracts important text along with business data (facts) from a news story and auto-summarizes it using extractive summarization techniques, through a combination of NLP, ML, and mathematical models. It also extracts quotes from the company's leadership in the article.
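Contify's summarization models are not described in detail here. To illustrate what "extractive" means, here is a minimal sketch using a classic word-frequency heuristic, which is an assumption and not Contify's method: score each sentence by how frequent its non-stop-words are across the whole story, then keep the top-scoring sentences in their original order. The stop-word list, the naive sentence splitter, and the summarize function are hypothetical.

```python
import re
from collections import Counter
from typing import List

# Tiny illustrative stop-word list (an assumption, not a curated resource).
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "has", "have",
    "in", "is", "it", "its", "of", "on", "that", "the", "this", "to", "was",
    "were", "will", "with",
}


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after ".", "!" or "?" followed by whitespace.
    # It will mis-split abbreviations such as "Inc." inside a sentence.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def content_words(text: str) -> List[str]:
    # Lower-cased word tokens with stop words removed.
    return [w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOP_WORDS]


def summarize(text: str, max_sentences: int = 2) -> List[str]:
    """Return up to max_sentences sentences copied verbatim from text."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))
    if not freq:
        return sentences[:max_sentences]
    top = max(freq.values())

    def score(sentence: str) -> float:
        tokens = content_words(sentence)
        if not tokens:
            return 0.0
        # Mean normalised frequency, so longer sentences are not favoured automatically.
        return sum(freq[w] / top for w in tokens) / len(tokens)

    # Rank sentences by score, then restore the original order for readability.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return [sentences[i] for i in keep]
```

Because the summary is built only from sentences already in the article, it never introduces wording the source did not use; the trade-off is that it cannot merge or rephrase ideas.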