Announcing the public preview of custom Text Analytics for health
Published date: 23 May, 2023
Today, we are pleased to announce the public preview release of custom Text Analytics for health, one of the latest features offered through Azure’s Cognitive Services for Language. The feature is available for free, limited to 5,000 text records per Language resource, and is accessible through the Language Studio along with a suite of other NLP capabilities.
Custom Text Analytics for health is the custom version of Text Analytics for health, the prebuilt NLP solution for extracting and labeling relevant medical information from a variety of unstructured medical text. It allows you to use state-of-the-art machine learning models to build your own healthcare entity extraction model using data from your specific domain, such as oncology or pediatrics, including any abbreviations used by your organization. It also leverages all the prebuilt entity categories from Text Analytics for health, allowing you to extend them with your custom vocabulary. Your data is stored in your own private Azure Blob Storage container, which is linked to the project upon creation.
The first step of the process is defining your schema, or entity map. Entities can be populated with prebuilt, learned, and/or list components, which are different extraction methods. The prebuilt component refers to all the pretrained entities supported by the Text Analytics for health model. These entities are automatically added to your entity map and do not require any labeling or training to function. New entities you define, on the other hand, are populated with the learned component, which uses labeled data to extract the entities from context. An example of the schema definition is shown below, where TreatmentLocation is a user-defined entity populated with the learned component and BodyStructure is an entity with a prebuilt component.
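To make the component model concrete, here is a minimal Python sketch of an entity map. The `Entity` class, its field names, and the `entities_needing_labels` helper are hypothetical names chosen for illustration; they are not the service's project file format or API. Only the entity names TreatmentLocation and BodyStructure come from the example above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entity:
    """Illustrative model of one entry in an entity map (hypothetical, not the service format)."""
    name: str
    prebuilt: bool = False   # extracted by the pretrained Text Analytics for health model
    learned: bool = False    # extracted from context after training on labeled data
    list_entries: Dict[str, List[str]] = field(default_factory=dict)  # list key -> synonyms

    def components(self) -> List[str]:
        """Return the names of the components that populate this entity."""
        active = []
        if self.prebuilt:
            active.append("prebuilt")
        if self.learned:
            active.append("learned")
        if self.list_entries:
            active.append("list")
        return active


# The two entities named in the text above.
schema = [
    Entity("BodyStructure", prebuilt=True),
    Entity("TreatmentLocation", learned=True),
]


def entities_needing_labels(entities: List[Entity]) -> List[str]:
    """Entities with a learned component are the ones that need labeled data."""
    return [entity.name for entity in entities if entity.learned]


for entity in schema:
    print(entity.name, entity.components())
print("Needs labeling:", entities_needing_labels(schema))
```

In this sketch, only TreatmentLocation is reported as needing labels, which mirrors the point that prebuilt entities work without labeling or training.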
All entities, prebuilt or custom, can be supplemented with your own vocabulary by populating their list components with a list key and synonyms as shown in the figure below. List components use exact case matching of the synonyms to perform entity extraction and return the list value or key.
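The behavior of a list component can be sketched in a few lines of Python. This is an approximation under stated assumptions, not the service's implementation: it assumes "exact case matching" means a case-sensitive, whole-word comparison, and the function name `match_list_component` and the sample list key and synonyms are invented for illustration.

```python
import re
from typing import Dict, List, Tuple


def match_list_component(text: str, entries: Dict[str, List[str]]) -> List[Tuple[str, str, int]]:
    """Return (list_key, matched_text, offset) for every exact synonym match in text.

    Assumption: matching is case-sensitive and must not start or end inside a word.
    """
    matches = []
    for list_key, synonyms in entries.items():
        for synonym in synonyms:
            # re.escape keeps characters such as "&" or "." literal.
            pattern = r"(?<!\w)" + re.escape(synonym) + r"(?!\w)"
            for found in re.finditer(pattern, text):
                matches.append((list_key, found.group(0), found.start()))
    return sorted(matches, key=lambda item: item[2])


# Hypothetical vocabulary: one list key with organization-specific synonyms.
entries = {"Emergency Room": ["ER", "A&E", "emergency room"]}
text = "Patient was admitted to the ER after her symptoms worsened."

for list_key, matched, offset in match_list_component(text, entries):
    print(f"{matched!r} at offset {offset} -> {list_key}")
```

Note that the lowercase "er" inside "after" and "her" is not matched here, because the sketch compares case exactly and respects word boundaries.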
The next step of the process is data labeling, where you add labels to your data for all instances of your custom entities and of entities with the learned component. The labeled data is used to train the model to extract the entities from context, as well as to test and evaluate the model's performance after training. You can add your labels programmatically using the authoring REST APIs, or manually using the Language Studio’s labeling experience by selecting a span in the text and then selecting the entity, as shown below.
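Whichever route you choose, a label boils down to a span of text tied to an entity. The Python sketch below shows one way to compute such spans before sending them anywhere. The field names `category`, `offset`, and `length` and the helper `label_all` are assumptions for illustration; check the authoring API reference for the exact labels format and for how offsets are counted.

```python
from typing import Dict, List, Union

Label = Dict[str, Union[str, int]]


def label_all(text: str, span: str, category: str) -> List[Label]:
    """Label every occurrence of span in text with the given entity category.

    Offsets here are Python string indices (Unicode code points); the service
    may count offsets differently, so treat this as an assumption.
    """
    labels: List[Label] = []
    start = 0
    while True:
        offset = text.find(span, start)
        if offset == -1:
            break
        labels.append({"category": category, "offset": offset, "length": len(span)})
        start = offset + len(span)  # continue after this occurrence
    return labels


document = (
    "Transferred to the infusion center for the first cycle; "
    "the infusion center confirmed the dose."
)
labels = label_all(document, "infusion center", "TreatmentLocation")

for label in labels:
    begin = label["offset"]
    end = begin + label["length"]
    print(label, "->", repr(document[begin:end]))
```

Labeling every instance, as the helper does, matters because unlabeled occurrences of an entity would otherwise look like negative examples during training and evaluation.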
After labeling your data, you can train and evaluate your model using your training and testing datasets. The model performance pages, shown below, let you view the results of the evaluation, which contain a summary of the average F1, precision, and recall scores for your model, detailed scoring of each custom entity, a list of all the false positives and false negatives, and guidance on ways to improve your model’s performance.
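For reference, precision, recall, and F1 follow directly from counts of true positives, false positives, and false negatives. The Python sketch below uses the standard definitions; the per-entity counts are made-up numbers, and the unweighted (macro) average is an assumption, since the way the service averages scores is not stated here.

```python
from typing import Dict, Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Standard definitions; returns 0.0 where a ratio would divide by zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical evaluation counts per entity: (true positives, false positives, false negatives).
counts: Dict[str, Tuple[int, int, int]] = {
    "TreatmentLocation": (8, 2, 2),
    "BodyStructure": (9, 1, 3),
}

scores = {name: precision_recall_f1(*c) for name, c in counts.items()}
for name, (p, r, f1) in scores.items():
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")

# Assumption: a simple unweighted average of the per-entity F1 scores.
macro_f1 = sum(f1 for _, _, f1 in scores.values()) / len(scores)
print(f"macro F1={macro_f1:.2f}")
```

A false positive lowers precision and a false negative lowers recall, which is why the list of both kinds of errors is a useful starting point when deciding what additional data to label.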
After your model is trained and you are satisfied with the evaluation, you can deploy it to one of the supported regions to consume it and make predictions. You can also use the Language Studio’s testing page to visualize the results from the endpoint, as shown in the figure below.
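Consuming a deployed model programmatically comes down to submitting documents together with the project and deployment names. The Python sketch below only builds a request body and sends nothing. The task kind `CustomHealthcare`, the field names, and the project and deployment names are assumptions for illustration and should be verified against the Text Analysis Runtime API reference before use.

```python
import json
from typing import Any, Dict, List


def build_analysis_request(
    project_name: str,
    deployment_name: str,
    texts: List[str],
    language: str = "en",
) -> Dict[str, Any]:
    """Build a request body for an analysis job (assumed shape, not verified here)."""
    documents = [
        {"id": str(index), "language": language, "text": text}
        for index, text in enumerate(texts, start=1)
    ]
    return {
        "displayName": "Custom health entity extraction",
        "analysisInput": {"documents": documents},
        "tasks": [
            {
                "kind": "CustomHealthcare",  # assumed task kind; check the API reference
                "taskName": "custom-health-task",
                "parameters": {
                    "projectName": project_name,
                    "deploymentName": deployment_name,
                },
            }
        ],
    }


# Hypothetical project and deployment names.
request_body = build_analysis_request(
    "my-health-project",
    "production",
    ["The patient was moved to the infusion center for treatment."],
)
print(json.dumps(request_body, indent=2))
```

Keeping the payload construction separate from the HTTP call makes it easy to inspect or unit test the request before pointing it at a real Language resource.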
Custom Text Analytics for health is now available in the following Azure regions:
- East US
- North Europe
- UK South
We’re also excited to announce that Custom Text Analytics for health will be available for free, for a limited time, using the Standard (S) Tier, providing you with 5,000 text records per Language resource and unlimited training time.
Authoring and prediction can also be performed entirely programmatically using our Language Customization Authoring and Runtime Text Analytics REST APIs:
- Text Analysis Authoring APIs
- Text Analysis Runtime APIs
If you are interested in learning more about Custom Text Analytics for health, check out our reference documentation, which features conceptual articles, how-to guides, and a quickstart to get you started today!
Microsoft products and services (1) are not designed, intended or made available as a medical device, and (2) are not designed or intended to be a substitute for professional medical advice, diagnosis, treatment, or judgment and should not be used to replace or as a substitute for professional medical advice, diagnosis, treatment, or judgment.