Training data transparency

Our AI assistants are trained on curated, lawful datasets that exclude copyrighted materials. The table below summarizes the types of data used in training, in support of transparency and compliance with intellectual property laws:

Training data inventory

| Data type | Description | Source | Exclusions |
| --- | --- | --- | --- |
| Proprietary QnA datasets | Question-and-answer pairs on compliance frameworks (e.g., ISO 27001, SOC 2, GDPR) | Created by Better ISMS based on consulting expertise, informed by general ISMS principles | No ISO or NIST frameworks, or copyrighted content |
| Synthetic QnA pairs | Hypothetical compliance scenarios and questions | Generated by Better ISMS using original content | No copyrighted standards, regulations, or external texts |
| User-derived QnA insights | QnA pairs based on anonymized patterns from user conversations (e.g., common ISMS queries) | Insights from reviewing ISMS Copilot usage, stripped of user information | No user data or copyrighted materials |
| Public domain resources | Cybersecurity guidelines and works with expired copyright | U.S. NIST publications (e.g., SP 800-53), public domain in the U.S. | No ISO standards or copyrighted content |
| Open-source content | Guides and reports under permissive licenses | Creative Commons (CC0, CC BY) materials from cybersecurity communities | No content with TDM (text and data mining) restrictions, or copyrighted standards |
| Licensed content | Partner-provided original information security guidance | Generated by partners using original content | No ISO standards or copyrighted content |

Copyrights