September 2, 2024

Data Minimisation & The Dilemma of AI

Metomic CEO Rich Vibert ponders the dilemma businesses face when it comes to data minimisation and building their own AI tools.

I remember the panic in the run-up to GDPR coming into force. Marketing teams and security professionals alike were worried about how to ensure data was held for no longer than necessary while still allowing employees to do their jobs effectively.

The regulation undoubtedly strengthened privacy protections for individuals, but it also put organisations under pressure to reduce the amount of data they retain. That was already best practice in data security; the threat of hefty fines and reputational damage made it a necessity. Data minimisation is one of GDPR's core principles, and businesses face big penalties for neglecting it.

Recently, I’ve found myself witnessing a similar phenomenon: the rise of Generative AI and the panic it brings to security professionals around the globe. We all remember the Samsung ChatGPT data leak, which led the company to ban the use of ChatGPT until safe usage measures could be put in place (even though banning ChatGPT is close to impossible in an organisation, just as it is in schools, some of which have embraced GenAI). Businesses today are being forced to create a secure environment in which Generative AI can be used, because denying employees the chance to use it in their work puts them at a competitive disadvantage as their peers become ever more efficient and productive. In fact, GenAI has had a steeper initial adoption curve than other recent technologies such as smartphones and tablets.

At the same time, whether it’s senior executives outlining presentations, developers checking source code or customer service teams writing confidential emails to clients, a whole range of sensitive data could find its way into Large Language Models (LLMs) like ChatGPT, and this rightly sets alarm bells ringing for security teams.

Building bespoke AI tools with existing company data

With OpenAI’s recent announcement that users will soon be able to create their own GPTs, and companies keen to build their own AI tools, organisations will have the opportunity to create in-house solutions that they control themselves. To meet the company’s specific needs, these bespoke AI tools will need existing data to learn from.

It raises the question: how do we balance minimising data for compliance reasons with retaining enough data to train business-specific AI tools?

Businesses may be forced to weigh up whether they’re willing to risk the security of customer data for the sake of building tools that will benefit the business in the future. It is a decision that must balance customer and company needs.

There is no doubt that businesses should make every effort to comply with regulations, setting retention periods on the data they collect so that it is held no longer than necessary. But this limits the amount of data they have on hand to feed bespoke AI tools.
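
To make that trade-off concrete, here is a minimal sketch of what a retention-period check might look like. It assumes each record carries a category and a timezone-aware creation timestamp; the category names, the periods and the `expired_records` helper are hypothetical, not drawn from GDPR or from any particular product. Anything the check returns is data that, under this policy, would no longer be available to train or feed an in-house AI tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical retention policy mapping a record category to the maximum
# time it may be kept. The categories and periods are illustrative only.
RETENTION_POLICY = {
    "support_ticket": timedelta(days=365),
    "marketing_lead": timedelta(days=180),
    "chat_transcript": timedelta(days=30),
}


@dataclass
class Record:
    record_id: str
    category: str
    created_at: datetime  # timezone-aware creation timestamp


def expired_records(records, now, policy=RETENTION_POLICY):
    """Return the records held longer than their category allows.

    Records in a category with no policy entry are returned too, on the
    assumption that data without a documented purpose should not be kept.
    """
    expired = []
    for record in records:
        limit = policy.get(record.category)
        if limit is None or now - record.created_at > limit:
            expired.append(record)
    return expired


if __name__ == "__main__":
    now = datetime(2024, 9, 2, tzinfo=timezone.utc)
    records = [
        Record("a1", "support_ticket", now - timedelta(days=400)),
        Record("b2", "chat_transcript", now - timedelta(days=10)),
        Record("c3", "unknown_export", now - timedelta(days=1)),
    ]
    for record in expired_records(records, now):
        print(record.record_id)  # prints a1, then c3
```

Flagging uncategorised data for deletion is the stricter design choice: under a policy like this, a business that wants to keep more data for AI purposes has to justify each category and period explicitly rather than keeping everything by default.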

Recent UK government guidelines recommend that organisations ‘apply appropriate checks and sanitisation of data and inputs’ as well as implementing other security measures to ensure AI tools are built with security front of mind.
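
The idea of sanitising inputs can be illustrated with a minimal sketch: before a prompt is sent to an external LLM, obvious identifiers are swapped for placeholders. The rules and the `sanitise_prompt` helper below are hypothetical, and regex matching is a deliberately crude stand-in for the classification a dedicated data security tool would perform; it will miss anything that does not fit the patterns.

```python
import re

# Hypothetical redaction rules, each a (pattern, placeholder) pair.
# Regexes like these only catch obvious, well-formed identifiers.
REDACTION_RULES = [
    # Email addresses
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[EMAIL]"),
    # 13 to 16 digits, optionally separated by single spaces or hyphens,
    # roughly the shape of a payment card number
    (re.compile(r"\b(?:\d[ -]?){12,15}\d\b"), "[CARD_NUMBER]"),
]


def sanitise_prompt(text: str) -> tuple[str, int]:
    """Replace every match of each rule with its placeholder.

    Returns the sanitised text and the number of redactions made, so a
    caller can log, warn on, or block prompts that contained sensitive data.
    """
    total = 0
    for pattern, placeholder in REDACTION_RULES:
        text, count = pattern.subn(placeholder, text)
        total += count
    return text, total


if __name__ == "__main__":
    prompt = "Customer jane.doe@example.com paid with 4111 1111 1111 1111 on Monday."
    clean, redactions = sanitise_prompt(prompt)
    print(clean)       # Customer [EMAIL] paid with [CARD_NUMBER] on Monday.
    print(redactions)  # 2
```

Returning the redaction count alongside the text lets the same check serve two purposes: cleaning the prompt, and telling the security team how often sensitive data is being pasted into GenAI tools in the first place.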

However, leadership teams should also consider the following questions before de-prioritising data minimisation in favour of data retention for AI purposes:

  1. How do we make sure we’re staying compliant?
  2. What is the objective of the AI tool we are building? How much data will we need to provide it?
  3. What are the risks of de-prioritising data minimisation?
  4. If we were hit with a data breach, how would the data we retain for AI purposes be impacted? Is it worth the risk?

Reviewing this from a security and privacy perspective, de-prioritising data minimisation seems a huge risk, and one that may not pay off in the long run. It will be interesting to see how security teams deal with this new dilemma, and whether more companies suffer breaches as they retain more data to feed new AI tools.

The key to ensuring data is secured across new GenAI tools is understanding where it’s stored, how employees are sharing it, and when it makes sense to delete it.

Having a powerful data security solution in place will be vital as businesses begin the difficult process of managing data across multiple GenAI tools while keeping employees productive.
