Rick Cook once wrote:
"Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the Universe trying to produce bigger and better idiots. So far, the Universe is winning."
It's tongue in cheek, and I don't enjoy calling users idiots, but sometimes you do have to question their choices, and nowhere is that more apparent than in how they use the free text fields in your software.
Here's something I come back to regularly when working with development teams: you can architect a beautifully compliant data model, nail your data minimization obligations, and have your privacy notices polished to a shine, and then a user will cheerfully type their Social Insurance Number into a "Comments" field and blow the whole thing up.
Free text fields are, from a privacy engineering standpoint, a very risky design decision to make. They are essentially an open invitation for users to store whatever they feel like storing, and users are creative in ways that will genuinely surprise you. Credit card numbers. Passwords. Medical diagnoses. Financial account details. Credentials for other systems entirely. Telling you 'how they really feel'. I've seen it all come through fields that were designed to collect something entirely innocuous, like a product inquiry or a support note.
The problem this creates is not just technical; it is also a compliance and liability problem. If your privacy notice says you collect contact information and general enquiries, but your database is quietly accumulating Social Insurance Numbers because nobody told users not to put them there, you are sitting on a gap between what you told people you collect and what you actually store. That gap is where small issues become big issues.
So what do you actually do about it?
The first and most immediate tool is the just-in-time privacy notice. This is a short, contextual warning placed directly at the point of data entry, not buried in your terms and conditions. Something along the lines of: "Please do not include passwords, credit card numbers, or personal identification numbers in this field." It is not glamorous, and it is arguably not very effective on its own, but it does two things. It gives users a genuine warning, and it gives you a more defensible position if something ends up in there anyway, because you can show that you took reasonable steps to keep that data out.
The second layer is programmatic and far more effective. If one thing is certain, it is that you cannot rely on users reading notices, and you cannot rely on your team manually reviewing free text at scale. What you can do is implement detection mechanisms that flag or intercept potentially sensitive data before it gets written to your database.
Regular expressions can catch patterns that look like credit card numbers, SIN or SSN formats, or passport numbers. There are also SDKs and services that specialize in sensitive data detection and can anonymize, redact, or alert on content in near real-time. The right approach depends on your specific use case, your stack, and what level of risk you are willing to carry, but the point is that options exist and they are worth building into your design conversation early.
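To make that concrete, here is a minimal sketch in Python of what regex-based detection might look like. Everything in it is an illustrative assumption rather than a recommended implementation: the function names, the three patterns, and the placeholder text are all hypothetical. Passport numbers are left out because their formats vary by country. Credit card numbers and Canadian SINs both carry a Luhn check digit, so the sketch validates it to cut down on false positives from things like order numbers. The SSN rule is layout-only and will over-match.

```python
import re

# Hypothetical rules for illustration; tune or replace them for your own data.
# Card numbers: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
# Canadian SIN: nine digits, optionally grouped in threes.
SIN_RE = re.compile(r"\b\d{3}[ -]?\d{3}[ -]?\d{3}\b")
# US SSN: 3-2-4 digits with hyphens (layout only, no checksum exists).
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# (label, pattern, whether a Luhn check is required). Cards go first so a
# card number is already redacted before the shorter SIN pattern runs.
RULES = [
    ("CARD", CARD_RE, True),
    ("SIN", SIN_RE, True),
    ("SSN", SSN_RE, False),
]


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def _replacer(label, findings, needs_luhn):
    """Build a substitution callback that redacts a match and records it."""
    def replace(match):
        digits = re.sub(r"\D", "", match.group())
        if needs_luhn and not luhn_valid(digits):
            return match.group()  # looks like the pattern but fails the checksum
        findings.append(label)
        return f"[REDACTED {label}]"
    return replace


def redact(text: str):
    """Return (redacted_text, findings) for a free text value."""
    findings = []
    for label, pattern, needs_luhn in RULES:
        text = pattern.sub(_replacer(label, findings, needs_luhn), text)
    return text, findings
```

You would call something like redact() on the submitted value before it is written to the database, store only the redacted text, and use the findings list to decide whether to alert someone or prompt the user to resubmit. Treat this as a floor, not a ceiling: pattern matching misses anything formatted unexpectedly and still flags some harmless numbers, which is one reason the dedicated detection services exist.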
The worst position to be in is discovering this problem after the fact, when you are trying to explain to a regulator or a client why sensitive third-party data ended up in a field that was never meant to hold it.
If your team is building features with free text inputs, or if you are auditing an existing product and want to think through your specific mitigations, I am happy to work through it with you. This is exactly the kind of privacy engineering conversation I have with development teams and executive teams regularly, either as a workshop or as part of a broader engagement. You can reach me at rossgsaunders.com.