Re: Column Redaction

Lists:	pgsql-hackers

From:	Simon Riggs <simon(at)2ndquadrant(dot)com>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Column Redaction
Date:	2014-10-10 08:57:19
Message-ID:	CA+U5nML+V+v0q4R2M7ZehQ3zKHMBBDtpKmKG6ZAkNsa8MykF7A@mail.gmail.com

Postgres currently supports column level SELECT privileges.

1. If we want to confirm a credit card number, we can issue SELECT 1
FROM customer WHERE stored_card_number = '1234 5678 5344 7733'

2. If we want to look for card fraud, we need to be able to use the
full card number to join to transaction data and look up blocked card
lists, etc.

3. We want to block the direct retrieval of card numbers for
additional security.
In some cases, we might want to return an answer like '**** **** **** 7733'

We can't do all of the above with current facilities inside the database.

The ability to mask output for data in certain cases, for the purpose
of security, is known lately as data redaction, or column-level data
redaction.

The best way to support this requirement would be to allow columns to
have an additional "output formatting function". This would be
executed only when data is about to be returned by a query. All other
uses of the column would see the unrestricted data.

This would have other uses as well, such as default report formats, so
we can store financial amounts as NUMERIC, but format them on
retrieval as $12,345.78, etc.
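
To make the idea concrete, here is a minimal Python sketch of what two such output formatting functions might compute. The names redact_card_number and format_amount are hypothetical, and nothing in this proposal fixes an implementation; the sketch only illustrates the intended input and output.

```python
from decimal import Decimal


def redact_card_number(card_number: str) -> str:
    """Mask every digit except the last four, keeping the grouping."""
    total_digits = sum(ch.isdigit() for ch in card_number)
    seen = 0
    out = []
    for ch in card_number:
        if ch.isdigit():
            seen += 1
            # Keep only the final four digits visible.
            out.append(ch if seen > total_digits - 4 else "*")
        else:
            out.append(ch)  # preserve spaces or other separators
    return "".join(out)


def format_amount(amount: Decimal) -> str:
    """Render a numeric amount in a default report format."""
    return f"${amount:,.2f}"


print(redact_card_number("1234 5678 5344 7733"))  # **** **** **** 7733
print(format_amount(Decimal("12345.78")))          # $12,345.78
```

The stored value is untouched in both cases; only the text handed back to the client changes.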

Suggested user interface would be...
FORMAT functionname(parameters, if any)

e.g.
CREATE TABLE customer
( id ...
...
, stored_card_number NUMERIC FORMAT pci_card_number_redaction()
...
);

We'd need to implement something to allow pg_dump to ignore format
functions. I suggest the best way to do that is by providing a BACKUP
role that can be delegated to other users. We would then allow a
parameter for SET output_formatting = on | off, which can only be set
by superuser and BACKUP role, then have pg_dump issue SET
output_formatting = off explicitly when it runs.

Do we want redaction in PostgreSQL?
Do we want it generalised into output format functions?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Dave Page <dpage(at)pgadmin(dot)org>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 09:13:18
Message-ID:	CA+OCxoz-x-v1-wFLbUmvpbpbjGBfqrH52MMm+y5xU6fXoOTJ=w@mail.gmail.com

On Fri, Oct 10, 2014 at 9:57 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> Postgres currently supports column level SELECT privileges.
>
> 1. If we want to confirm a credit card number, we can issue SELECT 1
> FROM customer WHERE stored_card_number = '1234 5678 5344 7733'
>
> 2. If we want to look for card fraud, we need to be able to use the
> full card number to join to transaction data and look up blocked card
> lists etc..
>
> 3. We want to block the direct retrieval of card numbers for
> additional security.
> In some cases, we might want to return an answer like '**** ***** **** 7733'
>
> We can't do all of the above with current facilities inside the database.
>
> The ability to mask output for data in certain cases, for the purpose
> of security, is known lately as data redaction, or column-level data
> redaction.
>
> The best way to support this requirement would be to allow columns to
> have an additional "output formatting function". This would be
> executed only when data is about to be returned by a query. All other
> uses of that would not restrict the data.
>
> This would have other uses as well, such as default report formats, so
> we can store financial amounts as NUMERIC, but format them on
> retrieval as $12,345.78 etc..
>
> Suggested user interface would be...
> FORMAT functionname(parameters, if any)
>
> e.g.
> CREATE TABLE customer
> ( id ...
> ...
> , stored_card_number NUMERIC FORMAT pci_card_number_redaction()
> ...
> );

I like that idea a lot - could be very useful (it reminds me of my Pick days).

> We'd need to implement something to allow pg_dump to ignore format
> functions. I suggest the best way to do that is by providing a BACKUP
> role that can be delegated to other users. We would then allow a
> parameter for SET output_formatting = on | off, which can only be set
> by superuser and BACKUP role, then have pg_dump issue SET
> output_formatting = off explicitly when it runs.

That seems like a reasonable approach. I can imagine other uses for a
BACKUP role in the future.

> Do we want redaction in PostgreSQL?

> Do we want it generalised into output format functions?

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Thom Brown <thom(at)linux(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 09:15:26
Message-ID:	CAA-aLv5inWDbS9U9_mbhZVRdqFpPDcRrcFf-GL16hp0AYXRBSA@mail.gmail.com

On 10 October 2014 09:57, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> Postgres currently supports column level SELECT privileges.
>
> 1. If we want to confirm a credit card number, we can issue SELECT 1
> FROM customer WHERE stored_card_number = '1234 5678 5344 7733'
>
> 2. If we want to look for card fraud, we need to be able to use the
> full card number to join to transaction data and look up blocked card
> lists etc..
>
> 3. We want to block the direct retrieval of card numbers for
> additional security.
> In some cases, we might want to return an answer like '**** ***** **** 7733'

One question that immediately springs to mind is: would the format
apply when passing columns to other functions? If not, wouldn't
something like

SELECT upper(redacted_column::text) ...

just bypass the formatting?

Also, how would casting be handled? Would it be forbidden for such cases?

And couldn't the card number be worked out using:

SELECT 1 FROM customer WHERE stored_card_number LIKE '%1 7733';
?column?
----------
(0 rows)

SELECT 1 FROM customer WHERE stored_card_number LIKE '%2 7733';
?column?
----------
1
(1 row)

SELECT 1 FROM customer WHERE stored_card_number LIKE '%12 7733';
?column?
----------
(0 rows)

.. and so on, which could be scripted in a DO statement?
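
The probing above can be sketched as a script. The following is a minimal Python illustration that uses an in-memory SQLite table as a stand-in for the PostgreSQL customer table (an assumption made only so the example is self-contained); suffix_exists and recover_card_number are hypothetical helpers. The script only ever asks yes/no questions through LIKE and never selects the column itself.

```python
import sqlite3

# Stand-in for the real table; SQLite is used only to keep this runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (stored_card_number TEXT)")
conn.execute("INSERT INTO customer VALUES ('1234 5678 5344 7733')")


def suffix_exists(suffix: str) -> bool:
    """Yes/no probe: does any stored card number end with this suffix?"""
    row = conn.execute(
        "SELECT 1 FROM customer WHERE stored_card_number LIKE ?",
        ("%" + suffix,),
    ).fetchone()
    return row is not None


def recover_card_number(known_suffix: str, length: int = 19) -> str:
    """Extend a known suffix one character at a time until it is complete."""
    suffix = known_suffix
    while len(suffix) < length:
        for ch in "0123456789 ":
            if suffix_exists(ch + suffix):
                suffix = ch + suffix
                break
        else:
            raise ValueError("no stored card number ends with this suffix")
    return suffix


print(recover_card_number("7733"))  # 1234 5678 5344 7733
```

With a single matching row this needs at most 11 probes per missing character, so the 15 hidden characters fall in no more than 165 queries.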

Not so much a challenge to the idea, but just wishing to understand
how it would work.

--
Thom

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 09:29:01
Message-ID:	5437A6DD.7020508@vmware.com

On 10/10/2014 11:57 AM, Simon Riggs wrote:
> Postgres currently supports column level SELECT privileges.
>
> 1. If we want to confirm a credit card number, we can issue SELECT 1
> FROM customer WHERE stored_card_number = '1234 5678 5344 7733'
>
> 2. If we want to look for card fraud, we need to be able to use the
> full card number to join to transaction data and look up blocked card
> lists etc..
>
> 3. We want to block the direct retrieval of card numbers for
> additional security.
> In some cases, we might want to return an answer like '**** ***** **** 7733'
>
> We can't do all of the above with current facilities inside the database.

Deny access to the underlying tables. Write SQL functions to do 1. and
2., and grant privileges to the functions, instead. For 3. create views
that do the redaction.

> The ability to mask output for data in certain cases, for the purpose
> of security, is known lately as data redaction, or column-level data
> redaction.
>
> The best way to support this requirement would be to allow columns to
> have an additional "output formatting function". This would be
> executed only when data is about to be returned by a query. All other
> uses of that would not restrict the data.

I don't see how that could work. Once you have access to the datum, you
can find its value in many indirect ways, without invoking the output
function. For example, write a PL/pgSQL function that takes the card
number as argument. Use < and > to binary search its value. If you block
< and >, I'm sure there are countless other ways.
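
As a rough illustration of the binary search described above, here is a Python sketch. The is_less_than callable is a hypothetical stand-in for any comparison the user is allowed to run against the hidden value (for example, a WHERE clause using <); the hidden value is never read directly.

```python
from typing import Callable, Tuple


def reveal_by_binary_search(
    is_less_than: Callable[[int], bool], low: int, high: int
) -> Tuple[int, int]:
    """Recover a hidden integer in [low, high] using only '<' comparisons.

    is_less_than(x) must answer whether hidden < x.
    Returns (recovered value, number of comparisons used).
    """
    probes = 0
    while low < high:
        mid = (low + high + 1) // 2  # round up so the range always shrinks
        probes += 1
        if is_less_than(mid):
            high = mid - 1  # hidden < mid
        else:
            low = mid       # hidden >= mid
    return low, probes


hidden = 1234567853447733  # known only to the "database" side
value, probes = reveal_by_binary_search(lambda x: hidden < x, 0, 10**16 - 1)
print(value == hidden, probes)
```

Each comparison halves the remaining range, which is why blocking direct output alone does not hide the value.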

And messing with output functions seems pretty, well, messy, in general.

I think the only solution that's going to work in practice is to
implement the redaction at a higher level. Don't allow direct access to
the tables with card numbers. Create functions that do whatever joins,
etc. you need to do with them, and grant privileges to only the functions.

- Heikki

From:	Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 09:34:53
Message-ID:	D05D9583.14F87%damian.wolgast@si-co.net

>
>This would have other uses as well, such as default report formats, so
>we can store financial amounts as NUMERIC, but format them on
>retrieval as $12,345.78 etc..

Nice idea, but what if you need to do further calculations?
If you output the value of credit card transactions it works fine, but in
case you want to SUM up the values, then you need to cast it back from
text(?) to numeric, calculate it and cast it to text(?) again?
And if you do - for any reason - need the credit card number in your
application (for example sending it to the credit card company to deduct
money) how can you retrieve its original value?

Moreover, what if you SELECT from a sub-SELECT which already has the formatted
information and not the plain data?

Maybe you should restrict access to tables for a certain user and only
allow the user to use a view which formats the output.

Modern applications do have a presentation layer which should take care of
data formatting. I am not sure if it is a good idea to mix data storage
and data presentation in the database.

Regards,
Damian Wolgast (irc:asymetrixs)

From:	Simon Riggs <simon(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 09:38:54
Message-ID:	CA+U5nMJZqu=rwYmw4xjwWQewAs7ojmCNEDuDv3j-rj__ASiVVQ@mail.gmail.com

On 10 October 2014 10:29, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:
> On 10/10/2014 11:57 AM, Simon Riggs wrote:
>>
>> Postgres currently supports column level SELECT privileges.
>>
>> 1. If we want to confirm a credit card number, we can issue SELECT 1
>> FROM customer WHERE stored_card_number = '1234 5678 5344 7733'
>>
>> 2. If we want to look for card fraud, we need to be able to use the
>> full card number to join to transaction data and look up blocked card
>> lists etc..
>>
>> 3. We want to block the direct retrieval of card numbers for
>> additional security.
>> In some cases, we might want to return an answer like '**** ***** ****
>> 7733'
>>
>> We can't do all of the above with current facilities inside the database.
>
>
> Deny access to the underlying tables. Write SQL functions to do 1. and 2.,
> and grant privileges to the functions, instead. For 3. create views that do
> the redaction.

If everything were easy to lock down, the approach you suggest is of
course the best way.

The problem there is that the SQL for (2) changes frequently, so we
want to give people SQL access.

Just not the ability to retrieve data in a usable form.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Simon Riggs <simon(at)2ndquadrant(dot)com>
To:	Thom Brown <thom(at)linux(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 09:45:06
Message-ID:	CA+U5nMJQ4J4ec1W31HyUE_C8FRCYAOd4JibXdTUTAz=PjvM9Jg@mail.gmail.com

On 10 October 2014 10:15, Thom Brown <thom(at)linux(dot)com> wrote:
> On 10 October 2014 09:57, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> Postgres currently supports column level SELECT privileges.
>>
>> 1. If we want to confirm a credit card number, we can issue SELECT 1
>> FROM customer WHERE stored_card_number = '1234 5678 5344 7733'
>>
>> 2. If we want to look for card fraud, we need to be able to use the
>> full card number to join to transaction data and look up blocked card
>> lists etc..
>>
>> 3. We want to block the direct retrieval of card numbers for
>> additional security.
>> In some cases, we might want to return an answer like '**** ***** **** 7733'
>
> One question that immediately springs to mind is: would the format
> apply when passing columns to other functions? If not, wouldn't
> something like
>
> SELECT upper(redacted_column::text) ...
>
> just bypass the formatting?

Yes, it would. As would SELECT redacted_column || ' '

I'm not sure how to block such usage, other than to apply it prior to
final calculation of functions.

i.e. we apply it in the SELECT clause, but not in the other clauses
FROM ON/WHERE/GROUP/ORDER/HAVING etc..
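
A toy Python model of that rule, purely for illustration: the filter sees the stored value, while the formatter is applied only when a column is emitted in the result. The run_query function, its parameters, and mask_card are hypothetical and do not correspond to any PostgreSQL interface.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]


def mask_card(value: str) -> str:
    """Hypothetical format function: hide all but the last four characters."""
    return "**** **** **** " + value[-4:]


def run_query(
    rows: Iterable[Row],
    where: Callable[[Row], bool],
    select: List[str],
    formats: Dict[str, Callable[[str], str]],
) -> List[Row]:
    """Filter on raw values, then format only the projected output."""
    result = []
    for row in rows:
        if where(row):  # WHERE sees the unredacted value
            out = {}
            for col in select:
                fmt = formats.get(col, lambda v: v)
                out[col] = fmt(row[col])  # formatting applied at output only
            result.append(out)
    return result


customer = [{"id": "1", "stored_card_number": "1234 5678 5344 7733"}]
print(run_query(
    customer,
    where=lambda r: r["stored_card_number"] == "1234 5678 5344 7733",
    select=["id", "stored_card_number"],
    formats={"stored_card_number": mask_card},
))
```

Expressions in the select list, such as the concatenation mentioned above, are not modelled here; they are exactly the case that remains open.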

> Also, how would casting be handled? Would it be forbidden for such cases?
>
>
> And couldn't the card number be worked out using:
>
> SELECT 1 FROM customer WHERE stored_card_number LIKE '%1 7733';
> ?column?
> ----------
> (0 rows)
>
> SELECT 1 FROM customer WHERE stored_card_number LIKE '%2 7733';
> ?column?
> ----------
> 1
> (1 row)
>
> SELECT 1 FROM customer WHERE stored_card_number LIKE '%12 7733';
> ?column?
> ----------
> (0 rows)
>
> .. and so on, which could be scripted in a DO statement?
>
>
> Not so much a challenge to the idea, but just wishing to understand
> how it would work.

Yes, covert channels would always exist. It would really be down to
auditing to control such exploits.

Redaction is aimed at minimising access in normal usage.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 10:08:02
Message-ID:	A0FFDE5A-2886-49EA-BF2F-E422CC009082@si-co.net

> The problem there is that the SQL for (2) changes frequently, so we
> want to give people SQL access.

So you want to give people access to your SQL database and worry that they could see specific information (credit card numbers) in plain and therefore you want to format it, so that people cannot see the real data. Is that correct?

I'd either do that by only letting them access a view, or reconsider whether it is really a good idea to give them SQL access to the server, as they could do other things which could, for example, slow down the server enormously.
Never trust the user. So I see what you want to achieve but I am not sure if the reason to do that is good. Can you explain please?
Maybe you should provide them an interface (e.g. web app) that restricts access to certain functions and takes care of formatting.

Regards
Damian Wolgast (irc:asymetrixs)

From:	Simon Riggs <simon(at)2ndquadrant(dot)com>
To:	Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 10:21:32
Message-ID:	CA+U5nM+aHdVZTS-ccppkmUr3W_eB1X6RAQsa8pvOOJe2tpV5sw@mail.gmail.com

On 10 October 2014 11:08, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net> wrote:
>
>> The problem there is that the SQL for (2) changes frequently, so we
>> want to give people SQL access.
>
> So you want to give people access to your SQL database and worry that they could see specific information (credit card numbers) in plain and therefore you want to format it, so that people cannot see the real data. Is that correct?
>
> I'd either do that by only letting them access a view or be reconsidering if it is really a good idea to give them SQL access to the server as they could do other things which e.g. could slow down the server enormously.
> Never trust the user. So I see what you want to achieve but I am not sure if the reason to do that is good. Can you explain please?
> Maybe you should provide them an interface (e.g. web app) that restricts access to certain functions and cares about formatting.

The requirement for redaction cannot be provided by a view.

A view provides a single value for each column, no matter whether it
is used in SELECT or WHERE clause.

Redaction requires output formatting only, but unchanged for other purposes.

Redaction is now a feature available in other databases. I guess it's
possible it's all smoke and mirrors, but that's why we discuss stuff
before we build it.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 10:27:26
Message-ID:	5437B48E.6090707@vmware.com

On 10/10/2014 01:21 PM, Simon Riggs wrote:
> Redaction is now a feature available in other databases. I guess its
> possible its all smoke and mirrors, but thats why we discuss stuff
> before we build it.

I googled for Oracle Data redaction, and found "General Usage guidelines":

> General Usage Guidelines
>
> * Oracle Data Redaction is not intended to protect against attacks by
> privileged database users who run ad hoc queries directly against the
> database.
>
> * Oracle Data Redaction is not intended to protect against users who
> run exhaustive SQL queries that attempt to determine the actual
> values by inference.

So it's not actually suitable for the example you gave. I don't think we
want this feature...

- Heikki

From:	Stephen Frost <sfrost(at)snowman(dot)net>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 10:35:09
Message-ID:	20141010103509.GZ28859@tamriel.snowman.net

Simon,

* Simon Riggs (simon(at)2ndquadrant(dot)com) wrote:
> The requirement for redaction cannot be provided by a view.
>
> A view provides a single value for each column, no matter whether it
> is used in SELECT or WHERE clause.
>
> Redaction requires output formatting only, but unchanged for other purposes.
>
> Redaction is now a feature available in other databases. I guess its
> possible its all smoke and mirrors, but thats why we discuss stuff
> before we build it.

In general, I'm on-board with the idea and similar requests have come
from users I've talked with.

Is there any additional information available on how these other
databases deal with the questions and concerns which have been raised?

Regarding functions, 'leakproof' functions should be alright to allow,
though Heikki brings up a good point regarding binary search being
possible in a plpgsql function (or even directly by a client). Of
course, that approach also requires that you have a specific item in
mind. Methods to mitigate would include not allowing regular users to
create functions or run DO blocks and rate-limiting their queries, along
with appropriate auditing.

Thanks,

Stephen

From:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 10:42:43
Message-ID:	CAFj8pRDXogwVkSbO2Ny8N1Le7mFwHWQnj-Xa1NvraR+fC79gDA@mail.gmail.com

2014-10-10 10:57 GMT+02:00 Simon Riggs <simon(at)2ndquadrant(dot)com>:

> Postgres currently supports column level SELECT privileges.
>
> 1. If we want to confirm a credit card number, we can issue SELECT 1
> FROM customer WHERE stored_card_number = '1234 5678 5344 7733'
>
> 2. If we want to look for card fraud, we need to be able to use the
> full card number to join to transaction data and look up blocked card
> lists etc..
>
> 3. We want to block the direct retrieval of card numbers for
> additional security.
> In some cases, we might want to return an answer like '**** ***** ****
> 7733'
>
> We can't do all of the above with current facilities inside the database.
>
> The ability to mask output for data in certain cases, for the purpose
> of security, is known lately as data redaction, or column-level data
> redaction.
>
> The best way to support this requirement would be to allow columns to
> have an additional "output formatting function". This would be
> executed only when data is about to be returned by a query. All other
> uses of that would not restrict the data.
>
> This would have other uses as well, such as default report formats, so
> we can store financial amounts as NUMERIC, but format them on
> retrieval as $12,345.78 etc..
>
> Suggested user interface would be...
> FORMAT functionname(parameters, if any)
>
> e.g.
> CREATE TABLE customer
> ( id ...
> ...
> , stored_card_number NUMERIC FORMAT pci_card_number_redaction()
> ...
> );
>
> We'd need to implement something to allow pg_dump to ignore format
> functions. I suggest the best way to do that is by providing a BACKUP
> role that can be delegated to other users. We would then allow a
> parameter for SET output_formatting = on | off, which can only be set
> by superuser and BACKUP role, then have pg_dump issue SET
> output_formatting = off explicitly when it runs.
>
>
I see a benefit of this feature as an alternative output function. I
remember a discussion about the output format of boolean. But how
can this feature help security?

You would have to either disallow any expression over a column marked this
way, or enforce the alternative output function early.

If an alternative output format function is required (it would have to be
implemented in C), then that is not much less work than implementing a
new type. So it would probably be much more practical if any expression
could be used:

stored_card_number NUMERIC FORMAT (right(stored_card_number::text, 4))

Regards

Pavel

> Do we want redaction in PostgreSQL?
> Do we want it generalised into output format functions?
>
> --
> Simon Riggs http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services

From:	Thom Brown <thom(at)linux(dot)com>
To:	Stephen Frost <sfrost(at)snowman(dot)net>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 10:45:19
Message-ID:	CAA-aLv72LWwPyaXy=8DVfc7gAvJRaZQefJ=HdietSP-srajWYQ@mail.gmail.com

On 10 October 2014 11:35, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> Simon,
>
> * Simon Riggs (simon(at)2ndquadrant(dot)com) wrote:
>> The requirement for redaction cannot be provided by a view.
>>
>> A view provides a single value for each column, no matter whether it
>> is used in SELECT or WHERE clause.
>>
>> Redaction requires output formatting only, but unchanged for other purposes.
>>
>> Redaction is now a feature available in other databases. I guess its
>> possible its all smoke and mirrors, but thats why we discuss stuff
>> before we build it.
>
> In general, I'm on-board with the idea and similar requests have come
> from users I've talked with.
>
> Is there any additional information available on how these other
> databases deal with the questions and concerns which have been raised?
>
> Regarding functions, 'leakproof' functions should be alright to allow,
> though Heikki brings up a good point regarding binary search being
> possible in a plpgsql function (or even directly by a client). Of
> course, that approach also requires that you have a specific item in
> mind. Methods to mitigate would include not allowing regular users to
> create functions or run DO blocks and rate-limiting their queries, along
> with appropriate auditing.

To be honest, this all sounds rather flaky. Even if you do rate-limit
their queries, they can use methods that avoid rate-limiting, such as
recursive queries. And if you're only after one credit card number
(to use the original example), you'd get it in a relatively short
amount of time, despite some rate-limiting system.

This gives the vague impression of security, but it really seems just
the placing of a few obstacles in the way.

And "auditing" sounds like a euphemism for "pass the problem of
security on elsewhere anyway".

Thom

From:	Stephen Frost <sfrost(at)snowman(dot)net>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 10:49:24
Message-ID:	20141010104924.GA28859@tamriel.snowman.net

* Heikki Linnakangas (hlinnakangas(at)vmware(dot)com) wrote:
> On 10/10/2014 01:21 PM, Simon Riggs wrote:
> >Redaction is now a feature available in other databases. I guess its
> >possible its all smoke and mirrors, but thats why we discuss stuff
> >before we build it.
>
> I googled for Oracle Data redaction, and found "General Usage guidelines":
>
> >General Usage Guidelines
> >
> >* Oracle Data Redaction is not intended to protect against attacks by
> >privileged database users who run ad hoc queries directly against the
> >database.
> >
> >* Oracle Data Redaction is not intended to protect against users who
> >run exhaustive SQL queries that attempt to determine the actual
> >values by inference.
>
> So it's not actually suitable for the example you gave. I don't
> think we want this feature...

Or, we need to consider how Oracle addresses these risks and consider if
we can provide a similar capability. Those capabilities may include
specific configuration and could be a prerequisite for this feature, but
I don't think it's sensible to say we don't want this feature simply
because it can't stand alone as a perfect answer to these risks.

As has been discussed before, we are likely in a better position to
identify the concerns and problem areas, come up with recommendations
for configuration and/or develop new capabilities to mitigate those
risks, than the every-day user or DBA. If we provide it and address
these issues in a central location which is generally available, then
problems can be found and fixed once, rather than every
database implementation faced with these concerns having to address
them independently with, most likely, poorer quality solutions.

While we don't want every feature of every database, this deserves more
consideration.

Thanks,

Stephen

From:	Stephen Frost <sfrost(at)snowman(dot)net>
To:	Thom Brown <thom(at)linux(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 11:00:54
Message-ID:	20141010110054.GB28859@tamriel.snowman.net

* Thom Brown (thom(at)linux(dot)com) wrote:
> To be honest, this all sounds rather flaky. Even if you do rate-limit
> their queries, they can use methods that avoid rate-limiting, such as
> recursive queries. And if you're only after one credit card number
> (to use the original example), you'd get it in a relatively short
> amount of time, despite some rate-limiting system.

The discussion about looking up specific card numbers in the original
email from Simon was actually an allowed use-case, as I understood it,
not a risk concern. Indeed, if you know a valid credit card number
already, as in this example, then why are you bothering with the search?
Perhaps it would provide confirmation, but it's not the database's
responsibility to make you forget the number you already have. Doing a
random walk through a keyspace of 10^16 and extracting a significant
enough number of results to be useful should be difficult. I agree that
if we're completely unable to make it difficult then this is less
useful, but I feel it's a bit early to jump to that conclusion.

> This gives the vague impression of security, but it really seems just
> the placing of a few obstacles in the way.

One might consider that all security is just placing obstacles in the
way.

> And "auditing" sounds like a euphemism for "pass the problem of
> security on elsewhere anyway".

Auditing is a known requirement for good security. There's certainly
different levels of it, but if you aren't at least auditing your
security configuration for the attack vectors you're concerned about,
then you're unlikely to have any real security.

Thanks,

Stephen

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Stephen Frost <sfrost(at)snowman(dot)net>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 11:01:10
Message-ID:	5437BC76.50401@vmware.com

On 10/10/2014 01:35 PM, Stephen Frost wrote:
> Regarding functions, 'leakproof' functions should be alright to allow,
> though Heikki brings up a good point regarding binary search being
> possible in a plpgsql function (or even directly by a client). Of
> course, that approach also requires that you have a specific item in
> mind.

It doesn't require that you have a specific item in mind. Binary search
is cheap, O(log n). It's easy to write a function to do a binary search
on a single item, passed as argument, and then apply that to all rows:

SELECT binary_search_reveal(cardnumber) FROM redacted_table;
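
For a sense of scale, the arithmetic can be checked with a few lines of Python. This assumes a 16-digit card number, i.e. a keyspace of 10^16 values, and a hypothetical table of one million rows; both figures are illustrative only.

```python
# Number of yes/no comparisons needed to pin down one value by binary search.
keyspace = 10**16                               # all 16-digit numbers (assumption)
probes_per_value = (keyspace - 1).bit_length()  # equals ceil(log2(keyspace))

rows = 1_000_000                                # hypothetical table size
total_probes = probes_per_value * rows

print(probes_per_value)  # 54
print(total_probes)      # 54000000
```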

Really, I don't see how this can possibly be made to work. You can't
allow ad hoc processing of data, and still avoid revealing it to the user.

- Heikki

From:	Stephen Frost <sfrost(at)snowman(dot)net>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 11:05:39
Message-ID:	20141010110539.GC28859@tamriel.snowman.net
Lists:	pgsql-hackers

* Heikki Linnakangas (hlinnakangas(at)vmware(dot)com) wrote:
> On 10/10/2014 01:35 PM, Stephen Frost wrote:
> >Regarding functions, 'leakproof' functions should be alright to allow,
> >though Heikki brings up a good point regarding binary search being
> >possible in a plpgsql function (or even directly by a client). Of
> >course, that approach also requires that you have a specific item in
> >mind.
>
> It doesn't require that you have a specific item in mind. Binary
> search is cheap, O(log n). It's easy to write a function to do a
> binary search on a single item, passed as argument, and then apply
> that to all rows:
>
> SELECT binary_search_reveal(cardnumber) FROM redacted_table;

Note that your binary_search_reveal wouldn't be marked as leakproof and
therefore this wouldn't be allowed. If this was allowed, you'd simply
do "raise notice" inside the function and call it a day.

Thanks,

Stephen

From:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 11:11:16
Message-ID:	5437BED4.6000806@2ndQuadrant.com
Lists:	pgsql-hackers

On 10/10/2014 11:38 AM, Simon Riggs wrote:
> On 10 October 2014 10:29, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:
>> On 10/10/2014 11:57 AM, Simon Riggs wrote:
>>> Postgres currently supports column level SELECT privileges.
>>>
>>> 1. If we want to confirm a credit card number, we can issue SELECT 1
>>> FROM customer WHERE stored_card_number = '1234 5678 5344 7733'
>>>
>>> 2. If we want to look for card fraud, we need to be able to use the
>>> full card number to join to transaction data and look up blocked card
>>> lists etc..
>>>
>>> 3. We want to block the direct retrieval of card numbers for
>>> additional security.
>>> In some cases, we might want to return an answer like '**** ***** ****
>>> 7733'
>>>
>>> We can't do all of the above with current facilities inside the database.
>>
>> Deny access to the underlying tables. Write SQL functions to do 1. and 2.,
>> and grant privileges to the functions, instead. For 3. create views that do
>> the redaction.
> If everything were easy to lock down the approach you suggest is of
> course the best way.
>
> The problem there is that the SQL for (2) changes frequently, so we
> want to give people SQL access.

1. Give people access to a development system with "safe" data, where they
write their functions.

2. Once a function is working, pass it to auditors.

3. Deploy and use the function.

> Just not the ability to retrieve data in a usable form.

For an attacker any access is "in a usable form"; for honest people you
can just provide a view or set-returning function.

btw, one way to do the "redaction" you suggested above is to write a
special type, which redacts data on output.

You can even make the type output function dependent on backup role.

Just make sure that users are aware that it is not really a security
feature which protects against attackers.
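
To make the "redact on output" idea concrete, here is a small sketch of what such an output function might compute. It is plain Python rather than a PostgreSQL type output function, and the function name and the `reveal` flag (standing in for "the caller holds the backup role") are hypothetical.

```python
def redact_card_number(card_number: str, reveal: bool = False) -> str:
    """Format a 16-digit card number for output, masking all but the last four digits.

    `reveal` is a hypothetical stand-in for a privileged (backup) caller.
    """
    digits = "".join(ch for ch in card_number if ch.isdigit())
    if len(digits) != 16:
        raise ValueError("expected a 16-digit card number")
    if reveal:
        # Privileged output: the full number, grouped in fours.
        return " ".join(digits[i:i + 4] for i in range(0, 16, 4))
    # Ordinary output: only the last four digits survive.
    return "**** **** **** " + digits[-4:]


print(redact_card_number("1234 5678 5344 7733"))
```

The stored value is untouched; only its textual output changes, which is why this is formatting and not protection against an attacker who can still compare or join on the column.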

Cheers

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Stephen Frost <sfrost(at)snowman(dot)net>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 11:15:43
Message-ID:	5437BFDF.9010202@vmware.com
Lists:	pgsql-hackers

On 10/10/2014 02:05 PM, Stephen Frost wrote:
> * Heikki Linnakangas (hlinnakangas(at)vmware(dot)com) wrote:
>> On 10/10/2014 01:35 PM, Stephen Frost wrote:
>>> Regarding functions, 'leakproof' functions should be alright to allow,
>>> though Heikki brings up a good point regarding binary search being
>>> possible in a plpgsql function (or even directly by a client). Of
>>> course, that approach also requires that you have a specific item in
>>> mind.
>>
>> It doesn't require that you have a specific item in mind. Binary
>> search is cheap, O(log n). It's easy to write a function to do a
>> binary search on a single item, passed as argument, and then apply
>> that to all rows:
>>
>> SELECT binary_search_reveal(cardnumber) FROM redacted_table;
>
> Note that your binary_search_reveal wouldn't be marked as leakproof and
> therefore this wouldn't be allowed. If this was allowed, you'd simply
> do "raise notice" inside the function and call it a day.

*shrug*, just do the same with a more complicated query, then. Even if
you can't create a function that does that, you can still execute the
same logic without a function.

- Heikki

From:	Thom Brown <thom(at)linux(dot)com>
To:	Stephen Frost <sfrost(at)snowman(dot)net>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 11:25:59
Message-ID:	CAA-aLv7J565FhEv33Rzi9K-5e5fL5pUU4OysRhy9XZxAKyOU4g@mail.gmail.com
Lists:	pgsql-hackers

On 10 October 2014 12:00, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> * Thom Brown (thom(at)linux(dot)com) wrote:
>> To be honest, this all sounds rather flaky. Even if you do rate-limit
>> their queries, they can use methods that avoid rate-limiting, such as
>> recursive queries. And if you're only after one credit card number
>> (to use the original example), you'd get it in a relatively short
>> amount of time, despite some rate-limiting system.
>
> The discussion about looking up specific card numbers in the original
> email from Simon was actually an allowed use-case, as I understood it,
> not a risk concern. Indeed, if you know a valid credit card number
> already, as in this example, then why are you bothering with the search?

The topic being "column redaction" rather than "column formatting"
leads me to believe that the main use-case of the feature would be to
prevent the user from discovering the full value of the column. It's
not so much point 1 I was responding to, rather point 3, where you
don't know the card number, but you get information about it in the
results. The purpose of this feature would be to prevent the user
from seeing all that data, which is a security feature, but at the
moment it just seems to be a way of making it a little less easy to
get at that data.

>> This gives the vague impression of security, but it really seems just
>> the placing of a few obstacles in the way.
>
> One might consider that all security is just placing obstacles in the
> way.

There's a difference between intending that there shouldn't be a way
past security and just making access a matter of walking a longer
route.

I wouldn't be against formatting per se, but for the purposes of that,
I would say that views can already serve that purpose.

--
Thom

From:	Stephen Frost <sfrost(at)snowman(dot)net>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 11:27:47
Message-ID:	20141010112747.GD28859@tamriel.snowman.net
Lists:	pgsql-hackers

* Heikki Linnakangas (hlinnakangas(at)vmware(dot)com) wrote:
> On 10/10/2014 02:05 PM, Stephen Frost wrote:
> >* Heikki Linnakangas (hlinnakangas(at)vmware(dot)com) wrote:
> >>On 10/10/2014 01:35 PM, Stephen Frost wrote:
> >>>Regarding functions, 'leakproof' functions should be alright to allow,
> >>>though Heikki brings up a good point regarding binary search being
> >>>possible in a plpgsql function (or even directly by a client). Of
> >>>course, that approach also requires that you have a specific item in
> >>>mind.
> >>
> >>It doesn't require that you have a specific item in mind. Binary
> >>search is cheap, O(log n). It's easy to write a function to do a
> >>binary search on a single item, passed as argument, and then apply
> >>that to all rows:
> >>
> >>SELECT binary_search_reveal(cardnumber) FROM redacted_table;
> >
> >Note that your binary_search_reveal wouldn't be marked as leakproof and
> >therefore this wouldn't be allowed. If this was allowed, you'd simply
> >do "raise notice" inside the function and call it a day.
>
> *shrug*, just do the same with a more complicated query, then. Even
> if you can't create a function that does that, you can still execute
> the same logic without a function.

Not sure I see what you're getting at here..? My point was that you'd
need a target number and the system would only provide confirmation that
the number exists, or does not. Your argument was that the table
itself would provide the target number, which was flawed. I don't see
how "just do the same with a more complicated query" removes the need to
have a target number for the binary search.

The equality case would be a better argument than the binary search if
you're simply looking for confirmation of existence. If the user can
define a table of targets, or uses a VALUES construct, and then join to
it then we might build a hash table and provide those results faster
than a binary search, though this again means that the user is
providing the list of keys to check.
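
A rough sketch of that equality case, assuming SQLite (via Python's `sqlite3`) as a stand-in for PostgreSQL and a hypothetical `customer` table: the user supplies a list of candidate numbers through a VALUES construct and learns which of them exist. Only values the user already supplied come back.

```python
import sqlite3


def make_db(cards):
    """In-memory stand-in for the customer table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (stored_card_number INTEGER)")
    conn.executemany("INSERT INTO customer VALUES (?)", [(c,) for c in cards])
    return conn


def confirm_existing(conn, candidates):
    """Return the user-supplied candidates that match a stored number.

    `candidates` must be non-empty; the join is on equality only.
    """
    placeholders = ",".join("(?)" for _ in candidates)
    sql = (
        "WITH targets(cardnumber) AS (VALUES " + placeholders + ") "
        "SELECT t.cardnumber FROM targets t "
        "JOIN customer c ON c.stored_card_number = t.cardnumber"
    )
    return sorted(row[0] for row in conn.execute(sql, list(candidates)))


conn = make_db([1234567853447733, 4929381746502913])
print(confirm_existing(conn, [1111222233334444, 1234567853447733]))
```

Each query confirms or denies the keys it was given, so the attacker's cost is driven by how many candidates must be generated, not by the lookup itself.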

As mentioned elsewhere on the thread, I agree that this capability
wouldn't be useful if a random search (which is providing the 'targets')
through a 10^16 keyspace generated a significant number of results (I'd
also throw in there "in a reasonable amount of time"- clearly it'd be
possible to extract all keys given sufficient time, even with a random
search). The sketch that Simon outlined won't obviously provide that
guarantee, but I'm not prepared to say we couldn't provide it at all
while meeting the goal he outlined.

Thanks,

Stephen

From:	Stephen Frost <sfrost(at)snowman(dot)net>
To:	Thom Brown <thom(at)linux(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 11:45:46
Message-ID:	20141010114546.GE28859@tamriel.snowman.net
Lists:	pgsql-hackers

* Thom Brown (thom(at)linux(dot)com) wrote:
> On 10 October 2014 12:00, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > The discussion about looking up specific card numbers in the original
> > email from Simon was actually an allowed use-case, as I understood it,
> > not a risk concern. Indeed, if you know a valid credit card number
> > already, as in this example, then why are you bothering with the search?
>
> The topic being "column redaction" rather than "column formatting"
> leads me to believe that the main use-case of the feature would be to
> prevent the user from discovering the full value of the column.

I believe the idea is to limit the chances that a user with limited
pre-existing knowledge would be able to determine the full value of
items in the column, especially in bulk.

> It's
> not so much point 1 I was responding to, rather point 3, where you
> don't know the card number, but you get information about it in the
> results.

We'd certainly want to prevent that to the limit possible. Do you have
a specific thought about how they'd be able to find a full number beyond
a random search..?

> The purpose of this feature would be to prevent the user
> from seeing all that data, which is a security feature, but at the
> moment it just seems to be a way of making it a little less easy to
> get at that data.

I certainly appreciate the thought challenges and critique and I'm
hopeful we could make it more than "a little less easy" to get at the
information. If we aren't able to do that, then the feature isn't
useful, certainly.

> >> This gives the vague impression of security, but it really seems just
> >> the placing of a few obstacles in the way.
> >
> > One might consider that all security is just placing obstacles in the
> > way.
>
> There's a difference between intending that there shouldn't be a way
> past security and just making access a matter of walking a longer
> route.

Throwing random 16-digit numbers and associated information at a credit
card processor could be viewed as "walking a longer route" too. The
same goes for random key searches or password guesses.

Thanks,

Stephen

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Stephen Frost <sfrost(at)snowman(dot)net>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 12:25:06
Message-ID:	5437D022.1080408@vmware.com
Lists:	pgsql-hackers

On 10/10/2014 02:27 PM, Stephen Frost wrote:
> * Heikki Linnakangas (hlinnakangas(at)vmware(dot)com) wrote:
>> On 10/10/2014 02:05 PM, Stephen Frost wrote:
>>> * Heikki Linnakangas (hlinnakangas(at)vmware(dot)com) wrote:
>>>> On 10/10/2014 01:35 PM, Stephen Frost wrote:
>>>>> Regarding functions, 'leakproof' functions should be alright to allow,
>>>>> though Heikki brings up a good point regarding binary search being
>>>>> possible in a plpgsql function (or even directly by a client). Of
>>>>> course, that approach also requires that you have a specific item in
>>>>> mind.
>>>>
>>>> It doesn't require that you have a specific item in mind. Binary
>>>> search is cheap, O(log n). It's easy to write a function to do a
>>>> binary search on a single item, passed as argument, and then apply
>>>> that to all rows:
>>>>
>>>> SELECT binary_search_reveal(cardnumber) FROM redacted_table;
>>>
>>> Note that your binary_search_reveal wouldn't be marked as leakproof and
>>> therefore this wouldn't be allowed. If this was allowed, you'd simply
>>> do "raise notice" inside the function and call it a day.
>>
>> *shrug*, just do the same with a more complicated query, then. Even
>> if you can't create a function that does that, you can still execute
>> the same logic without a function.
>
> Not sure I see what you're getting at here..? My point was that you'd
> need a target number and the system would only provide confirmation that
> the number exists, or does not. Your argument was that the table
> itself would provide the target number, which was flawed. I don't see
> how "just do the same with a more complicated query" removes the need to
> have a target number for the binary search.

You said above that it's OK to pass the card numbers to leakproof
functions. But if you allow that, you can write a function that takes as
argument a redacted card number, and unredacts it (using the < and =
operators in a binary search). And then you can just do "SELECT
unredact(card_number) from redacted_table".

You seem to have something stronger in mind: only allow the equality
operator on the redacted column, and nothing else. That might be better,
although I'm not really convinced. There are just too many ways you
could still leak the datum. Just a random example, inspired by the
recent CRIME attack on SSL: build a row with the redacted datum, and
another "guess" datum, and store it along with 1k of other data in a
temporary table. The row gets toasted. Observe how much it compressed;
if the guess datum is close to the original datum, it compresses well.
Now, you can probably stop that particular attack with more restrictions
on what you can do with the datum, but that just shows that pretty much
any computation you allow with the datum can be used to reveal its value.
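
The compression side channel can be illustrated with a toy model. This is only a sketch: `zlib` stands in for the compression PostgreSQL applies to toasted rows, the filler bytes and function names are made up for the example, and the "secret" argument represents the redacted datum that the server, not the attacker, places in the row.

```python
import zlib

# 1 KiB of unrelated data stored alongside; chosen to contain no ASCII digits.
FILLER = bytes(range(128, 256)) * 8


def stored_size(secret: str, guess: str) -> int:
    """Size of the compressed row holding the secret datum, a guess, and filler.

    The attacker is assumed to observe only this size, never the secret.
    """
    row = secret.encode() + b"|" + guess.encode() + b"|" + FILLER
    return len(zlib.compress(row, 9))


def best_guess(secret: str, candidates):
    """Pick the candidate whose row compresses best (smallest observed size)."""
    return min(candidates, key=lambda g: stored_size(secret, g))


secret = "1234567853447733"
candidates = ["9081726354091827", "1234567853447733", "5566778899001122"]
for g in candidates:
    print(g, stored_size(secret, g))
print("best:", best_guess(secret, candidates))
```

A guess equal to the secret is encoded as a single back-reference instead of sixteen literal digits, so its row is measurably smaller. A real attack would refine guesses piece by piece rather than test whole numbers, but the leak is the same.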

- Heikki

From:	Thom Brown <thom(at)linux(dot)com>
To:	Stephen Frost <sfrost(at)snowman(dot)net>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 12:28:01
Message-ID:	CAA-aLv4wPj+uU-j613oFwPfURKWi4rjhRi3-YKFxDMysw8LSkg@mail.gmail.com
Lists:	pgsql-hackers

On 10 October 2014 12:45, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>> >> This gives the vague impression of security, but it really seems just
>> >> the placing of a few obstacles in the way.
>> >
>> > One might consider that all security is just placing obstacles in the
>> > way.
>>
>> There's a difference between intending that there shouldn't be a way
>> past security and just making access a matter of walking a longer
>> route.
>
> Throwing random 16-digit numbers and associated information at a credit
> card processor could be viewed as "walking a longer route" too. The
> same goes for random key searches or password guesses.

But those would need to be exhaustive, and in nearly all cases,
impractical. Data such as plain credit card numbers stored in a
column, even with all its data masked, would be easy to determine.
Salted and hashed passwords, even with complete visibility of the
value, aren't vulnerable to scrutiny of particular character values.
If they were, no one would use them.

Thom

From:	Simon Riggs <simon(at)2ndquadrant(dot)com>
To:	Thom Brown <thom(at)linux(dot)com>
Cc:	Stephen Frost <sfrost(at)snowman(dot)net>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 12:43:01
Message-ID:	CA+U5nM+Snx4fZZJwBPmFKRdx=vWLLED5n8xGdM2ygzMfvGJN+w@mail.gmail.com
Lists:	pgsql-hackers

On 10 October 2014 11:45, Thom Brown <thom(at)linux(dot)com> wrote:

> To be honest, this all sounds rather flaky.

To be honest, suggesting anything at all is rather difficult and I
recommend people try it.

Everything sounds crap when you didn't think of it and you've given it
an hour's thought.

I'm not blind to the difficulties raised and I thank you for your
input, but I think it's too early to make sweeping generalisations.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Simon Riggs <simon(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc:	Stephen Frost <sfrost(at)snowman(dot)net>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 12:53:04
Message-ID:	CA+U5nMLbrv1_05-zry+nxkss7VOQ8e5-A7w4=HcRqLhX=rVqhA@mail.gmail.com
Lists:	pgsql-hackers

On 10 October 2014 12:01, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:

> Really, I don't see how this can possibly be made to work. You can't allow
> ad hoc processing of data, and still avoid revealing it to the user.

Anyone with unmonitored access and sufficient time can break through security.

I think that is true of any kind of security, and so it is true here also.

Auditing and controls are required also, that's why I suggested those
first. This proposal was looking beyond that to what we might need
next.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Stephen Frost <sfrost(at)snowman(dot)net>
Cc:	Thom Brown <thom(at)linux(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 12:58:35
Message-ID:	CA+TgmoYBBve9=xjH6ak5=A2Qg3FMQsOiA1u=K0ZoP=1svwzg=A@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Oct 10, 2014 at 7:00 AM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> * Thom Brown (thom(at)linux(dot)com) wrote:
>> To be honest, this all sounds rather flaky. Even if you do rate-limit
>> their queries, they can use methods that avoid rate-limiting, such as
>> recursive queries. And if you're only after one credit card number
>> (to use the original example), you'd get it in a relatively short
>> amount of time, despite some rate-limiting system.
>
> The discussion about looking up specific card numbers in the original
> email from Simon was actually an allowed use-case, as I understood it,
> not a risk concern. Indeed, if you know a valid credit card number
> already, as in this example, then why are you bothering with the search?
> Perhaps it would provide confirmation, but it's not the database's
> responsibility to make you forget the number you already have. Doing a
> random walk through a keyspace of 10^16 and extracting a significant
> enough number of results to be useful should be difficult. I agree that
> if we're completely unable to make it difficult then this is less
> useful, but I feel it's a bit early to jump to that conclusion.

You are obviously wearing your rose-colored glasses this morning. I
predict a competent SQL programmer could write an SQL function, or
client-side code, to pump the data out of the database using binary
search in milliseconds per row. And I think it's more likely than not
that there are other techniques that are much faster. The idea that
you're going to be able to let people query the data but not actually
retrieve it should be viewed with great skepticism. This is the
equivalent of telling a child that she can't open her Christmas
presents until Christmas, but she can shake them, hold them up to a
bright light, and/or X-ray the packages. If she doesn't know what's
in there by the time she opens it, it's just for lack of effort.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Thom Brown <thom(at)linux(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Stephen Frost <sfrost(at)snowman(dot)net>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 13:05:05
Message-ID:	CAA-aLv55bujC5mt2Go5Xz-Kc2O5HftT==UVXfHPzOMKnuvyUjg@mail.gmail.com
Lists:	pgsql-hackers

On 10 October 2014 13:43, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On 10 October 2014 11:45, Thom Brown <thom(at)linux(dot)com> wrote:
>
>> To be honest, this all sounds rather flaky.
>
> To be honest, suggesting anything at all is rather difficult and I
> recommend people try it.

I have, and most ideas I've had have been justifiably shot down or
picked apart (scheduled background tasks, offloading stats collection
to standby, index maintenance in DML query plans, expression
statistics... to name but a few).

> Everything sounds crap when you didn't think of it and you've given it
> an hour's thought.

I'm not sure that means my concerns aren't valid. I don't think it
sounds crap, but I also can't see any use-case for it where we don't
already have things covered, or where it's going to offer any useful
level of security. Like with RLS, it may be that I'm just looking at
things from the wrong perspective.
--
Thom

From:	Claudio Freire <klaussfreire(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 13:18:48
Message-ID:	CAGTBQpZaAHBTPJ9zcm1qCzZHn8M0cAoE_ZMvJ5fF=7CqL47S_g@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Oct 10, 2014 at 5:57 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
> 1. If we want to confirm a credit card number, we can issue SELECT 1
> FROM customer WHERE stored_card_number = '1234 5678 5344 7733'
> ...
> 3. We want to block the direct retrieval of card numbers for
> additional security.
> In some cases, we might want to return an answer like '**** ***** **** 7733'

I wouldn't want to allow that:

select ref.ref, customer.name from (select generate_series as ref from
generate_series(0, 9999999999999999)) ref, customer
where ref.ref = customer.stored_card_number

May take a long while. Just disable everything except nestloop and
suck up the data as it comes. Can be optimized. Not sure how you'd
avoid this, not trivial at all. Not possible at all I'd venture.

But if you really really want to allow this, encrypt the column, and
provide a C function that can decrypt it. You can join encrypted
columns, and you can even include the last 4 digits unencrypted if you
want (I wouldn't want).

Has to be a C function to be able to avoid leaking the key, btw.
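
One way to picture "joinable but unreadable" columns is deterministic keyed tokenization. The sketch below is an assumption-laden stand-in, not the scheme proposed above: it uses an HMAC (which cannot be decrypted, unlike the encrypted column described here), SQLite via Python's `sqlite3` in place of PostgreSQL, and hypothetical table names and key.

```python
import hashlib
import hmac
import sqlite3

# Hypothetical key; in the scenario above it must never be readable from SQL.
SECRET_KEY = b"demo-key-kept-outside-sql"


def token(card_number: str) -> str:
    """Deterministic keyed token: equal card numbers always give equal tokens."""
    return hmac.new(SECRET_KEY, card_number.encode(), hashlib.sha256).hexdigest()


def build_db() -> sqlite3.Connection:
    """Store tokens (plus the last four digits) instead of plain card numbers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (name TEXT, card_token TEXT, last4 TEXT)")
    conn.execute("CREATE TABLE blocked_cards (card_token TEXT)")
    for name, card in [("alice", "1234567853447733"), ("bob", "4929381746502913")]:
        conn.execute(
            "INSERT INTO customer VALUES (?, ?, ?)", (name, token(card), card[-4:])
        )
    conn.execute("INSERT INTO blocked_cards VALUES (?)", (token("4929381746502913"),))
    return conn


def blocked_customers(conn):
    """Join on the token column; the plain card number never appears in SQL."""
    rows = conn.execute(
        "SELECT c.name, c.last4 FROM customer c "
        "JOIN blocked_cards b ON b.card_token = c.card_token ORDER BY c.name"
    )
    return rows.fetchall()


print(blocked_customers(build_db()))
```

Without the key, a SQL user cannot turn a guessed number into a token, so the generate_series style of enumeration no longer works; equality joins between tokenized columns still do.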

> 2. If we want to look for card fraud, we need to be able to use the
> full card number to join to transaction data and look up blocked card
> lists etc..

view works for this pretty well

From:	Stephen Frost <sfrost(at)snowman(dot)net>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 14:53:55
Message-ID:	20141010145355.GF28859@tamriel.snowman.net
Lists:	pgsql-hackers

* Heikki Linnakangas (hlinnakangas(at)vmware(dot)com) wrote:
> You said above that it's OK to pass the card numbers to leakproof
> functions. But if you allow that, you can write a function that
> takes as argument a redacted card number, and unredacts it (using
> the < and = operators in a binary search). And then you can just do
> "SELECT unredact(card_number) from redacted_table".

Not sure I'm following what you mean by 'redacted'. The original
proposal provided '**** **** **** 1234' as the 'redacted' number, and
I'm not seeing how you can get the rest of the number trivially with
just equality and binary search.

If you start with a complete number then you can get the system to tell
you if it exists or not with a binary search or even just doing an
equality check.

> You seem to have something stronger in mind: only allow the equality
> operator on the redacted column, and nothing else.

That wasn't my suggestion- I was merely pointing out that if you have a
complete number (perhaps by pulling out a random number, with a filter
against the last four digits, reducing the search space to 10^12) which
you want to check for existence, you can do that directly. No need for
a binary search at all.
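
The sizes involved can be worked out directly. The sketch below only does arithmetic; the figure of one million stored cards is an assumption chosen for illustration, not a number from this thread.

```python
import math


def probes_for_binary_search(keyspace: int) -> int:
    """Comparisons needed to pin down one value when ordering comparisons are allowed."""
    return math.ceil(math.log2(keyspace))


def expected_random_guesses(keyspace: int, stored_values: int) -> float:
    """Expected uniform random guesses until one matches any stored value
    (geometric distribution with success probability stored_values / keyspace)."""
    return keyspace / stored_values


full_space = 10**16       # all 16-digit numbers
known_last4 = 10**12      # last four digits known, twelve remain
stored = 10**6            # assumed table size, for illustration only

print(probes_for_binary_search(full_space))          # 54
print(probes_for_binary_search(known_last4))         # 40
print(expected_random_guesses(known_last4, stored))  # 1000000.0
```

With equality only, an attacker pays on the order of keyspace divided by table size guesses per hit; with ordering comparisons the cost drops to a few dozen probes per row.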

> That might be
> better, although I'm not really convinced. There are just too many
> ways you could still leak the datum. Just a random example, inspired
> by the recent CRIME attack on SSL: build a row with the redacted
> datum, and another "guess" datum, and store it along with 1k of
> other data in a temporary table. The row gets toasted. Observe how
> much it compressed; if the guess datum is close to the original
> datum, it compresses well. Now, you can probably stop that
> particular attack with more restrictions on what you can do with the
> datum, but that just shows that pretty much any computation you
> allow with the datum can be used to reveal its value.

One concept I've been thinking about is a notion of 'trusted' data
sources to allow comparison against. Perhaps individual values are
allowed from the user also, but my thought is that you have:

master_table
trusted_table

Such that you can't view the sensitive column in either the master or
the trusted table, but you can join between the two on the sensitive
column and view other, non-sensitive, attributes of the two tables. You
might even allow other transformations on the sensitive column, provided
it always results in a boolean comparison to another sensitive column.
Not sure if that really solves Simon's use-case exactly, but it might
tease out other thoughts.
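
A sketch of the query shape this would permit, again with SQLite (via Python's `sqlite3`) standing in for PostgreSQL. The table names come from the message above; the column names and the view are hypothetical, and nothing here enforces the restriction, it only shows what a permitted result looks like.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Two tables sharing a sensitive column, plus a view exposing only safe attributes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE master_table (customer_name TEXT, card_number INTEGER);
        CREATE TABLE trusted_table (list_name TEXT, card_number INTEGER);
        INSERT INTO master_table VALUES ('alice', 1234567853447733);
        INSERT INTO master_table VALUES ('bob', 4929381746502913);
        INSERT INTO trusted_table VALUES ('blocked', 1234567853447733);
        -- The sensitive column is used only inside the join condition.
        CREATE VIEW visible_matches AS
            SELECT m.customer_name, t.list_name
            FROM master_table m
            JOIN trusted_table t ON m.card_number = t.card_number;
        """
    )
    return conn


def visible_matches(conn):
    """Rows a restricted user would see: non-sensitive attributes only."""
    return conn.execute(
        "SELECT customer_name, list_name FROM visible_matches ORDER BY customer_name"
    ).fetchall()


print(visible_matches(build_db()))
```

Whether this stays safe depends on who can write to the trusted table: if the user can insert chosen values there, the join becomes the equality oracle discussed earlier in the thread.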

Thanks!

Stephen

From:	Stephen Frost <sfrost(at)snowman(dot)net>
To:	Thom Brown <thom(at)linux(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 14:56:16
Message-ID:	20141010145616.GG28859@tamriel.snowman.net
Lists:	pgsql-hackers

* Thom Brown (thom(at)linux(dot)com) wrote:
> On 10 October 2014 12:45, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> >> There's a difference between intending that there shouldn't be a way
> >> past security and just making access a matter of walking a longer
> >> route.
> >
> > Throwing random 16-digit numbers and associated information at a credit
> > card processor could be viewed as "walking a longer route" too. The
> > same goes for random key searches or password guesses.
>
> But those would need to be exhaustive, and in nearly all cases,
> impractical.

That would be exactly the idea with this- we make it impractical to get
at the unredacted information.

> Data such as plain credit card numbers stored in a
> column, even with all its data masked, would be easy to determine.

I'm not as convinced of that as you are.. Though I'll point out that in
the use-cases which I've been talking to users about, it isn't credit
cards under discussion.

> Salted and hashed passwords, even with complete visibility of the
> value, aren't vulnerable to scrutiny of particular character values.
> If they were, no one would use them.

I wasn't suggesting otherwise, but I don't see it as particularly
relevant to the discussion regardless.

Thanks,

Stephen

From:	Stephen Frost <sfrost(at)snowman(dot)net>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Thom Brown <thom(at)linux(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 14:58:07
Message-ID:	20141010145807.GH28859@tamriel.snowman.net
Lists:	pgsql-hackers

Robert,

* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> On Fri, Oct 10, 2014 at 7:00 AM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > The discussion about looking up specific card numbers in the original
> > email from Simon was actually an allowed use-case, as I understood it,
> > not a risk concern. Indeed, if you know a valid credit card number
> > already, as in this example, then why are you bothering with the search?
> > Perhaps it would provide confirmation, but it's not the database's
> > responsibility to make you forget the number you already have. Doing a
> > random walk through a keyspace of 10^16 and extracting a significant
> > enough number of results to be useful should be difficult. I agree that
> > if we're completely unable to make it difficult then this is less
> > useful, but I feel it's a bit early to jump to that conclusion.

Thanks much for the laugh. :)

> You are obviously wearing your rose-colored glasses this morning. I
> predict a competent SQL programmer could write an SQL function, or
> client-side code, to pump the data out of the database using binary
> search in milliseconds per row.

Clearly, if we're unable to prevent that, then this feature wouldn't be
useful. What would be helpful is to consider what we could provide
along these lines without allowing the data to be trivially recovered.

Thanks!

Stephen

From:	Thom Brown <thom(at)linux(dot)com>
To:	Stephen Frost <sfrost(at)snowman(dot)net>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 15:27:11
Message-ID:	CAA-aLv6enVcVasqfsdk6gFMJiQU5bwM89ixoT=b598cs1+zYqA@mail.gmail.com
Lists:	pgsql-hackers

On 10 October 2014 15:56, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> * Thom Brown (thom(at)linux(dot)com) wrote:
>> Data such as plain credit card numbers stored in a
>> column, even with all its data masked, would be easy to determine.
>
> I'm not as convinced of that as you are.. Though I'll point out that in
> the use-cases which I've been talking to users about, it isn't credit
> cards under discussion.

I think credit card numbers are a good example. If we're talking
about format functions here, there has to be something in addition to
that which determines permitted comparison operations. If not, and we
were going to remove all but = operations, we'd effectively cripple
the functionality of anything that's been formatted that wasn't
intended as a security measure. It almost sounds like an extension to
domains rather than column-level functionality.

But then if operators such as <, > and ~~ aren't hindered, it sounds
like no protection at all.

Also, joining to foreign tables could be an issue, copying data to
temporary tables could possibly remove any redaction, materialised
views would need to support it somehow. Although just because I can't
picture how that would work, it's no indication that it couldn't.

>> Salted and hashed passwords, even with complete visibility of the
>> value, isn't vulnerable to scrutiny of particular character values.
>> If it were, no-one would use it.
>
> I wasn't suggesting otherwise, but I don't see it as particularly
> relevant to the discussion regardless.

I guess I was trying to illustrate that the security in a hashed
password is acceptable because it requires exhaustive searching to
break. If comparison operators worked on it, it would be broken out
of the box.

--
Thom

From:	Claudio Freire <klaussfreire(at)gmail(dot)com>
To:	Stephen Frost <sfrost(at)snowman(dot)net>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 15:37:23
Message-ID:	CAGTBQpamEEqsW+CVeYreXSiaZLhJJSpDQThYTNGx=f6SjOsEuA@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Oct 10, 2014 at 11:58 AM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>> You are obviously wearing your rose-colored glasses this morning. I
>> predict a competent SQL programmer could write an SQL function, or
>> client-side code, to pump the data out of the database using binary
>> search in milliseconds per row.
>
> Clearly, if we're unable to prevent that, then this feature wouldn't be
> useful. What would be helpful is to consider what we could provide
> along these lines without allowing the data to be trivially recovered.

Joins are way too powerful to allow arbitrary joins to untrusted users.

The only somewhat secure way is to allow administrators to define which
joins are possible, and have untrusted users use those.

You get that with views. I'm not sure you can allow more than that,
and not have lots of leaks.

Is there really a use case where redaction is the only solution?

Nothing mentioned till now really is:

* Transaction logs and blocked card lists can be joined against users
and a view can be provided that includes the user, and not the credit
card. So you can join freely between the views just fine, by user, and
do all the analysis you need without exposing credit card numbers in
any way, not even redacted.

* If not users, you can join against a random but unique per card
value generated at some point when the card is first inserted in the
records, and you get a random token for the card. Still works, and can
be done with triggers, and is far less leaky than the proposed
redaction.

* Credit card number verification is a leak on its own, but if you
really want it, you can provide a function that does it. And I think
it's perfectly reasonable that defining leaking functions has to be an
admin thing.

* Views can expose the redacted value just fine for direct use. A
generically usable user-id or random-token to redacted number mapping
view would provide all the freedom you could want.

* Functions can be defined as leakproof (even when they're not, which is
an admin decision to throw data safety out the window, but it's a
possible decision), which allows fetching redacted columns that way
from security barrier views.
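
A minimal sketch of the random-token idea from the list above, written in Python rather than with triggers, purely for illustration. The `TokenVault` class and its method name are hypothetical and not part of anything proposed in this thread; the sketch only shows that untrusted users can join on an opaque token that carries no information about the card number.

```python
import secrets


class TokenVault:
    """Hypothetical store mapping each card number to a random, unique token."""

    def __init__(self):
        self._token_by_card = {}

    def token_for(self, card_number: str) -> str:
        # Assign a token the first time a card is seen (roughly what an
        # insert trigger would do) and reuse it so joins stay stable.
        if card_number not in self._token_by_card:
            self._token_by_card[card_number] = secrets.token_hex(16)
        return self._token_by_card[card_number]


vault = TokenVault()

# Tables exposed to untrusted users carry tokens, never card numbers.
transactions = [
    {"card_token": vault.token_for("1234567853447733"), "amount": 25},
    {"card_token": vault.token_for("1234567853447733"), "amount": 90},
    {"card_token": vault.token_for("4000123412341234"), "amount": 12},
]
blocked = {vault.token_for("4000123412341234")}

# Join transactions against the blocked-card list purely by token.
flagged = [t for t in transactions if t["card_token"] in blocked]
```

Because the token is random, comparison and range operators on it reveal nothing about the underlying number, which is the property the redaction proposal lacks.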

Is there anything not covered by the above that can be done by
built-in redacting?

If the answer is yes, then maybe the feature has value.

If the feature's value is ease of use, I'd weigh that against the
security loss. A false sense of security is a net security loss in most
(if not all) cases. Having to flesh out the logic through security
barrier views, leakproof redacting functions and triggers can have the
good side-effect of making all the possible leaks obvious to the
admin.

On Fri, Oct 10, 2014 at 12:27 PM, Thom Brown <thom(at)linux(dot)com> wrote:
> Also, joining to foreign tables could be an issue, copying data to
> temporary tables could possibly remove any redaction, materialised
> views would need to support it somehow. Although just because I can't
> picture how that would work, it's no indication that it couldn't.

Well, that's why regulations usually require encryption of credit
card data. There are way too many ways to leak it, and it is way too
valuable to expect a lack of knowledgeable and motivated people trying
to get it.

On Fri, Oct 10, 2014 at 12:27 PM, Thom Brown <thom(at)linux(dot)com> wrote:
>>> Salted and hashed passwords, even with complete visibility of the
>>> value, isn't vulnerable to scrutiny of particular character values.
>>> If it were, no-one would use it.
>>
>> I wasn't suggesting otherwise, but I don't see it as particularly
>> relevant to the discussion regardless.
>
> I guess I was trying to illustrate that the security in a hashed
> password is acceptable because it requires exhaustive searching to
> break. If comparison operators worked on it, it would be broken out
> of the box.

Lately, the security of password-based authentication is being called
into question very often. So I wouldn't hold credit card numbers or
any other sensitive information to the password standard.

But let's use the password example: it's widely accepted that holding
onto cleartext passwords, or even transmitting them or their plain
hashes over any channel, is extremely bad practice. So redaction isn't
good enough for passwords, and neither is salted hashing. The only
generally accepted way in the security community is a password proof
in the context of a zero-knowledge password proof protocol[0]. You'd
want something like that for any bit of info you need to "join" or
"compare" but can't accept leaking.

[0] http://en.wikipedia.org/wiki/Zero-knowledge_password_proof

From:	Rod Taylor <rod(dot)taylor(at)gmail(dot)com>
To:	Stephen Frost <sfrost(at)snowman(dot)net>
Cc:	Thom Brown <thom(at)linux(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 15:45:37
Message-ID:	CAKddOFCgoh85EPYVm9O0Z6_SgejJEWpU0ogObSkp-WjHk+r9WQ@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Oct 10, 2014 at 10:56 AM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:

> * Thom Brown (thom(at)linux(dot)com) wrote:
> > On 10 October 2014 12:45, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > >> There's a difference between intending that there shouldn't be a way
> > >> past security and just making access a matter of walking a longer
> > >> route.
> > >
> > > Throwing random 16-digit numbers and associated information at a credit
> > > card processor could be viewed as "walking a longer route" too. The
> > > same goes for random key searches or password guesses.
> >
> > But those would need to be exhaustive, and in nearly all cases,
> > impractical.
>
> That would be exactly the idea with this- we make it impractical to get
> at the unredacted information.
>

For fun I gave the search a try.

create table cards (id serial, cc bigint);
insert into cards (cc)
SELECT CAST(random() * 9999999999999999 AS bigint)
  FROM generate_series(1,10000);

\timing on
-- Binary search per row: halve [range_min, range_max] until it pins down cc.
WITH RECURSIVE t(id, range_min, range_max) AS (
    SELECT id, 1::bigint, 9999999999999999 FROM cards
    UNION ALL
    SELECT id
         -- cc above the midpoint: move the lower bound past it, so the
         -- range always shrinks and the recursion terminates
         , CASE WHEN cc > range_avg THEN range_avg + 1 ELSE range_min END
         , CASE WHEN cc <= range_avg THEN range_avg ELSE range_max END
      FROM (SELECT id, (range_min + range_max) / 2 AS range_avg, range_min,
                   range_max
              FROM t
           ) AS t_avg
      JOIN cards USING (id)
     WHERE range_min != range_max
)
SELECT id, range_min AS cc FROM t WHERE range_min = range_max;

On my laptop I can pull all 10,000 card numbers in less than 1 second. For
a text based item I don't imagine it would be much different. Numbers are
pretty easy to work with though.
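
The same extraction can be sketched outside SQL. The following is a minimal illustration in Python, assuming the attacker has nothing but a yes/no comparison against the hidden value (the `is_greater_than` callback is a hypothetical stand-in for a `WHERE cc > guess` query that returns a row or not). Over a 16-digit keyspace it needs at most 54 comparisons per value.

```python
def recover_value(is_greater_than, low=1, high=9999999999999999):
    """Recover a hidden integer using only a 'hidden > guess' oracle."""
    queries = 0
    while low < high:
        mid = (low + high) // 2
        queries += 1
        if is_greater_than(mid):
            low = mid + 1   # hidden value is strictly above mid
        else:
            high = mid      # hidden value is mid or below
    return low, queries


secret_card = 1234567853447733  # never returned to the caller directly
recovered, queries = recover_value(lambda guess: secret_card > guess)
```

The value is recovered exactly even though it was never selected, which is why masking the output alone does not protect it from ad hoc queries.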

From:	Kevin Grittner <kgrittn(at)ymail(dot)com>
To:	Thom Brown <thom(at)linux(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 18:03:13
Message-ID:	1412964193.25454.YahooMailNeo@web122305.mail.ne1.yahoo.com
Lists:	pgsql-hackers

Thom Brown <thom(at)linux(dot)com> wrote:
> On 10 October 2014 15:56, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>> Thom Brown (thom(at)linux(dot)com) wrote:
>>> Data such as plain credit card numbers stored in a
>>> column, even with all its data masked, would be easy to determine.
>>
>> I'm not as convinced of that as you are.. Though I'll point out that in
>> the use-cases which I've been talking to users about, it isn't credit
>> cards under discussion.
>
> I think credit card numbers are a good example.

I'm not so sure. Aren't credit card numbers generally required by
law to be stored in an encrypted form?

> If we're talking
> about format functions here, there has to be something in addition to
> that which determines permitted comparison operations. If not, and we
> were going to remove all but = operations, we'd effectively cripple
> the functionality of anything that's been formatted that wasn't
> intended as a security measure. It almost sounds like an extension to
> domains rather than column-level functionality.

I have to say that my first thought was that format functions
associated with types with domain override would be a very nice
capability. But I don't see where that has much to do with
security. I have seen many places where redaction is necessary
(and in fact done), but I don't see how that could be addressed by
what Simon is proposing. Perhaps I'm missing something; if so, a
more concrete exposition of a use case might allow things to
"click".

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Thom Brown <thom(at)linux(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 18:26:27
Message-ID:	20141010182627.GA4122@momjian.us
Lists:	pgsql-hackers

On Fri, Oct 10, 2014 at 02:05:05PM +0100, Thom Brown wrote:
> On 10 October 2014 13:43, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> > On 10 October 2014 11:45, Thom Brown <thom(at)linux(dot)com> wrote:
> >
> >> To be honest, this all sounds rather flaky.
> >
> > To be honest, suggesting anything at all is rather difficult and I
> > recommend people try it.
>
> I have, and most ideas I've had have been justifiably shot down or
> picked apart (scheduled background tasks, offloading stats collection
> to standby, index maintenance in DML query plans, expression
> statistics... to name but a few).
>
> > Everything sounds crap when you didn't think of it and you've given it
> > an hour's thought.
>
> I'm not sure that means my concerns aren't valid. I don't think it
> sounds crap, but I also can't see any use-case for it where we don't
> already have things covered, or where it's going to offer any useful
> level of security. Like with RLS, it may be that I'm just looking at
> things from the wrong perspective.

Agreed. The problem isn't giving it only an hour's thought --- it is
that we can come up with serious problems in five _seconds_ of thought.
Unless you can come up with a solution to those issues, I am not sure
why we are even talking about it.

My other concern is you must have realized these issues in five seconds
too, so why didn't you mention them?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

From:	Stephen Frost <sfrost(at)snowman(dot)net>
To:	Rod Taylor <rod(dot)taylor(at)gmail(dot)com>
Cc:	Thom Brown <thom(at)linux(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 20:49:31
Message-ID:	20141010204930.GO28859@tamriel.snowman.net
Lists:	pgsql-hackers

Rod,

* Rod Taylor (rod(dot)taylor(at)gmail(dot)com) wrote:
> For fun I gave the search a try.

Neat!

> On my laptop I can pull all 10,000 card numbers in less than 1 second. For
> a text based item I don't imagine it would be much different. Numbers are
> pretty easy to work with though.

I had been planning to give something like this a shot once I got back
from various meetings today- so thanks! Being able to use the CC # *as*
the target for the binary search is definitely an issue, though looking
back on the overall problem space, CC's are less than 54 bits, and it's
actually a smaller space than that if you know how they're put
together.

My thought on an attack was more along these lines:

select * from cards join (SELECT CAST(random() * 9999999999999999 AS
bigint) a from generate_series(1,1000000)) as foo on (cards.cc = foo.a);

Which could pretty quickly find ~500 CC #s in a second or so (with a
'cards' table of about 1M entries) based on my testing. That's clearly
sufficient to make it a viable attack also.
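
A rough back-of-the-envelope check of the hit rate for this kind of random join, as a small Python sketch. The expected number of matches between `guesses` random values and `stored` values, all drawn uniformly from `keyspace` possibilities, is about guesses * stored / keyspace. The 2**31 figure is an assumption, not something stated in the thread: it models a generator with only 31 bits of output, which may describe `random()` in PostgreSQL releases of that period.

```python
def expected_matches(guesses: int, stored: int, keyspace: int) -> float:
    """Expected number of guess/stored pairs that coincide (uniform draws)."""
    return guesses * stored / keyspace


# 1M guesses against 1M cards spread over all 16-digit numbers.
full_keyspace = expected_matches(10**6, 10**6, 10**16)

# Same sizes, but both sides drawn from only 2**31 distinct values
# (assumed generator limit, see the note above).
limited_generator = expected_matches(10**6, 10**6, 2**31)
```

Under these assumptions the first figure is about 0.0001 and the second about 466, so a result near 500 may say more about how the test data was generated than about how easily real card numbers could be guessed this way.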

The next question I have is- do(es) the other vendor(s) provide a way to
address this or is it simply known that this doesn't offer any
protection at all from adhoc queries and it's strictly for formatting?
I can certainly imagine it actually being a way to simply avoid
*inadvertent* exposure rather than providing any security from the
individual running the commands. I'm not sure that would make it
genuinely different enough from simply maintaining a view which does
that filtering to make it useful on its own as a feature though, but I'm
not particularly strongly against it either.

Thanks!

Stephen

From:	Simon Riggs <simon(at)2ndquadrant(dot)com>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Thom Brown <thom(at)linux(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-10 21:09:47
Message-ID:	CA+U5nM+un4ZKox8h52rJnpiF746TcuDaHKC=1nyUDi-gJ7dVrQ@mail.gmail.com
Lists:	pgsql-hackers

On 10 October 2014 19:26, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> On Fri, Oct 10, 2014 at 02:05:05PM +0100, Thom Brown wrote:
>> On 10 October 2014 13:43, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> > On 10 October 2014 11:45, Thom Brown <thom(at)linux(dot)com> wrote:
>> >
>> >> To be honest, this all sounds rather flaky.

> My other concern is you must have realized these issues in five seconds
> too, so why didn't you mention them?

Because the problems that you come up with in 5 seconds aren't
necessarily problems. You just think they are, given 5 seconds
thought. I think my first impression of the concept was poor also
though it would be wonderful if I had remembered all of my initial
objections.

I didn't have any problem with Thom's first post, which was helpful in
allowing me to explain the context and details. As I said in reply at
that point, this is not in itself a barrier; other measures are
necessary. The rest of the thread has descended into a massive
misunderstanding of the purpose and role of redaction.

When any of us move too quickly to a value judgement about a new
concept then we're probably missing the point.

All of us will be asked at some time in the next few years why Postgres
doesn't have redaction. When you get it, post back here please. Or if
you win the argument on it not being useful in any circumstance, post
that here also. I'm not in a rush.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Rod Taylor <rod(dot)taylor(at)gmail(dot)com>
Cc:	Stephen Frost <sfrost(at)snowman(dot)net>, Thom Brown <thom(at)linux(dot)com>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Claudio Freire <klaussfreire(at)gmail(dot)com>
Subject:	Re: Column Redaction
Date:	2014-10-11 07:40:58
Message-ID:	CA+U5nMLH9muxY7fwLxXiuzAewj=wVh8UsNWwLLWxk6Aq3rF8Pw@mail.gmail.com
Lists:	pgsql-hackers

On 10 October 2014 16:45, Rod Taylor <rod(dot)taylor(at)gmail(dot)com> wrote:

> On my laptop I can pull all 10,000 card numbers in less than 1 second.

Right. Like I said: covert channels exist. Great example of how to
exploit them, thanks. Cool SQL.

What could be the use of "a security feature that does not prevent security"?

As soon as you issue the above query, you have clearly indicated your
intention to steal. Receiving information is no longer accidental, it
is an explicit act that is logged in the auditing system against your
name. This is sufficient to bury you in court and it is now a real
deterrent. Redaction has worked.

Redaction is similar to a 3m high razor wire fence. The fence reminds
you of what is correct and dissuades you from going further. The fence
does not prevent access by a determined and skillful agent (Rod), but
the CCTV cameras that are set out will record the action. It will be
almost impossible to claim you were just walking your dog, and the
wire cutters were a gift for your brother in law.

Redaction prevents accidental information loss only, forcing any loss
that occurs to be explicit. It ensures that loss of information can be
tied clearly back to an individual, like an ink packet that stains the
fingers of a thief.

I don't have a word or pithy phrase for this concept. Maybe something
related to "forcing their hand", flushing game into the open, or
simply preventing "tipping your hand" and inadvertently allowing data
loss.

Redaction clearly relies completely on auditing before it can have any
additional effect. And the effectiveness of redaction needs to be
understood next to Rod's example.

Since it relies on auditing, we need to do that first.

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc:	Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-11 08:51:28
Message-ID:	CA+U5nM+6xjj3gG---kqUyvQQKBmZ85F3H_b6Pp3DOM3qyy_-bQ@mail.gmail.com
Lists:	pgsql-hackers

On 10 October 2014 11:27, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:

> I googled for Oracle Data redaction, and found "General Usage guidelines":
>
>> General Usage Guidelines
>>
>> * Oracle Data Redaction is not intended to protect against attacks by
>> privileged database users who run ad hoc queries directly against the
>> database.
>>
>> * Oracle Data Redaction is not intended to protect against users who
>> run exhaustive SQL queries that attempt to determine the actual
>> values by inference.
>
>
> So it's not actually suitable for the example you gave. I don't think we
> want this feature...

The full quote I read is the following...

"Even though Oracle Data Redaction is not intended to protect against
attacks by database users who run ad hoc queries directly against the
database, it can hide sensitive data for these ad hoc query scenarios
when you couple it with other preventive and detective controls."

That full context would have been useful.

From:	Joe Conway <mail(at)joeconway(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, Rod Taylor <rod(dot)taylor(at)gmail(dot)com>
Cc:	Stephen Frost <sfrost(at)snowman(dot)net>, Thom Brown <thom(at)linux(dot)com>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Claudio Freire <klaussfreire(at)gmail(dot)com>
Subject:	Re: Column Redaction
Date:	2014-10-11 13:41:26
Message-ID:	54393386.8040607@joeconway.com
Lists:	pgsql-hackers

On 10/11/2014 02:40 AM, Simon Riggs wrote:
> As soon as you issue the above query, you have clearly indicated
> your intention to steal. Receiving information is no longer
> accidental, it is an explicit act that is logged in the auditing
> system against your name. This is sufficient to bury you in court
> and it is now a real deterrent. Redaction has worked.
>
> Redaction is similar to a 3m high razor wire fence. The fence
> reminds you of what is correct and dissuades you from going
> further. The fence does not prevent access by a determined and
> skillful agent (Rod), but the CCTV cameras that are set out will
> record the action. It will be almost impossible to claim you were
> just walking your dog, and the wire cutters were a gift for your
> brother in law.
>
> Redaction prevents accidental information loss only, forcing any
> loss that occurs to be explicit. It ensures that loss of
> information can be tied clearly back to an individual, like an ink
> packet that stains the fingers of a thief.
>
> I don't have a word or pithy phrase for this concept. Maybe
> something related to "forcing their hand", flushing game into the
> open, or simply preventing "tipping your hand" and inadvertently
> allowing data loss.
>
> Redaction clearly relies completely on auditing before it can have
> any additional effect. And the effectiveness of redaction needs to
> be understood next to Rod's example.
>
> Since it relies on auditing, we need to do that first.

This is a really good summary. I definitely know of folks who would be
interested in this feature, but I also agree, as you have said, it
relies on a good audit trail.

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-11 13:43:55
Message-ID:	20141011134355.GA20155@momjian.us
Lists:	pgsql-hackers

On Sat, Oct 11, 2014 at 09:51:28AM +0100, Simon Riggs wrote:
> > So it's not actually suitable for the example you gave. I don't think we
> > want this feature...
>
> The full quote I read is the following...
>
> "Even though Oracle Data Redaction is not intended to protect against
> attacks by database users who run ad hoc queries directly against the
> database, it can hide sensitive data for these ad hoc query scenarios
> when you couple it with other preventive and detective controls."
>
> That full context would have been useful.

OK, that certainly helps.

I think the interesting question, though, is whether we can create a
data type that doesn't have any casting or comparison functions, and has
limited or no output function, and is useful. Are there cases where
you would want to store data in a database that could not be fully
viewed but would still be useful to store?

For example, for a credit card type, you would output the last four
digits, but is there any value to storing the non-visible digits? You
can check the checksum of the digits, but that can be done on input and
doesn't require the storage of the digits. Is there some function we
could provide that would make that data type useful? Could we provide
comparison functions with delays or increasing delays?

I can think of a useful fully-redacted data type example, and that would
be the credit card expire date. You could store that in a field that
has no output or comparison functions, but you could provide a useful
function that would tell whether the expire date had passed based on the
system date. It would be useful to store such a date, and a user could
know the data value only after it had expired.
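
A minimal sketch of such a fully redacted value, in Python for illustration only. The `RedactedExpiry` class and its method are hypothetical names, not an existing PostgreSQL type: the value has no useful output, refuses all comparisons, and answers exactly one question against the system date.

```python
import datetime


class RedactedExpiry:
    """Hypothetical opaque value: stores an expiry date but never shows it."""

    __slots__ = ("_expires_on",)

    def __init__(self, expires_on: datetime.date):
        self._expires_on = expires_on

    def __repr__(self) -> str:
        # The "output function" reveals nothing about the stored date.
        return "RedactedExpiry(****)"

    def __eq__(self, other):
        # No comparison operators, so the date cannot be probed by search.
        raise TypeError("comparison is not supported on RedactedExpiry")

    __lt__ = __le__ = __gt__ = __ge__ = __ne__ = __eq__
    __hash__ = None

    def has_expired(self) -> bool:
        # Deliberately takes no date argument: a caller-supplied date
        # would let the stored value be recovered by binary search.
        return datetime.date.today() > self._expires_on


old_card = RedactedExpiry(datetime.date(2000, 1, 31))
new_card = RedactedExpiry(datetime.date(9999, 12, 31))
```

Here `old_card.has_expired()` is true and `new_card.has_expired()` is false, while printing or comparing either value gives nothing away. As noted above, the date still becomes known once it has passed.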

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

From:	Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-12 20:09:36
Message-ID:	543AE000.1040607@archidevsys.co.nz
Lists:	pgsql-hackers

On 10/10/14 21:57, Simon Riggs wrote:
> Postgres currently supports column level SELECT privileges.
>
> 1. If we want to confirm a credit card number, we can issue SELECT 1
> FROM customer WHERE stored_card_number = '1234 5678 5344 7733'
>
> 2. If we want to look for card fraud, we need to be able to use the
> full card number to join to transaction data and look up blocked card
> lists etc..
>
> 3. We want to block the direct retrieval of card numbers for
> additional security.
> In some cases, we might want to return an answer like '**** ***** **** 7733'
>
> We can't do all of the above with current facilities inside the database.
>
> The ability to mask output for data in certain cases, for the purpose
> of security, is known lately as data redaction, or column-level data
> redaction.
>
> The best way to support this requirement would be to allow columns to
> have an additional "output formatting function". This would be
> executed only when data is about to be returned by a query. All other
> uses of that would not restrict the data.
>
> This would have other uses as well, such as default report formats, so
> we can store financial amounts as NUMERIC, but format them on
> retrieval as $12,345.78 etc..
>
> Suggested user interface would be...
> FORMAT functionname(parameters, if any)
>
> e.g.
> CREATE TABLE customer
> ( id ...
> ...
> , stored_card_number NUMERIC FORMAT pci_card_number_redaction()
> ...
> );
>
> We'd need to implement something to allow pg_dump to ignore format
> functions. I suggest the best way to do that is by providing a BACKUP
> role that can be delegated to other users. We would then allow a
> parameter for SET output_formatting = on | off, which can only be set
> by superuser and BACKUP role, then have pg_dump issue SET
> output_formatting = off explicitly when it runs.
>
> Do we want redaction in PostgreSQL?
> Do we want it generalised into output format functions?
>

I think having a FORMAT option would be good, but I strongly feel that
end users should NEVER EVER have direct access to any database with
sensitive information! And if the full details are stored, then
obviously, at some time people will have a legitimate need to access all
the digits - so it does not make sense to prevent this.

Also I think it would be useful to store formats, especially complicated
ones, so they can be defined once and reused as many times as required
- helps for standardisation.

How about something like:

CREATE FORMAT /format-name/ [WITH] /format-spec/ [DENY | ALLOW role-1, ...];

Where the /format-spec/ is either a function, or something similar to a
COBOL picture spec., I suspect that the implied security control with
the ALLOW & DENY options might prove too weak for anyone determined,
though it might be good enough in some common contexts.

CREATE FORMAT card_format_redacted WITH '**** **** **** 9999' ALLOW ALL;
CREATE FORMAT card_format_full '9999 9999 9999 9999' ALLOW admin_1;
CREATE FORMAT card_format_special special_card_formatter() ALLOW
admin_42, mariadba;

-- specify default FORMAT
CREATE TABLE customer
(
...
stored_card_number NUMERIC FORMAT card_format_redacted,
...
);

-- unformatted, fails if role is neither admin_1 nor a role that inherits
-- from it
SELECT
stored_card_number
FROM
customer
WHERE
...;

-- using card_format_redacted
SELECT
stored_card_number FORMAT DEFAULT
FROM
customer
WHERE
...;

-- using card_format_full, fails if role is neither admin_1 nor a role
-- that inherits from it
SELECT
stored_card_number FORMAT card_format_full
FROM
customer
WHERE
...;
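
A minimal sketch of how a picture-style format spec like the ones above might be applied, in Python for illustration. The `apply_picture` function and its rules are assumptions made for this example ('9' shows a digit, '*' hides one, anything else is copied through); the thread does not define the spec language.

```python
def apply_picture(picture: str, digits: str) -> str:
    """Format digits with a picture: '9' shows a digit, '*' hides it,
    and any other character is copied through as a literal."""
    wanted = sum(ch in "9*" for ch in picture)
    if len(digits) != wanted or not digits.isdigit():
        raise ValueError("digits do not match the picture")
    remaining = iter(digits)
    out = []
    for ch in picture:
        if ch == "9":
            out.append(next(remaining))
        elif ch == "*":
            next(remaining)  # consume the digit but do not reveal it
            out.append("*")
        else:
            out.append(ch)
    return "".join(out)


redacted = apply_picture("**** **** **** 9999", "1234567853447733")
full = apply_picture("9999 9999 9999 9999", "1234567853447733")
```

With these inputs `redacted` is `**** **** **** 7733` and `full` is `1234 5678 5344 7733`. Note that this only shapes the output; it does nothing to restrict comparisons on the stored value.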

Cheers,
Gavin

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Rod Taylor <rod(dot)taylor(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Thom Brown <thom(at)linux(dot)com>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Claudio Freire <klaussfreire(at)gmail(dot)com>
Subject:	Re: Column Redaction
Date:	2014-10-14 16:43:07
Message-ID:	CA+TgmoZ+rOdVjhNmbK5XkztHOQ2fps_V6=OKSvc1i-1Huc0w1Q@mail.gmail.com
Lists:	pgsql-hackers

On Sat, Oct 11, 2014 at 3:40 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> As soon as you issue the above query, you have clearly indicated your
> intention to steal. Receiving information is no longer accidental, it
> is an explicit act that is logged in the auditing system against your
> name. This is sufficient to bury you in court and it is now a real
> deterrent. Redaction has worked.

To me, this feels thin. It's true that this might be good enough for
some users, but I wouldn't bet on it being good enough for very many
users, and I really hope there's a better option. We have an existing
method of doing data redaction via security barrier views. There are
information leaks, to be sure, but they are far more subtle and
low-bandwidth than Rod's query. The reason for that is that only
trusted code (leakproof functions) is allowed to run against the
trusted data; the redaction is applied before any
potentially-untrustworthy stuff happens. Here, you're applying the
redaction as the very last step before sending the data to the user,
and that allows too much freedom to do bad stuff between the time the
database first lays hands on the data and the time it gets redacted.

But maybe that can be fixed. I don't know exactly how. I think you
need a design that allows you to restrict very tightly the operations
that an untrusted user can perform on trusted data. Maybe you only
want to allow "=" and nothing else, for example. Perhaps the set of
allowable predicates could be defined via DDL. Then when the query is
run, the system imposes a security fence. Only approved predicates
can be pushed through the fence. And when the data crosses the fence
from the trusted side to the untrusted side, redaction happens at that
point, rather than just before sending the data to the user.
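
The fence idea can be sketched in a few lines of Python. This is only a
toy model under stated assumptions: the class FencedColumn, its method
names, and the fixed "last four digits" redaction are all hypothetical,
and a real design would have to live in the planner and executor rather
than in a wrapper object.

```python
class FencedColumn:
    """Wraps sensitive values behind a 'fence' (hypothetical sketch).

    Only predicates named in `allowed` may be evaluated against the raw
    values; anything that crosses the fence as data is redacted first.
    """

    def __init__(self, values, allowed=("=",)):
        self._values = list(values)
        self._allowed = set(allowed)

    def where(self, op, operand):
        """Return row indexes matching `value op operand`, if op is approved."""
        if op not in self._allowed:
            raise PermissionError(f"predicate {op!r} may not cross the fence")
        if op == "=":
            return [i for i, v in enumerate(self._values) if v == operand]
        raise NotImplementedError(op)

    def output(self, i):
        """Redact at the fence: only the last four digits leave the trusted side."""
        return "**** **** **** " + self._values[i][-4:]


cards = FencedColumn(["1234567853447733", "4000000000000002"])
hits = cards.where("=", "1234567853447733")
shown = [cards.output(i) for i in hits]

try:
    cards.where("<", "5000000000000000")
    range_allowed = True
except PermissionError:
    range_allowed = False
```

Even with only "=" approved, a caller can still guess values one at a
time, so a fence like this narrows the leak rather than closing it.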

This is, of course, more complicated. But I think it's likely to be
worth it. The problem with relying on auditing is that you need a
human to look at the audit logs and judge intent. With a query as
overt as Rod's, that's maybe not too hard. But with a lot of analysts
running a lot of queries, it might not be that hard to bury an
information-stealing query inside an innocent-looking query in such a
way that the administrator doesn't notice. Granted, that's playing
with fire, but I've encountered many security vulnerabilities in my
career that can be exploited without doing anything obviously evil.
If you retroactively put a packet-sniffer on every network I've ever
been connected to, and carefully examined all my network traffic,
you'd find me finding holes in all kinds of things, but in fact,
nobody's ever noticed a problem in advance of me reporting it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Rod Taylor <rod(dot)taylor(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Thom Brown <thom(at)linux(dot)com>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Claudio Freire <klaussfreire(at)gmail(dot)com>
Subject:	Re: Column Redaction
Date:	2014-10-15 08:04:35
Message-ID:	CA+U5nMJ-hTGo5PqVEWDM9V29hTt=MPaUVymHx0tZi3X+WWSo_Q@mail.gmail.com
Lists:	pgsql-hackers

On 14 October 2014 17:43, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sat, Oct 11, 2014 at 3:40 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> As soon as you issue the above query, you have clearly indicated your
>> intention to steal. Receiving information is no longer accidental, it
>> is an explicit act that is logged in the auditing system against your
>> name. This is sufficient to bury you in court and it is now a real
>> deterrent. Redaction has worked.
>
> To me, this feels thin. It's true that this might be good enough for
> some users, but I wouldn't bet on it being good enough for very many
> users, and I really hope there's a better option. We have an existing
> method of doing data redaction via security barrier views.

I agree with "thin". There is a leak in the design, so let me coin the
phrase "imprecise security". Of course, the leaks reduce the value of
such a feature; they just don't reduce it all the way to zero.

Security barrier views or views of any kind don't do the required job.

We are not able to easily classify people as Trusted or Untrusted.

We're seeking to differentiate between the right to use a column for
queries and the right to see the value itself. Or put another way, you
can read the book, you just can't photocopy it and take the copy home.
Or, you can try on the new clothes to see if they fit, but you can't
take them home for free. Both of those examples have imprecise
security measures in place to control and reduce negative behaviours
and in every other industry this is known as "security".

In IT terms, we're looking at controlling and reducing improper access
to data by an otherwise Trusted person. The only problem is that some
actions on data items are allowed, others are not.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Rod Taylor <rod(dot)taylor(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Thom Brown <thom(at)linux(dot)com>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Claudio Freire <klaussfreire(at)gmail(dot)com>
Subject:	Re: Column Redaction
Date:	2014-10-15 18:46:04
Message-ID:	CA+TgmoYOYtUyvKZHmgAXDs7ZeuTnEuhMnX30EUxA0LbmwAGBpg@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Oct 15, 2014 at 4:04 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On 14 October 2014 17:43, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Sat, Oct 11, 2014 at 3:40 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> As soon as you issue the above query, you have clearly indicated your
>>> intention to steal. Receiving information is no longer accidental, it
>>> is an explicit act that is logged in the auditing system against your
>>> name. This is sufficient to bury you in court and it is now a real
>>> deterrent. Redaction has worked.
>>
>> To me, this feels thin. It's true that this might be good enough for
>> some users, but I wouldn't bet on it being good enough for very many
>> users, and I really hope there's a better option. We have an existing
>> method of doing data redaction via security barrier views.
>
> I agree with "thin". There is a leak in the design, so let me coin the
> phrase "imprecise security". Of course, the leaks reduce the value of
> such a feature; they just don't reduce it all the way to zero.
>
> Security barrier views or views of any kind don't do the required job.
>
> We are not able to easily classify people as Trusted or Untrusted.
>
> We're seeking to differentiate between the right to use a column for
> queries and the right to see the value itself. Or put another way, you
> can read the book, you just can't photocopy it and take the copy home.
> Or, you can try on the new clothes to see if they fit, but you can't
> take them home for free. Both of those examples have imprecise
> security measures in place to control and reduce negative behaviours
> and in every other industry this is known as "security".
>
> In IT terms, we're looking at controlling and reducing improper access
> to data by an otherwise Trusted person. The only problem is that some
> actions on data items are allowed, others are not.

Sure, I don't disagree with any of that as a general principle. I
just think we should look for some ways of shoring up your proposal
against some of the more obvious attacks, so as to have more good and
less bad.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Rod Taylor <rod(dot)taylor(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Thom Brown <thom(at)linux(dot)com>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Claudio Freire <klaussfreire(at)gmail(dot)com>
Subject:	Re: Column Redaction
Date:	2014-10-15 18:57:34
Message-ID:	CA+U5nMJC8jbOeQrr1KbV8gsUdG_Kuhd8o3Fx4YPbtftUS3-G0g@mail.gmail.com
Lists:	pgsql-hackers

On 15 October 2014 19:46, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

>> In IT terms, we're looking at controlling and reducing improper access
>> to data by an otherwise Trusted person. The only problem is that some
>> actions on data items are allowed, others are not.
>
> Sure, I don't disagree with any of that as a general principle. I
> just think we should look for some ways of shoring up your proposal
> against some of the more obvious attacks, so as to have more good and
> less bad.

Suggestions welcome. I'm not in a rush to implement this, so we have
time to mull it over.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Claudio Freire <klaussfreire(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Rod Taylor <rod(dot)taylor(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Thom Brown <thom(at)linux(dot)com>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-15 19:41:24
Message-ID:	CAGTBQpb1y2gDD2j5MVzdC9L9Ee6jvKwE+JuPDSU+MVr+ePP-qA@mail.gmail.com
Lists:	pgsql-hackers

On Sat, Oct 11, 2014 at 4:40 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On 10 October 2014 16:45, Rod Taylor <rod(dot)taylor(at)gmail(dot)com> wrote:
> Redaction prevents accidental information loss only, forcing any loss
> that occurs to be explicit. It ensures that loss of information can be
> tied clearly back to an individual, like an ink packet that stains the
> fingers of a thief.

That is not true.

It can only be tied to a session. That's very far from an individual
in court terms, if you ask a lawyer.

You need a helluva lot more to tie that to an individual.

> Redaction clearly relies completely on auditing before it can have any
> additional effect. And the effectiveness of redaction needs to be
> understood next to Rod's example.

It forces you to audit all of the queries issued by the otherwise trusted user.

That is, I believe, a far from optimal design. When you have to audit
everything, you end up auditing nothing: a haystack of false positives
can easily hide the needle that is the true positive.

What you want is something that allows selective auditing of
leak-prone queries.

But we've seen that joining is already a leak-prone query, so clearly
you cannot allow simple joining if you want the above.

What I propose needs a schema change and some preparedness from the
DBA. But how can you assume that is asking too much, and not say
the same of thorough auditing?

So what I propose is to require explicit separation of concepts at
the schema level.

On Sat, Oct 11, 2014 at 10:43 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> For example, for a credit card type, you would output the last four
> digits, but is there any value to storing the non-visible digits? You
> can check the checksum of the digits, but that can be done on input and
> doesn't require the storage of the digits. Is there some function we
> could provide that would make that data type useful? Could we provide
> comparison functions with delays or increasing delays?

Basically, as said above, the point is to provide a data type that is
nigh-useless.

Imagine a redacted card number as a tuple (full_value_id, suffix).
Suffix is in cleartext, and prefix_id is just an id pointing to a
lookup table for the type.

Regular users can read any redacted_number column, but will only get
the id (useless unless they already know what that prefix is), and
suffix. Format for that type would be "**** suffix" and would serve
the purpose of the OP: it can be joined (equal value = equal id).
Moreover, the type can be designed in one of two ways: equal values
contain equal ids, or salted values, where even equal values generated
from different computations (i.e. not copied) have different ids. This
second mode would be the most secure, albeit a tad hard to use
perhaps.

But it would allow joining and everything. Only users that have access
to the lookup table would be allowed to resolve the full value, with a
function that is not SECURITY DEFINER (so it runs with the caller's
privileges), like:

extract_full_value(redacted_number)

Then you can audit all queries against the lookup table, and you have
rather strong security IMHO.
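
A minimal Python sketch of the (full_value_id, suffix) scheme described
above, covering both the normalized and the salted mode. The class
RedactedStore, the helper display, and the boolean
role_can_read_lookup are hypothetical stand-ins; in a real deployment
the lookup table would be an ordinary table protected by GRANTs, and the
permission check would be done by the database rather than by a flag.

```python
import secrets


class RedactedStore:
    """Lookup table mapping opaque ids to full values (hypothetical sketch)."""

    def __init__(self, salted=False):
        self._salted = salted
        self._by_id = {}     # id -> full value (the protected lookup table)
        self._by_value = {}  # full value -> id (used only in normalized mode)

    def redact(self, full_value):
        """Return the (full_value_id, suffix) tuple that regular users see."""
        if not self._salted and full_value in self._by_value:
            fid = self._by_value[full_value]
        else:
            fid = secrets.token_hex(8)  # random: carries no ordering information
            self._by_id[fid] = full_value
            if not self._salted:
                self._by_value[full_value] = fid
        return (fid, full_value[-4:])

    def extract_full_value(self, redacted, role_can_read_lookup):
        """Resolve the full value; only callers with lookup access succeed."""
        if not role_can_read_lookup:
            raise PermissionError("no access to the lookup table")
        return self._by_id[redacted[0]]


def display(redacted):
    """Output format for the type: masked prefix plus cleartext suffix."""
    return "**** " + redacted[1]


normalized = RedactedStore(salted=False)
a = normalized.redact("1234567853447733")
b = normalized.redact("1234567853447733")  # same id as a: joinable on equality

salted = RedactedStore(salted=True)
c = salted.redact("1234567853447733")
d = salted.redact("1234567853447733")      # fresh id: only copies of c match c
```

In normalized mode two independent redactions of the same number compare
equal, so joins work; in salted mode they do not, which is the usability
cost mentioned above.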

This can all be done without any new features in postgres. Maybe you
can add syntactic sugar, but you don't really need anything in
core to accomplish the above.

The syntactic sugar can take the form of a new data type family (like
enum?) where you specify the redaction function, redacted data type,
output format, and from there everything else works atomagically, with
a

extract_full(any) -> any

function that somehow knows what to do.

On Wed, Oct 15, 2014 at 3:57 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On 15 October 2014 19:46, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
>>> In IT terms, we're looking at controlling and reducing improper access
>>> to data by an otherwise Trusted person. The only problem is that some
>>> actions on data items are allowed, others are not.
>>
>> Sure, I don't disagree with any of that as a general principle. I
>> just think we should look for some ways of shoring up your proposal
>> against some of the more obvious attacks, so as to have more good and
>> less bad.
>
> Suggestions welcome. I'm not in a rush to implement this, so we have
> time to mull it over.

Does the above work for your intended purposes?

Hard to know from what you've posted until now, but I believe it does.

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc:	Rod Taylor <rod(dot)taylor(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Thom Brown <thom(at)linux(dot)com>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-15 19:59:31
Message-ID:	CA+U5nMKSqTcmoeMfsNEDGs28rkfJ9Fy99DdiQprupe4Ch2kKUA@mail.gmail.com
Lists:	pgsql-hackers

On 15 October 2014 20:41, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
> On Sat, Oct 11, 2014 at 4:40 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> On 10 October 2014 16:45, Rod Taylor <rod(dot)taylor(at)gmail(dot)com> wrote:
>> Redaction prevents accidental information loss only, forcing any loss
>> that occurs to be explicit. It ensures that loss of information can be
>> tied clearly back to an individual, like an ink packet that stains the
>> fingers of a thief.
>
> That is not true.
>
> It can only be tied to a session. That's very far from an individual
> in court terms, if you ask a lawyer.
>
> You need a helluva lot more to tie that to an individual.

So you're familiar then with this process? So you know that an auditor
would trigger an investigation, resulting in deeper surveillance and
gathering of evidence that ends with various remedial actions, such as
court. How would that process start then, if not this way?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Claudio Freire <klaussfreire(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Rod Taylor <rod(dot)taylor(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Thom Brown <thom(at)linux(dot)com>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-15 20:03:15
Message-ID:	CAGTBQpYxpXG+durmgwM2VdqqfJEBjfxx6b9O-M09j-tEh2PLKw@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Oct 15, 2014 at 4:59 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On 15 October 2014 20:41, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
>> On Sat, Oct 11, 2014 at 4:40 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> On 10 October 2014 16:45, Rod Taylor <rod(dot)taylor(at)gmail(dot)com> wrote:
>>> Redaction prevents accidental information loss only, forcing any loss
>>> that occurs to be explicit. It ensures that loss of information can be
>>> tied clearly back to an individual, like an ink packet that stains the
>>> fingers of a thief.
>>
>> That is not true.
>>
>> It can only be tied to a session. That's very far from an individual
>> in court terms, if you ask a lawyer.
>>
>> You need a helluva lot more to tie that to an individual.
>
> So you're familiar then with this process? So you know that an auditor
> would trigger an investigation, resulting in deeper surveillance and
> gathering of evidence that ends with various remedial actions, such as
> court. How would that process start then, if not this way?

I've seen lots of such investigations fail because the evidence wasn't
strong enough to link to a particular person, but rather a computer
terminal or something like that.

Unless you also physically restrict access to such a terminal to a
single person through other means (which is quite uncommon practice
except perhaps in banks), that evidence is barely circumstantial.

But you'd have to ask a lawyer in your country to be sure. I can only
speak from my own experience in my own country, which is probably not
yours and may not have the same laws. Law is a complex beast.

So, you really want actual information security in addition to that
deterrent you speak of. I don't say the deterrent is bad, I only say
it's not good enough on its own.

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc:	Rod Taylor <rod(dot)taylor(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Thom Brown <thom(at)linux(dot)com>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-15 23:59:16
Message-ID:	CA+U5nM+1B_f4k56d44DEa-5n0j-yxQTKFgxK8PGS8x5VLQ7SdA@mail.gmail.com
Lists:	pgsql-hackers

On 15 October 2014 21:03, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:

>> So you're familiar then with this process? So you know that an auditor
>> would trigger an investigation, resulting in deeper surveillance and
>> gathering of evidence that ends with various remedial actions, such as
>> court. How would that process start then, if not this way?
>
> I've seen lots of such investigations fail because the evidence wasn't
> strong enough to link to a particular person, but rather a computer
> terminal or something like that.

So your solution to the evidence problem is to do nothing? Or you have
a better suggestion?

Nothing is certain, apart from doing nothing.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Claudio Freire <klaussfreire(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-16 00:29:24
Message-ID:	CAGTBQpaKzc+zpd08BxiB2gDJVMpo-kypokrTQ=Dbz9R1hfMhUA@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Oct 15, 2014 at 8:59 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On 15 October 2014 21:03, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
>
>>> So you're familiar then with this process? So you know that an auditor
>>> would trigger an investigation, resulting in deeper surveillance and
>>> gathering of evidence that ends with various remedial actions, such as
>>> court. How would that process start then, if not this way?
>>
>> I've seen lots of such investigations fail because the evidence wasn't
>> strong enough to link to a particular person, but rather a computer
>> terminal or something like that.
>
> So your solution to the evidence problem is to do nothing? Or you have
> a better suggestion?
>
> Nothing is certain, apart from doing nothing.

Is solving the evidence problem in scope of the postgresql project?

The solution is to not require evidence in order to be protected from
data theft.

Having evidence is nice, you can punish effective attacks, which is a
deterrent to any attacker as you pointed out, and may even include
financial compensation. It requires physical security as well as
software security, and I'm not qualified to solve that problem without
help from a lawyer (but I do know you need help from a lawyer to make
sure the evidence you gather is usable).

Not having usable evidence, however, could fail to deter knowledgeable
attackers (remember, in this setting, it would be an inside job, so it
would be a very knowledgeable attacker).

But in any case, if the deterrence isn't enough, and you get attacked,
anything involving redaction as fleshed out in the OP is good for
nothing. The damage has been done already. The feature doesn't
meaningfully slow down extraction of data, so anything you do can only
punish the attacker, not prevent further data theft or damaged
reputation/business.

Something that requires superuser privilege (or specially granted
privilege) in order to gain access to the unredacted value, on the
other hand, would considerably slow down the attacker. From my
proposal, only the second form (unnormalized redacted tuples) would
provide any meaningful data security in this sense, but even in the
other, less limiting form, it would still prevent unauthorized users
from extracting the value: you can no longer do binary search with
unredacted data, only a full brute-force search would work. That's
because the full value id (that I called prefix id, sorry, leftover
from an earlier draft) doesn't relate to the unredacted value, so
sorting comparisons (< <= > >=) don't provide usable information about
value space.
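
To illustrate why ordering comparisons on the real value matter, here is
a small Python sketch of the binary-search attack. The function name
recover_by_binary_search and the less_than oracle are hypothetical; the
oracle stands in for any query whose visible result depends on
"stored_value < guess".

```python
def recover_by_binary_search(less_than, lo=0, hi=10**16):
    """Find the secret in [lo, hi) using only `secret < guess` answers."""
    queries = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        queries += 1
        if less_than(mid):  # secret < mid: keep the lower half
            hi = mid
        else:               # secret >= mid: keep the upper half
            lo = mid
    return lo, queries


secret = 1234567853447733
recovered, queries = recover_by_binary_search(lambda guess: secret < guess)
print(recovered, queries)
```

A 16-digit number falls out in at most 54 comparisons. With an opaque,
random id in place of the value, the same comparisons say nothing about
the number itself, leaving only equality guesses: on the order of 10^12
candidates when the last four digits are visible.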

So, if there is a chance to implement redaction in a way that truly
protects redacted data, even if it sometimes costs a bit of
performance, is avoiding the performance hit worth the risk?

I guess the potential users of such a feature are the only ones
qualified to answer, and the answer has great weight on how the
feature could be implemented.

Well, and of course, the quality of the implementation. If my proposal
has weaknesses I did not realize yet, it may be worthless. But that's
true of all proposals that aim for any meaningful level of security:
it's worth a lengthy look.

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2014-10-31 14:35:11
Message-ID:	CA+U5nMKxx4hX12AhLn7LCqErhVOm9dfOCSt32cZFAC6xL+P0vQ@mail.gmail.com
Lists:	pgsql-hackers

On 16 October 2014 01:29, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:

> But in any case, if the deterrence isn't enough, and you get attacked,
> anything involving redaction as fleshed out in the OP is good for
> nothing. The damage has been done already. The feature doesn't
> meaningfully slow down extraction of data, so anything you do can only
> punish the attacker, not prevent further data theft or damaged
> reputation/business.

Deterrence is exactly the goal.

"Only punishing the attacker" is exactly what this is for. This is not
the same thing as preventative security.

Redaction is designed to prevent authorized users from accidental
misuse. Your business already trusts these people. You know their
names, their addresses, their bank account details and you'll have
already run security scans on them.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Column Redaction
Date:	2022-06-22 18:56:01
Message-ID:	CALtqXTc_CVaBFXzVivpnaM75-dhzsRePfz14GQRXRGFbe7PCUA@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Jun 22, 2022 at 11:53 PM Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> Postgres currently supports column level SELECT privileges.
>
> 1. If we want to confirm a credit card number, we can issue SELECT 1
> FROM customer WHERE stored_card_number = '1234 5678 5344 7733'
>
> 2. If we want to look for card fraud, we need to be able to use the
> full card number to join to transaction data and look up blocked card
> lists etc..
>
> 3. We want to block the direct retrieval of card numbers for
> additional security.
> In some cases, we might want to return an answer like '**** ***** ****
> 7733'
>
> We can't do all of the above with current facilities inside the database.
>
> The ability to mask output for data in certain cases, for the purpose
> of security, is known lately as data redaction, or column-level data
> redaction.
>
> The best way to support this requirement would be to allow columns to
> have an additional "output formatting function". This would be
> executed only when data is about to be returned by a query. All other
> uses of that would not restrict the data.
>
> This would have other uses as well, such as default report formats, so
> we can store financial amounts as NUMERIC, but format them on
> retrieval as $12,345.78 etc..
>
> Suggested user interface would be...
> FORMAT functionname(parameters, if any)
>
> e.g.
> CREATE TABLE customer
> ( id ...
> ...
> , stored_card_number NUMERIC FORMAT pci_card_number_redaction()
> ...
> );
>
> We'd need to implement something to allow pg_dump to ignore format
> functions. I suggest the best way to do that is by providing a BACKUP
> role that can be delegated to other users. We would then allow a
> parameter for SET output_formatting = on | off, which can only be set
> by superuser and BACKUP role, then have pg_dump issue SET
> output_formatting = off explicitly when it runs.
>
> Do we want redaction in PostgreSQL?
> Do we want it generalised into output format functions?
>
> --
> Simon Riggs http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>
Hi,
Do we still have some interest in this? People generally seemed to like
the idea. If so, I am happy to work on it and can send a complete
design first.

--
Ibrar Ahmed