
Re: the thing with the binary zeroes

To: Olaf Frączyk <olaf@xxxxxxxxxxxxx>
Subject: Re: the thing with the binary zeroes
From: David J N Begley <d.begley@xxxxxxxxxx>
Date: Mon, 14 Feb 2005 21:16:02 +1100 (EST)
Cc: Daniel Moore <dxm@xxxxxxx>, linux-xfs@xxxxxxxxxxx
In-reply-to: <1108369746.3535.10.camel@xxxxxxxxxxxxxxxxxxx>
References: <Pine.GSO.4.58.0502131201290.26391@xxxxxxxxxxxxxxxxx> <20050211121829.GA30049@xxxxxxxxxxxxxxxxxxxxx> <m1sm43uu8h.fsf@xxxxxx> <20050211131546.GA32336@xxxxxxxxxxxxxxxxxxxxx> <m1oeeruswr.fsf@xxxxxx> <20050211133558.GA32501@xxxxxxxxxxxxxxxxxxxxx> <m1k6pfurpd.fsf@xxxxxx> <Pine.GSO.4.58.0502121642380.25840@xxxxxxxxxxxxxxxxx> <m1r7jmf0q7.fsf@xxxxxx> <200502130215.j1D2FY0w679915@xxxxxxxxxxxxxxxxxxxxxx> <1108369746.3535.10.camel@xxxxxxxxxxxxxxxxxxx>
Reply-to: David J N Begley <d.begley@xxxxxxxxxx>
Sender: linux-xfs-bounce@xxxxxxxxxxx
Earlier today, Olaf Frączyk wrote:

> On Sun, 2005-02-13 at 13:15 +1100, Daniel Moore wrote:
> > Hope that explains it. I think people shy away from answering this as
> The problem is that people don't understand the difference between the data
> and metadata journaling. As XFS doesn't do data journaling the whole
> discussion is pointless.

Errm, no and no - believe it or not, full file contents/data journalling is not
always behind the question (in my case, it's already assumed to not be a
part of the equation).

The fact that this issue keeps being raised demonstrates that either those
asking the questions (myself included) are not using the right words or those
answering are too quickly skimming the questions, missing the underlying point
and thus answering some closely-related but different question.

People hear/read that XFS is a fast, reliable file system - indeed, that is
exactly many people's experience of it.  Alas, there are ways in which XFS
behaves differently to the other journalling FS options in Linux and this has
led to data loss, which is where people cannot reconcile their experiences
with those of others (or the claims of XFS' reliability).  This is where the
problem (and thence the interminable questioning) begins.

Either:

(a) people's expectations of XFS are misaligned with XFS' intended work
    environment, in which case the documentation needs to be updated to
    include a prominent notice in order to correct people's expectations
    (eg., cannot use XFS and expect minimal data loss unless apps are
    written a certain way, full hardware RAID is used and UPS guarantees
    no power loss);

or,

(b) there are some ways in which XFS can be improved, in which case the
    only way people are going to be able to contribute anything of value
    is if there exists an open and frank discussion about how to design
    and implement any such improvements (rather than dismissing all
    questions as being about full data journalling).

Despite explicitly mentioning ext3's ordered journalling mechanism a number of
times (this is not a full data journal, like XFS it only journals metadata),
the responses are still fixated on full data journalling for some reason;  it
might be easier if I just quote Steve Lord of SGI:

  "The ordered mode in ext3 is almost certainly a better performing
  solution than this, it would be nice to get this into xfs, but
  probably quite a while before it could be worked on."


Previously I asked (clearly inelegantly) whether, with such a mechanism in
XFS (which Steve noted would be good to have), the zero-page problem would
still be an issue at all.  Would that remove all further
concern/confusion/doubt regarding XFS?

> If an application is written in a sane way then you don't have "zeros
> problem" even on metadata only journaling filesystem either.

See point (a) above.
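
For what it's worth, the "sane way" usually meant here is the
write-to-a-temporary-file, fsync, rename pattern.  A minimal sketch in Python
follows; the function name atomic_write is purely illustrative (it is not from
this thread or from any XFS tool), and it assumes a POSIX system where a
rename within one directory is atomic:

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace `path` with `data` (bytes) via temp file + fsync + rename.

    Hypothetical helper for illustration only. The intent is that after a
    crash the name `path` refers to either the complete old contents or the
    complete new contents, never a partially written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the rename below
    # stays within one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Force the file data to disk before the rename makes it visible.
            os.fsync(tmp.fileno())
        # Atomically point the name at the new file.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Flush the directory entry as well, so the rename itself is durable.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Even so, this only helps applications that were written this way, which is
exactly the expectation gap described in point (a).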

On Sun Feb 13 2005, Daniel Moore wrote:

> David J N Begley writes:
> => - either the metadata has not been updated, in which case you point to
> =>   the old file data (as opposed to "random" raw data from any file);
> =>   or,
> =>
> => - the meta data has been updated, and points to the correct new file
> =>   data (because it was flushed to disk before the metadata).
> XFS' journaling is there to protect the _metadata_. It seems a little

See first paragraph above - I know;  when I talked of metadata being
"updated", I meant committed from RAM to the journal on disk.  I didn't
mention anything about file contents/data also going through the journal, just
that it was flushed to the disk prior to its associated metadata.

> XFS will never (1) give you back the old crud left on disk - that would
> be a security hole. So whenever you're reading something that you
> haven't written, you'll get zeroes.

If by "old crud" you are referring to the old version of a rewritten file
(where the metadata has been updated but the associated file data has yet to
be flushed to disk from RAM), then it should be possible to make the zeroing
of files an option for those willing to carry the security risk (and thus kill
the whole problem/questions in a simple step).

If by "old crud" you are referring to seemingly random, unrelated data (either
on disk or from memory) then I agree, it is a security hole.

As noted above though - if data blocks were flushed from RAM prior to the
associated metadata being committed from RAM to the on-disk journal, would
not this solve both problems at once?  (That is, if the metadata has not been
updated then you will not get "random" data - but if it _has_ been updated
then you can be sure to get the correct data thus no zeros.)
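
To make the argument concrete, here is a toy model in Python.  It is not how
XFS or ext3 is actually implemented, and every name in it is made up for
illustration.  It enumerates what a reader could see if the machine crashed
after each step of a rewrite, for the two possible orderings of the data
flush and the metadata commit:

```python
def outcomes(order):
    """Return the set of file contents visible after a crash at each point.

    `order` is the sequence of steps performed for one rewrite:
      "data" - the new file data is flushed from RAM to its disk block
      "meta" - the metadata pointing at the new block is committed
    """
    seen = set()
    # Simulate a crash after 0, 1, ... len(order) completed steps.
    for crash_after in range(len(order) + 1):
        # A freshly allocated, not yet written block reads back as zeroes.
        blocks = {"old": "OLD", "new": "\0\0\0"}
        pointer = "old"  # which block the on-disk metadata refers to
        for step in order[:crash_after]:
            if step == "data":
                blocks["new"] = "NEW"
            elif step == "meta":
                pointer = "new"
        seen.add(blocks[pointer])
    return seen


ORDERED = ("data", "meta")    # data flushed before metadata commit
UNORDERED = ("meta", "data")  # metadata committed before data flush
```

Under these assumptions, outcomes(ORDERED) contains only the old and the new
contents, while outcomes(UNORDERED) also contains the zeroed block.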

If this is something that would run counter to XFS' delayed allocation or
other features, then _that's_ what needs to appear in the FAQ (along with
point (a) above), not replies saying something along the lines of, "data is
not journalled so stop asking".

Hopefully this time I've been clearer (if somewhat long-winded).

