xfs
[Top] [All Lists]

Re: [patch 2/2/4] mm: try to distribute dirty pages fairly across zones

To: Johannes Weiner <jweiner@xxxxxxxxxx>
Subject: Re: [patch 2/2/4] mm: try to distribute dirty pages fairly across zones
From: Minchan Kim <minchan.kim@xxxxxxxxx>
Date: Wed, 28 Sep 2011 14:56:40 +0900
Cc: Andrew Morton <akpm@xxxxxxxxxx>, Mel Gorman <mgorman@xxxxxxx>, Christoph Hellwig <hch@xxxxxxxxxxxxx>, Dave Chinner <david@xxxxxxxxxxxxx>, Wu Fengguang <fengguang.wu@xxxxxxxxx>, Jan Kara <jack@xxxxxxx>, Rik van Riel <riel@xxxxxxxxxx>, Chris Mason <chris.mason@xxxxxxxxxx>, "Theodore Ts'o" <tytso@xxxxxxx>, Andreas Dilger <adilger.kernel@xxxxxxxxx>, xfs@xxxxxxxxxxx, linux-btrfs@xxxxxxxxxxxxxxx, linux-ext4@xxxxxxxxxxxxxxx, linux-mm@xxxxxxxxx, linux-fsdevel@xxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Dkim-signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=xDeMNa5XDuPF6Lt7JYzPW477x24sIvjq9V4sewQBdBM=; b=LrcJegZd8WifW/lownjp5c+eGWNnt25CdyOxaRHWdV/72rqSXEdN1U8n9XnOxCzGMU haUVcSAqjpdcnw7n86qqJB9OFzzXkvcadEieGpIitXC8DrTQiBMgWDF00coRpBee8jvw U99yRzSqsxSDKUom0/SlVEDbeJuTlnITDXfF4=
In-reply-to: <20110923144248.GC2606@xxxxxxxxxx>
References: <1316526315-16801-1-git-send-email-jweiner@xxxxxxxxxx> <1316526315-16801-3-git-send-email-jweiner@xxxxxxxxxx> <20110921160226.1bf74494.akpm@xxxxxxxxxx> <20110922085242.GA29046@xxxxxxxxxx> <20110923144248.GC2606@xxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Fri, Sep 23, 2011 at 04:42:48PM +0200, Johannes Weiner wrote:
> The maximum number of dirty pages that exist in the system at any time
> is determined by a number of pages considered dirtyable and a
> user-configured percentage of those, or an absolute number in bytes.

It's explanation of old approach.

> 
> This number of dirtyable pages is the sum of memory provided by all
> the zones in the system minus their lowmem reserves and high
> watermarks, so that the system can retain a healthy number of free
> pages without having to reclaim dirty pages.

It's a explanation of new approach.

> 
> But there is a flaw in that we have a zoned page allocator which does
> not care about the global state but rather the state of individual
> memory zones.  And right now there is nothing that prevents one zone
> from filling up with dirty pages while other zones are spared, which
> frequently leads to situations where kswapd, in order to restore the
> watermark of free pages, does indeed have to write pages from that
> zone's LRU list.  This can interfere so badly with IO from the flusher
> threads that major filesystems (btrfs, xfs, ext4) mostly ignore write
> requests from reclaim already, taking away the VM's only possibility
> to keep such a zone balanced, aside from hoping the flushers will soon
> clean pages from that zone.

It's a explanation of old approach, again!
Shoudn't we move above phrase of new approach into below?

> 
> Enter per-zone dirty limits.  They are to a zone's dirtyable memory
> what the global limit is to the global amount of dirtyable memory, and
> try to make sure that no single zone receives more than its fair share
> of the globally allowed dirty pages in the first place.  As the number
> of pages considered dirtyable exclude the zones' lowmem reserves and
> high watermarks, the maximum number of dirty pages in a zone is such
> that the zone can always be balanced without requiring page cleaning.
> 
> As this is a placement decision in the page allocator and pages are
> dirtied only after the allocation, this patch allows allocators to
> pass __GFP_WRITE when they know in advance that the page will be
> written to and become dirty soon.  The page allocator will then
> attempt to allocate from the first zone of the zonelist - which on
> NUMA is determined by the task's NUMA memory policy - that has not
> exceeded its dirty limit.
> 
> At first glance, it would appear that the diversion to lower zones can
> increase pressure on them, but this is not the case.  With a full high
> zone, allocations will be diverted to lower zones eventually, so it is
> more of a shift in timing of the lower zone allocations.  Workloads
> that previously could fit their dirty pages completely in the higher
> zone may be forced to allocate from lower zones, but the amount of
> pages that 'spill over' are limited themselves by the lower zones'
> dirty constraints, and thus unlikely to become a problem.

That's a good justification.

> 
> For now, the problem of unfair dirty page distribution remains for
> NUMA configurations where the zones allowed for allocation are in sum
> not big enough to trigger the global dirty limits, wake up the flusher
> threads and remedy the situation.  Because of this, an allocation that
> could not succeed on any of the considered zones is allowed to ignore
> the dirty limits before going into direct reclaim or even failing the
> allocation, until a future patch changes the global dirty throttling
> and flusher thread activation so that they take individual zone states
> into account.
> 
> Signed-off-by: Johannes Weiner <jweiner@xxxxxxxxxx>

Otherwise, looks good to me.
Reviewed-by: Minchan Kim <minchan.kim@xxxxxxxxx>

-- 
Kinds regards,
Minchan Kim

<Prev in Thread] Current Thread [Next in Thread>