
Re: [hobbit] Problems running on a Solaris Zone



A few of us, myself included, threatened to write one a while ago, but you
know how it goes. No time.

Eric Meddaugh wrote Zonestat, which is on Xymonton
http://xymonton.trantor.org/doku.php/monitors:zonestat
But I haven't had a look at it yet, so I can't say if it will do what you need.
(Again, time is my problem)

If you have the time, check it out, and let us know what you make of it.

Regards
      Vernon


On Thu, Feb 25, 2010 at 6:30 AM, James Wade <jameskipwade (at) gmail.com> wrote:

> Thanks Vernon,
>
> This was a great summary. I was hoping to run the server in the container,
> but I think I’ll move it to the global zone.
>
> I will work on a separate client for Solaris zones. Has anyone out there
> started writing one?
>
> James
>
>  ------------------------------
>
> *From:* Vernon Everett [mailto:everett.vernon (at) gmail.com]
> *Sent:* Tuesday, February 23, 2010 8:34 PM
> *To:* hobbit (at) hswn.dk
> *Subject:* Re: [hobbit] Problems running on a Solaris Zone
>
>
>
> Hi James
>
> I put a lot of effort into this recently, and there does not appear to be
> any real practical solution to the problem.
> The problem is caused by how zones use memory and kernel space.
>
> In sparse zones, all zones share one kernel. There is only one
> instance of the kernel running, and as a result, only one chunk of memory
> visible to the kernel.
>
> When you set a memory cap in your zone definition, and do a prtconf in the
> zone, it reports the value of the memory cap as the available memory.
> So far, so good.
>
> However, to determine free memory, we have to interrogate the kernel. This
> can be done in a number of different ways. Xymon uses vmstat by default.
> You can also use kstat -p unix:0:system_pages:freemem, and I am sure there
> are others.
> However, the kernel in question is the kernel running in the global zone!
> It's all one kernel.
> So the reported free memory is the free memory available to that kernel. It
> should be the same value in all the zones too.
>
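For anyone wanting to try the kstat route, here is a minimal Python sketch of turning that counter into megabytes. It assumes `kstat -p` prints the statistic name and its value separated by whitespace, that freemem is counted in pages, and that the page size is 8192 bytes (typical for SPARC; check with `pagesize`). The sample line and the function name are made up for illustration.

```python
def freemem_mb(kstat_line, page_size=8192):
    """Convert one line of `kstat -p unix:0:system_pages:freemem` output to MB.

    Assumes the line looks like 'unix:0:system_pages:freemem<whitespace><pages>'.
    """
    name, pages = kstat_line.split()
    if not name.endswith(":freemem"):
        raise ValueError("unexpected statistic: " + name)
    return int(pages) * page_size // (1024 * 1024)


# Hypothetical sample line, not captured from a real system.
sample = "unix:0:system_pages:freemem\t3549952"
print(freemem_mb(sample))
```

Run inside a zone, this would still report the global kernel's free memory, for the reason given above.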
> The error you are seeing occurs when the free memory available to the global
> kernel is more than the memory cap you have placed on the zone.
> In C (and many other programming languages), if you subtract a bigger number
> from a smaller one, the result can wrap around to a huge positive value,
> depending on how your variables are defined (unsigned, for instance). I think
> that's where your multi-petabyte memory is coming from. Any programmers out
> there who can confirm this?
>
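On the question above: the figures in James's report are consistent with 32-bit wraparound, since 4294966186 is 2^32 - 1110 and 4294967292 is 2^32 - 4. The Python sketch below mimics C's behaviour under the assumption that used memory is computed as total minus free and that the results are printed as unsigned 32-bit values. The free value of 27734M is back-computed from the report, not something Xymon printed, and none of this is taken from the Xymon source.

```python
MASK32 = 0xFFFFFFFF  # keep only the low 32 bits, like a C uint32_t


def c_div(a, b):
    """Integer division truncating toward zero, as C does."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def report(total_mb, free_mb):
    used = total_mb - free_mb            # negative when free > cap
    pct = c_div(100 * used, total_mb)
    # Printing a negative value as unsigned 32-bit wraps it around.
    return used & MASK32, pct & MASK32


# total is the zone cap from the report; free is an assumed
# (back-computed) value for what the global kernel saw.
print(report(26624, 27734))  # (4294966186, 4294967292)
```

Both numbers match the Physical line of the report, which is what you would expect if free memory exceeded the cap by 1110M.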
> The other problem this creates is that any sane-looking zone memory
> percentages are meaningless. They do not represent the true memory
> utilisation within the zone. Your zone memory utilisation could be 100%, and
> you would not realise it, because the kernel is still seeing heaps of free
> memory, and reporting lots free.
> Imagine a 2GB cap, and the apps in the zone are using all 2GB.
> However, the kernel can see 1.8GB free.
> Do the maths. Xymon tells us your zone is only using 10% of memory, which
> is far from the truth.
>
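The arithmetic in that example, as a small Python sketch. The numbers are the ones from the paragraph above, written in MB and taking 1GB as 1000MB to keep the sums round; the function name is just for illustration.

```python
def percent_used(total_mb, free_mb):
    """Percentage used, computed the naive way: (total - free) / total."""
    return 100 * (total_mb - free_mb) / total_mb


cap_mb = 2000          # zone memory cap
kernel_free_mb = 1800  # free memory as seen by the shared (global) kernel
zone_used_mb = 2000    # what the apps in the zone really use

reported = percent_used(cap_mb, kernel_free_mb)
actual = 100 * zone_used_mb / cap_mb
print(reported, actual)  # 10.0 100.0
```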
> The only real way round it might not fit with your policies and methods.
> You need to remove all memory caps.
> This floats all memory, meaning that the memory "seen" in the zone is the
> same as the kernel sees, and Solaris manages the memory, ensuring all
> zones get enough.
> It also means that all of the zones will show identical memory graphs.
>
> The other way, which I haven't had time to do yet, is to use prstat -Z in
> the global zone.
> This gives a summary of what the zones are using, which might be worth
> tracking.
>
> As a short-term workaround, because we need memory caps for certain apps,
> we have skipped memory monitoring on the zones. (It's pretty meaningless
> anyway - see above)
> We have the global zone, and below it, all the zones, with the
> NOCOLUMNS:memory bb-hosts tag.
>
> It's not really ideal, but I hope to find time to revisit this in the near
> future.
>
> It would be nice to be able to disable just the memory test on these, and
> only keep an eye on swap. Swap is local to the zone, and if you start using
> heaps of it in the zone, or are doing lots of paging, chances are you are
> maxing out your memory allocation.
> So swap is probably a good indicator.
>
> Sorry I could not be of any more help.
>
> Regards
>      Vernon
>
>
>
> On Wed, Feb 24, 2010 at 1:35 AM, James Wade <jkwade (at) futurefrontiers.com>
> wrote:
> Has anyone seen this problem? I’ve just compiled 4.3.0.0.beta2 on a
> Solaris 10 system. I’m running on a Sun T5120 series in a Solaris
> sparse zone.
>
> When I run the server, I get the following on the memory test.
> FYI, I don’t have 4.2 petabytes of memory :)
>
> Has anyone seen similar problems? Running the client in the global zone
> works fine.
>
> Tue Feb 23 10:52:43 CST 2010 - Memory CRITICAL
>
>    Memory                  Used       Total  Percentage
> [red]   Physical    4294966186M      26624M 4294967292%
> [green] Swap               148M      26623M          0%
>
> Thanks…James