GameMonkey Script

GameMonkey Script Forums
All times are UTC

 Post subject: GarbageCollector crash
PostPosted: Fri Sep 30, 2011 3:08 am 

Joined: Fri Jul 24, 2009 5:22 am
Posts: 14
Hi everybody,

I was tracking down a bad GC crash in my game when I ended up with this small test script:

Code:
print("a");

local test_func = function(a,b)
{
    return {a,b};
};

print("b");
global Color = function()
{
    local cp =  {};
    sysCollectGarbage( true );
    return cp;
};

print("c");
local test = test_func( Color(), Color() );   
print("c1");
sysCollectGarbage( true ); 
print("c2");
sysCollectGarbage( true );

print("d");

while (1)
{
    print("e");
    yield();
}


I do not know if this is the same problem as the one in my game, but this script makes the GC crash (the last print is "c2").
I checked it with the latest gme.exe and gme64.exe, and both crash when I run the script.
If I do not call the full GC inside the function Color, everything is OK.
It seems that a GC call deleted a table created by the Color function, even though it is still referenced by the table test.
Has anybody already encountered this problem?

I found the crash on a PS3 target, and I run gme on a 64-bit Windows 7 OS.

Any help will be welcome.
Thanks


PostPosted: Fri Sep 30, 2011 5:01 am

Joined: Mon Dec 15, 2003 1:38 pm
Posts: 708
Thanks for that excellent repro case, and welcome to the forum Ghosto.
I'll have a look into that as soon as I can.


PostPosted: Fri Sep 30, 2011 9:20 am

Joined: Mon Dec 15, 2003 1:38 pm
Posts: 708
Quick update: I think I've identified the cause but have not yet devised a solution. What is happening is that ownership of the locally created tables is transferred from the stack to a newly allocated local table object. The placement of the GC cycles creates a scenario where the newly allocated table does not scan its children, so they are classified as junk to be freed when they are not.

I'm very impressed at your simple reproduction case. Well done! We'll nail this bug soon.
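
To make the failure mode described above concrete, here is a minimal C++ sketch of an incremental tri-color mark phase. It is a toy model, not GameMonkey's collector: `Shade`, `Obj`, `ToyGC`, and `ChildIsLost` are hypothetical names, and the one assumption is that a table which has already been scanned (black) is not rescanned within the same cycle.

```cpp
#include <cassert>
#include <vector>

// Toy model only; these types are NOT GameMonkey's real classes.
enum class Shade { White, Gray, Black };

struct Obj {
    Shade shade = Shade::White;
    std::vector<Obj*> children;
};

struct ToyGC {
    std::vector<Obj*> grayList;

    // Move a white object to the gray list; gray and black objects are left alone.
    void MarkGray(Obj* o) {
        if (o->shade == Shade::White) {
            o->shade = Shade::Gray;
            grayList.push_back(o);
        }
    }

    // Scan every gray object: gray its children, then blacken it.
    void DrainGrayList() {
        while (!grayList.empty()) {
            Obj* o = grayList.back();
            grayList.pop_back();
            for (Obj* child : o->children) MarkGray(child);
            o->shade = Shade::Black;
        }
    }
};

// Returns true when a still-referenced child would be swept as garbage.
bool ChildIsLost() {
    Obj table, child;                  // child starts out referenced only by a stack slot
    ToyGC gc;

    gc.MarkGray(&table);               // the table is reachable from a root
    gc.DrainGrayList();                // scanned while still empty, so it is now black
    assert(table.shade == Shade::Black);

    table.children.push_back(&child);  // ownership moves from the stack into the table
    gc.DrainGrayList();                // nothing is gray, so the table is never rescanned

    return child.shade == Shade::White; // white at sweep time means "free it"
}
```

In this model `ChildIsLost()` returns true: the child is still white when the sweep would run, even though the table references it.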


PostPosted: Fri Sep 30, 2011 9:40 am

Joined: Fri Jul 24, 2009 5:22 am
Posts: 14
Hi, thanks very much for the fast response.
I am glad that you are already looking at it.
I will stay tuned.


PostPosted: Sat Oct 01, 2011 1:23 pm

Joined: Mon Dec 15, 2003 1:38 pm
Posts: 708
Quick update: I have a solution that I'll share soon for you to test, but during my tests I found another bug relating specifically to the 64-bit address build. I'll try to fix that as well before reporting back. I should have noticed this other bug earlier, as most of the samples won't run with gme64.exe.


PostPosted: Sun Oct 02, 2011 3:09 am

Joined: Mon Dec 15, 2003 1:38 pm
Posts: 708
Please try v1.28.2 and tell me how you go. Also let me know if performance changes much.
A couple of shameful 'how did this ever work' moments in this fix set :oops:.


PostPosted: Mon Oct 03, 2011 3:28 am

Joined: Fri Jul 24, 2009 5:22 am
Posts: 14
Hi, Greg

Thanks for your update.
I have not finished all my tests yet, but the script is working now, thanks.
One thing: I started to receive an assert in "DestructSomeFreeObjects" because the GC is trying to discard a persistent object,
and if I put this source code:

Code:
#if GM_GC_KEEP_PERSISTANT_SEPARATE
    GM_ASSERT(!a_obj->GetPersist());
#endif //GM_GC_KEEP_PERSISTANT_SEPARATE


inside your new function "GrayThisRootObject", it asserts.
I am not sure; maybe we are just doing bad stuff inside our user objects, but shouldn't this function ignore persistent objects?

I have not checked the performance yet. Do you think that this fix negatively affects the GC speed?
Thanks again for your support


PostPosted: Mon Oct 03, 2011 4:29 am

Joined: Mon Dec 15, 2003 1:38 pm
Posts: 708
Ghosto wrote:
...
One thing, I started to receive an assert in "DestructSomeFreeObjects" because the GC is trying to discard a persistent object, and if I put this sourcecode:...

Good find, persistent objects are stored separately and should not be processed. I've incorporated it into my next build v1.28.3 which is now available.
Ghosto wrote:
...I did not check the performances yet... do you think that this fix is affecting negatively the GC speed?

The new root scan can't assume anything about the objects, so it naively re-inserts them into a list. This is done once per full GC cycle, and only when the GC runs, which it should only do intermittently. I don't expect the new code to significantly reduce performance. The 64-bit fixes add an extra compare to the table access, which also should not show up as significant. If it did, it would be a candidate for optimization (simple platform-specific specialization).
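
As a rough illustration of the root scan described above, the following C++ sketch queues every root without trusting its current shade and skips persistent objects. It is only a toy model under stated assumptions: `Obj`, `ToyGC`, `GrayRoot`, `ScanRoots`, and `QueuedRootCount` are hypothetical names and do not reproduce GameMonkey's actual `GrayThisRootObject`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only; names are hypothetical, not GameMonkey's API.
struct Obj {
    bool persistent = false;  // persistent objects are kept in a separate list
    bool gray = false;        // may be stale from an earlier cycle
};

struct ToyGC {
    std::vector<Obj*> grayList;

    // Re-insert a root without trusting its current shade.
    void GrayRoot(Obj* o) {
        if (o->persistent) return;   // stored separately, never processed here
        o->gray = true;
        grayList.push_back(o);
    }

    // Called once at the start of a full cycle; returns how many roots were queued.
    std::size_t ScanRoots(const std::vector<Obj*>& roots) {
        grayList.clear();
        for (Obj* o : roots) GrayRoot(o);
        return grayList.size();
    }
};

// Three roots: one ordinary, one with a stale gray flag, one persistent.
std::size_t QueuedRootCount() {
    Obj normal, stale, persistent;
    stale.gray = true;               // looks already queued, but is not in the list
    persistent.persistent = true;
    ToyGC gc;
    return gc.ScanRoots({&normal, &stale, &persistent});
}
```

The work is one pass over the roots per full cycle, which is why it is cheap as long as full collections are intermittent. In this model the stale root is queued again and the persistent one is skipped.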


PostPosted: Mon Oct 03, 2011 10:09 am

Joined: Fri Jul 24, 2009 5:22 am
Posts: 14
Hi,

I think that I found another problem:

Code:
print("A");

global MakeControllerUnion = function( _Controllers, _Available, _All_connected )
{
    sysCollectGarbage( true );
    local obj =
    {
        _controllers = _Controllers,
        _available_num_list = _Available,
        _all_connected = _All_connected,
    };
    return obj;
};

print("B");
global g_local_controllers = { };
print("C");
g_local_controllers[0] = {};
print("D");
g_local_controllers[1] = {};
print("E");
global g_local_controller_union = MakeControllerUnion( g_local_controllers, { 0,1 } );
print("F");

while(1)
{
    print("G");
    sysCollectGarbage( true );
    yield();
}


The old test script now works with version 1_28_2, but this one still makes gme.exe crash; I moved the full GC call.
I investigated a bit, and I think the problem may be in "GrayAWhite", where the shaded state of the object is checked before moving it to the gray list.
For some reason that I do not know, the GC thinks that the object is already in the gray list when it is not; only the color matches.
As a test, I changed the number of color values used to shade an object:

Code:
/// \brief Toggle bit used to represent 'colored'
  inline void ToggleCurShadeColor()               {m_curShadeColor = (m_curShadeColor+1)%3;}  // use 0,1,2 as generation color instead of only 0,1


and it works, but it is just a test, and I do not know if it is the right way to fix it, or even if the problem is really there.
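
The effect of changing the modulus can be checked with a small C++ sketch. It is a toy model, not GameMonkey's code: `ShadeClock` and `LooksAlreadyShaded` are hypothetical names, and it assumes an object counts as "already shaded" when its stored color stamp equals the collector's current shade color. Whether a stale stamp is the real cause of the crash is not confirmed here.

```cpp
#include <cassert>

// Toy model only; not GameMonkey's code. An object counts as "already shaded"
// when its stored stamp equals the collector's current shade color.
class ShadeClock {
public:
    explicit ShadeClock(int numColors) : m_numColors(numColors), m_cur(0) {
        assert(numColors > 0);
    }
    // Advance to the next shade color at the end of a full GC cycle.
    void Toggle() { m_cur = (m_cur + 1) % m_numColors; }
    int Current() const { return m_cur; }
private:
    int m_numColors;
    int m_cur;
};

// Stamp an object in the current cycle, let `togglesMissed` full cycles pass
// without the object being restamped, then ask whether it looks shaded.
bool LooksAlreadyShaded(int numColors, int togglesMissed) {
    ShadeClock clock(numColors);
    const int stamp = clock.Current();
    for (int i = 0; i < togglesMissed; ++i) clock.Toggle();
    return stamp == clock.Current();
}
```

In this model a stale stamp collides with the current color after two toggles when there are two colors, and after three toggles when there are three. Using `% 3` therefore moves the collision one cycle later rather than removing it.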

Thanks


PostPosted: Mon Oct 03, 2011 11:27 am

Joined: Mon Dec 15, 2003 1:38 pm
Posts: 708
Thanks Ghosto, I'll look into this as soon as I can.
The workaround you describe is not valid, but by all means avoid the issue any way you can to reduce disruption to your project while we work this out.


PostPosted: Tue Oct 04, 2011 8:01 am

Joined: Mon Dec 15, 2003 1:38 pm
Posts: 708
Please try new beta 1.28.4. I have some confidence that I've addressed the issue properly while my previous attempt was not sound.
The common cause relates to local objects transferring ownership. I have added a write barrier to the pop stack frame, which should preserve the tri-color invariant regardless of how old or new the local or connected objects are.
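
The idea of a write barrier at frame pop can be sketched in C++ as follows. This is a toy model under stated assumptions, not the code shipped in 1.28.4: `ToyGC`, `ToyThread`, `PopStackFrame`, and `ReturnValueSurvives` are hypothetical, and the barrier shown is a simple "gray the object if it is white" barrier.

```cpp
#include <cassert>
#include <vector>

// Toy model only; these types are NOT GameMonkey's real classes.
enum class Shade { White, Gray, Black };

struct Obj {
    Shade shade = Shade::White;
    std::vector<Obj*> children;
};

struct ToyGC {
    std::vector<Obj*> grayList;

    // Barrier: a white object that is about to change owner is queued for scanning.
    void WriteBarrier(Obj* o) {
        if (o->shade == Shade::White) {
            o->shade = Shade::Gray;
            grayList.push_back(o);
        }
    }

    // Scan every gray object: apply the barrier to its children, then blacken it.
    void DrainGrayList() {
        while (!grayList.empty()) {
            Obj* o = grayList.back();
            grayList.pop_back();
            for (Obj* child : o->children) WriteBarrier(child);
            o->shade = Shade::Black;
        }
    }
};

struct ToyThread {
    ToyGC* gc;
    bool barrierOnPop;

    // Pop the callee frame and hand its return value to the caller.
    Obj* PopStackFrame(Obj* returnValue) {
        if (barrierOnPop && returnValue) gc->WriteBarrier(returnValue);
        return returnValue;
    }
};

// Returns true if a callee-local object survives being returned into a black table.
bool ReturnValueSurvives(bool barrierOnPop) {
    Obj callerTable, local;
    ToyGC gc;
    ToyThread thread{&gc, barrierOnPop};

    gc.WriteBarrier(&callerTable);
    gc.DrainGrayList();                       // caller's table is already black

    Obj* ret = thread.PopStackFrame(&local);  // local leaves the callee frame
    callerTable.children.push_back(ret);      // and is stored in the black table
    gc.DrainGrayList();
    return local.shade != Shade::White;
}
```

With the barrier enabled, the returned object is grayed before it can hide behind a black owner, so no black object ends up pointing at a white one. Without it, the object stays white and would be swept.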


PostPosted: Wed Oct 05, 2011 12:08 am

Joined: Fri Jul 24, 2009 5:22 am
Posts: 14
Hi Greg,

thanks for the new fix. I checked it, and the two samples are working now.
I checked for performance problems, but I did not see any notable change in the frame rate.
Sadly, I am still able to make the GC crash (maybe another edge case), but I am not sure about it.
This time the sample is not standalone and you cannot run it (I will prepare one if possible), so maybe it is a bug in our object, but I will report it here so you can take a look:

Code:
local test_call = function( a, b )
{
    return array( a, b );
};

global Color = function()
{
    sysCollectGarbage( true );   
    local cp = {};
    sysCollectGarbage( true );   
    return cp;
};

local test = array( Color(), Color() );

while (1)
{
    yield();
}

This code crashes; array is a revised version of the sample gmArrayLib.

If I instead replace "array( Color..." with "test_call( Color..." it works.
It also works if I change the order, position, or number of "sysCollectGarbage" calls.
Code:
local test_call = function( a, b )
{
    return array( a, b );
};

global Color = function()
{
    sysCollectGarbage( true );   
    local cp = {};
    sysCollectGarbage( true );   
    return cp;
};

local test = test_call( Color(), Color() );

while (1)
{
    yield();
}


Obviously this is not a normal scenario, but I think it is what happens in our game when it goes from in-game to the menu, a time when a lot of objects call "sysCollectGarbage( true )" to free low-level resources. The crash is quite rare, and it is strange that we noticed it only now (all our PS3 PixelJunk games are based on GM).

Thanks again


PostPosted: Wed Oct 05, 2011 1:34 am

Joined: Fri Jul 24, 2009 5:22 am
Posts: 14
Quick update:
Maybe the problem is the native call "array": in this case the StackFrame is not pushed and popped, but the stack is altered.


PostPosted: Wed Oct 05, 2011 1:56 am

Joined: Fri Jul 24, 2009 5:22 am
Posts: 14
I have probably fixed it by adding these lines:

Code:
    // Apply the GC write barrier to each reference in stack slots [m_base, m_top - 1).
    gmGarbageCollector* gc = m_machine->GetGC();
    test_printf(gc, "thread:PushStackFrame native\n");
    for (int index = m_base; index < m_top - 1; ++index)
    {
        if (m_stack[index].IsReference())
        {
            gmObject* object = GM_MOBJECT(m_machine, m_stack[index].m_value.m_ref);
            gc->WriteBarrier(object);
        }
    }


inside the native call branch of "PushStackFrame".


PostPosted: Wed Oct 05, 2011 2:57 am

Joined: Fri Jul 24, 2009 5:22 am
Posts: 14
I found that BC_SETLOCAL should probably set a barrier as well.
I am afraid that all these new checks could hurt performance...

