NN self-play performance expectations?
So a few days ago I went down the rabbit hole and started building a chess engine, with the goal of basing it on a NN, using MCTS and my own bitboard implementation.
I have gotten it to the point where it plays games against itself, logs those games to a DB, and can pick up where it left off by loading batches from the DB. It seems like all the rules of chess are being followed. The training engine is pretty dumb, so there have been a lot of random moves so far; I put a hybrid evaluation in place so it at least knows which pieces are worth more when it decides whether to capture. I have done some memory management, so it seems to stay within roughly 0-2 GB before it steps in and tries to free memory for performance reasons. Early-game moves generally take about 5 seconds, and late-game moves can take anywhere from 20 to 100 seconds.
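To make the "hybrid evaluation" idea concrete, here is a minimal sketch in Python rather than the engine's actual C# code. The piece values, the blend weight, the scaling constant, and the function names are all illustrative assumptions, not what the engine really uses. It blends a plain material count with whatever value the network returns, so an untrained network still prefers winning material.

```python
# Illustrative piece values (assumption, not taken from the real engine).
PIECE_VALUES = {"P": 1.0, "N": 3.0, "B": 3.0, "R": 5.0, "Q": 9.0, "K": 0.0}


def material_score(pieces: str) -> float:
    """Material balance from White's point of view.

    `pieces` is any string of piece letters (uppercase = White,
    lowercase = Black), e.g. the piece-placement field of a FEN string.
    Characters that are not piece letters (digits, slashes) are skipped.
    """
    score = 0.0
    for ch in pieces:
        value = PIECE_VALUES.get(ch.upper())
        if value is None:
            continue
        score += value if ch.isupper() else -value
    return score


def hybrid_eval(pieces: str, nn_value: float,
                material_weight: float = 0.5, scale: float = 10.0) -> float:
    """Blend a material heuristic with the network's value in [-1, 1].

    `material_weight` and `scale` are hypothetical tuning knobs: the
    material balance is divided by `scale` and clamped to [-1, 1] so it
    lives on the same range as `nn_value` before mixing.
    """
    material = max(-1.0, min(1.0, material_score(pieces) / scale))
    return material_weight * material + (1.0 - material_weight) * nn_value
```

In a setup like this, `material_weight` would presumably be lowered as the network improves, so the heuristic stops overriding what the network has learned.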
It is still running single-threaded on the CPU, and I have not attempted to add anything to target the GPU yet. But now I am starting to wonder if I made some critical mistakes in choosing C# and a TensorFlow network.
Games in the current configuration take far too long for me to believe I will be able to self-train on hundreds of thousands of games. I know it's a marathon and not a sprint, but I am wondering if anyone has experience with this, and what kind of performance people have achieved on their own with a NN implementation. I am sure that multithreading and potentially targeting the GPU will help quite a bit, or at least get multiple games done in the time it currently takes to do one, but I am wondering if it will all be in vain anyway.
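To put rough numbers on that worry, here is a back-of-the-envelope sketch in Python. The 5 s figure and the 60 s figure (the midpoint of 20-100 s) come from the move timings mentioned earlier; the split of 40 early plies and 40 late plies per game, the function names, and the parallel game count are assumptions for illustration only.

```python
SECONDS_PER_DAY = 86_400


def seconds_per_game(early_plies: int, early_secs: float,
                     late_plies: int, late_secs: float) -> float:
    """Total wall-clock time for one self-play game."""
    return early_plies * early_secs + late_plies * late_secs


def days_for_games(num_games: int, secs_per_game: float,
                   parallel_games: int = 1) -> float:
    """Days needed to play `num_games`, assuming perfect scaling
    across `parallel_games` games running at once (optimistic)."""
    return num_games * secs_per_game / parallel_games / SECONDS_PER_DAY


# Assumed game shape: 40 plies at 5 s plus 40 plies at 60 s.
per_game = seconds_per_game(40, 5.0, 40, 60.0)

print(f"seconds per game: {per_game:.0f}")
print(f"days for 100k games, 1 at a time: {days_for_games(100_000, per_game):.0f}")
print(f"days for 100k games, 16 at a time: {days_for_games(100_000, per_game, 16):.0f}")
```

Under those assumptions a single game takes about 43 minutes and 100,000 games would take roughly 3,000 days one at a time, which suggests parallel games alone would not be enough and the per-move cost itself would have to come down by a large factor.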
Two big features that I have not added yet, and that I am sure will help overall performance, are an opening book and an endgame DB. I have set up the DB and the connection to the opening book table, but I have not populated it yet, so the engine is just playing from start to end on its own at the moment. But again, those will only help for so many moves, while the bulk of them will still be on its own. I have not profiled any functions yet either; I am currently working on efficiency before moving on to multithreading. I am also still running it with console output so I can see how long each move takes and verify that nothing looks out of the ordinary in my board's ToString representation, since it is still early days for all of this.
I guess I am just looking for any shred of hope, or a realistic target to aim for, on what is possible performance-wise on a personal PC without eventually having to rent compute time to train it.
My own computer specs are an i9-13900KS, 64 GB of RAM, and an RTX 4090.