I'm not going to tackle all aspects of your post. But I do want to talk about percussion. Individual modules that each play a single voice of percussion tend to get expensive, both in terms of real estate (case HP) and cost ($$).
For example, if you construct a kit out of TipTop percussion modules, you will also need a sub-mixer to corral them into. You will also need as many trigger outputs as you have percussion modules, so the sequencer will have to be pretty beefy. 8 voices of percussion means 8 trigger outs.
A sample playback module might be worth having, especially in the beginning. Something like the 1010 Music BitBox mkII would be a prudent choice. Each voice can be triggered with a patched trigger OR over MIDI (on 3.5mm TRS cables). You can sample in your own sounds or fill the SD card up with whatever else you want.
It can also record loops if you send it a clock or MIDI clock.