I’m not a lawyer, but I know tech companies run social media platforms to build data models about users for their ad platforms. It seems to me that they could integrate themselves into a fediverse network and still harvest data without even providing services. So perhaps a software license could require that content users post to the platform is licensed by default under CC BY-NC-SA, or something else that would prevent this.
CC BY-NC-SA blocks adapting or republishing the work for commercial purposes. There is no general legal mechanism to stop a corporation from downloading your data and using it internally however it wishes, although the GDPR and its California counterpart, the CCPA, give people some additional rights here. Anyway, all of this is moot now that LLMs hoover up data from wherever they can, ignore all licensing, and blend it all together in the “training” process so it can’t be deleted.