Easy on the English part -- it's much easier for me professionally, too. That's part of the reason why I try to speak Ukrainian about data science whenever I can -- 'cause _somebody_ has got to do it, so why not me :) But you are definitely not my target audience for enforcing Ukrainian in professional settings :)
Scala here is one of the "allowed" languages (the other two being Spark SQL and PySpark) to access a big and ugly data warehouse. Really big and really-really ugly. One of the components of that ugliness is that you don't have a proper sandbox (unless you fully set one up yourself, which is beyond what I know how to do, or care to do right), and accessing the actual warehouse can be done via a big web-UI thing that starts a batch job. Basically, they put you in a sequence of queues: first, some container job checks your dependencies, then some other job checks your permissions, then you actually wait for a cluster, competing with thousands if not tens of thousands of ongoing jobs... Hence the 5-10 mins -- that's what you get when trying to run stuff with the lowest resource demand but without setting the run priorities reserved for, well, high-priority jobs.
I guess that's the flip side of being a big company -- people set rules that have become immutable. At least at my current employer, the most important skill is knowing how to navigate constraints that are specific to this company and don't exist elsewhere. I once worked with a DS formerly from Facebook -- same thing, he seemed to be good at stuff that mattered at FB but didn't matter elsewhere.