malyj_gorgan | Any Scala fans here?

At work, on management's whim, I have to rewrite a couple of pieces of code from SQL to Scala, without knowing Scala at all. But bit by bit, bit by bit... The problem is that this is for our internal datastore, where compilation/error checking takes a minute and running even the most trivial code takes 5-10 minutes, so everything goes very slowly. Still, 80% is already done...
...And suddenly I ran into an error that I haven't been able to fix for two hours now. From all of this it looks like dataframe.column.when() has no logical "OR". It just plain breaks on the "||" symbol.

If this is a feature rather than a bug, explain to me, dummy that I am, who could have come up with such a feature. And, most importantly, WHAT THE HECK FOR?
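For reference, a minimal sketch of how a logical OR is usually written inside `when()` with the Spark Scala API. It assumes the datastore exposes ordinary Spark `Column` objects, which the post does not confirm; the object name `WhenOrSketch`, the toy column `x`, and the local session are hypothetical and only there to make the example runnable.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object WhenOrSketch {
  // Adds a "label" column: "edge" when x < 2 OR x > 8, otherwise "middle".
  // Both sides of || are Columns; the parentheses make the grouping explicit.
  def withLabel(df: DataFrame): DataFrame =
    df.withColumn(
      "label",
      when((col("x") < 2) || (col("x") > 8), "edge").otherwise("middle")
    )

  // Builds hypothetical toy data, applies withLabel, and returns x -> label.
  def labelsByX(): Map[Int, String] = {
    // Local session for illustration only (assumption: the real job gets its session from the platform).
    val spark = SparkSession.builder().master("local[1]").appName("when-or-sketch").getOrCreate()
    import spark.implicits._
    try {
      val df = Seq(1, 5, 9).toDF("x")
      withLabel(df).collect().map(r => r.getInt(0) -> r.getString(1)).toMap
    } finally spark.stop()
  }

  def main(args: Array[String]): Unit =
    labelsByX().toSeq.sortBy(_._1).foreach { case (x, label) => println(s"$x -> $label") }
}
```

The same condition can also be spelled `col("x").lt(2).or(col("x").gt(8))`. One possible cause of a compile error around `||`, though the post does not show the failing line, is comparing with `==` instead of `===`: `==` on a `Column` yields a plain `Boolean`, and `Boolean || Column` does not type-check.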

P.S. I wonder how many Scala users know the immortal lines (in Ukrainian, "скала" means "rock"):
Break this rock! Let neither heat nor cold
Stop you. Endure toil, and thirst, and hunger,
For you are destined to smash this rock.

juan_gandhi

Nah, you have to distinguish Scala from all the various libraries, from the dumbest to the most advanced. There's a wide spread. 5-10 minutes is definitely a bit much. I saw something like that at one idiotic company, in their idiotic Java, where the same application was initialized six times for some damn reason... man, can we switch to English to discuss the problem? I'm really clueless about how to talk about all this in Russian or Ukrainian.

malyj_gorgan

Easy on the English part -- it's much easier for me professionally, too, and that's part of the reason why I try to speak Ukrainian about data science whenever I can -- 'cause _somebody_ has got to do it, why not me :) But you are definitely not my target audience for enforcing Ukrainian in professional settings :)

Scala here is one of the "allowed" languages (the other two being Spark SQL and PySpark) for accessing a big and ugly data warehouse. Really big and really-really ugly. One of the components of that ugliness is that you don't have a proper sandbox (unless you fully set one up yourself, which is beyond what I know how to do and care to do right), and the actual warehouse can only be accessed via a big web-UI thing that starts the batch job. Basically, they put you in a sequence of queues: first, some container job checks your dependencies, then some other job checks your permissions, then you actually wait for a cluster, competing with thousands if not tens of thousands of ongoing jobs... Hence the 5-10 mins -- that's what you get when trying to run stuff with the lowest resource demand but without setting the run priorities reserved for, well, high-priority jobs.
I guess that's the flip side of being a big company -- people have set rules that became immutable. At least at my current employer, the most important skill is knowing how to navigate constraints that are specific to this company and don't exist elsewhere. I once worked with a DS formerly from Facebook -- same thing, he seemed to be good at stuff that mattered at FB but didn't matter elsewhere.

I see. Shit happens. "Immutable rules" may be a big problem, but that's life.
