ще одна задачка
Mar. 19th, 2021 03:44 pmЯ тут пару разів писав всякі співбесідні штучки, і вони, o tempora, o, mores, викликали більше інтересу ніж майже будь яка інша тема, крім холіварів про дискримінацію і здорогвий глузд. Що ж, будемо популістами, додаю теґ про співбесіди і пощу ще одну задачку з минулорічних. На неї давали хвилин 5, ну, 10, якщо переписувати код, а не лише критикувати. А потім мені сказали, що лише двоє кандидатів нормально відповіли. (Нас і найняли, гиги.) Враховуючи, що це найперше завдання в довгому процесі відсіяння, навіть цікаво, мо, справді, хороша задачка?Enjoy, значиться...
Let’s say you’re running a query that looks like this:
WITH cte AS (
SELECT DISTINCT t1.x, t2.abc, t2.bcd, t2.cde,t2.def
FROM t1 LEFT JOIN t2 ON t1.x = t2.x
WHERE t1.y LIKE 'some_string%' AND SUBSTRING(t1.y, 1, 4) = 'some'
ORDER BY t1.x ASC, t2.abc DESC
)
SELECT COUNT(DISTINCT cte.x) AS num_x,
t3.some_date,
COUNT(DISTINCT t3.z)
FROM cte LEFT JOIN t3 ON cte.x = t3.x
WHERE LOWERCASE(t3.x) LIKE '%some_other_string%'
AND LOWERCASE(cte.x) NOT LIKE '%yet_another_string%'
GROUP BY t3.some_date
HAVING t3.some_date > some_start_date AND t3.some_date < some_end_date
ORDER BY num_x ASC
LIMIT 10
It is taking a really long time, and showing no signs of finishing even after you took a lunch break. You realize that t1, t2 and t3 all have several billion rows each. What would you potentially change to speed up the query, and why?
Let’s say you’re running a query that looks like this:
WITH cte AS (
SELECT DISTINCT t1.x, t2.abc, t2.bcd, t2.cde,t2.def
FROM t1 LEFT JOIN t2 ON t1.x = t2.x
WHERE t1.y LIKE 'some_string%' AND SUBSTRING(t1.y, 1, 4) = 'some'
ORDER BY t1.x ASC, t2.abc DESC
)
SELECT COUNT(DISTINCT cte.x) AS num_x,
t3.some_date,
COUNT(DISTINCT t3.z)
FROM cte LEFT JOIN t3 ON cte.x = t3.x
WHERE LOWERCASE(t3.x) LIKE '%some_other_string%'
AND LOWERCASE(cte.x) NOT LIKE '%yet_another_string%'
GROUP BY t3.some_date
HAVING t3.some_date > some_start_date AND t3.some_date < some_end_date
ORDER BY num_x ASC
LIMIT 10
It is taking a really long time, and showing no signs of finishing even after you took a lunch break. You realize that t1, t2 and t3 all have several billion rows each. What would you potentially change to speed up the query, and why?