B - 共同研究できる研究室 / Laboratories That Can Collaborate Editorial by admin
GPT 5.2 HighOverview
For the set of research themes held by each laboratory, we count the number of laboratory pairs \((i<j)\) whose number of common themes is at least \(K\).
Analysis
The key challenge is efficiently counting the “number of common elements between two sets (size of the intersection).”
A naive approach would be to exhaustively check whether theme strings match for each pair \((i,j)\).
For example, comparing all themes \(M_i\) of laboratory \(i\) with all themes \(M_j\) of laboratory \(j\) using a double loop takes \(O(M_i M_j)\). Doing this for all pairs results in at most $\( O\left(\sum_{i<j} M_i M_j\right) \approx O(N^2 \cdot 50^2) \)$ which risks being barely feasible or TLE in Python (string comparison is also expensive).The solution approach consists of two key ideas:
- Convert theme strings to integer IDs to make comparisons lightweight (compress using a dictionary).
- Sort the theme IDs for each laboratory and count common elements using the two-pointer technique.
This allows counting in \(O(M_i + M_j)\) per pair.
Furthermore, by terminating early when the common count reaches \(K\), or when “even if all remaining elements match, we still can’t reach \(K\),” we eliminate unnecessary traversal.
Algorithm
Coordinate compression of themes (ID assignment)
- For all theme strings, assign a new integer ID to each string encountered for the first time.
- Each laboratory holds a “list of theme IDs.”
- Sort the ID list for each laboratory.
Check all pairs and count common elements using two pointers
- Let \(a, b\) be the sorted ID arrays of laboratories \(i\) and \(j\).
- Advance pointers \(p, q\) from the beginning:
- If \(a[p] == b[q]\), it’s a common theme, so increment the count and advance both pointers
- Otherwise, advance the pointer pointing to the smaller value (same idea as merging sorted sets)
- If the count reaches \(K\), this pair “can collaborate,” so add it to the answer and break.
- Also, if during the process $\( \text{cnt} + \min(\text{remaining }a,\ \text{remaining }b) < K \)\( then even continuing further cannot reach \)K$, so terminate early.
Exclude laboratories that are too small from the start
- Laboratories with \(M_i < K\) can never reach \(K\) common themes with anyone, so skip them.
- For pairs, also skip if \(\min(M_i, M_j) < K\).
(Example)
- \(K=2\), Laboratory A: {a,b,c}, Laboratory B: {b,c,d}
After ID assignment, comparing the sorted arrays reveals that b and c match, immediately reaching \(2\) → collaboration is possible.
Complexity
- Time complexity:
- ID assignment and sorting: \(\displaystyle O\left(\sum_i M_i \log M_i\right)\) (lightweight since each \(M_i \le 50\))
- Pair checking: worst case \(\displaystyle O\left(\sum_{i<j} (M_i+M_j)\right) = O(N^2 \cdot 50)\)
Overall approximately \(O(N^2 \cdot 50)\) (often reduced further by early termination)
- ID assignment and sorting: \(\displaystyle O\left(\sum_i M_i \log M_i\right)\) (lightweight since each \(M_i \le 50\))
- Space complexity:
- Laboratory ID arrays and dictionary: \(\displaystyle O\left(\sum_i M_i\right)\)
Implementation Notes
Input is read all at once using
sys.stdin.buffer.read().split()to ensure speed (theme strings are kept as bytes and used directly as dictionary keys for efficiency).Converting themes to integer IDs before processing makes comparisons lightweight.
The early termination condition during two-pointer traversal
cnt + min(la-p, lb-q) < K
means “even if all remaining elements match, we can’t reach \(K\),” preventing unnecessary traversal.Breaking immediately when the common count reaches \(K\) to confirm the pair makes the solution fast on average.
Source Code
import sys
def main():
data = sys.stdin.buffer.read().split()
it = iter(data)
N = int(next(it))
K = int(next(it))
theme_id = {}
nxt_id = 0
labs = []
for _ in range(N):
Mi = int(next(it))
ids = []
for _ in range(Mi):
w = next(it) # bytes
if w in theme_id:
ids.append(theme_id[w])
else:
theme_id[w] = nxt_id
ids.append(nxt_id)
nxt_id += 1
ids.sort()
labs.append(ids)
ans = 0
for i in range(N):
a = labs[i]
la = len(a)
if la < K:
continue
for j in range(i + 1, N):
b = labs[j]
lb = len(b)
if lb < K or (la if la < lb else lb) < K:
continue
p = q = cnt = 0
while p < la and q < lb:
# early stop if even taking all remaining can't reach K
if cnt + (la - p if la - p < lb - q else lb - q) < K:
break
ap = a[p]
bq = b[q]
if ap == bq:
cnt += 1
if cnt == K:
ans += 1
break
p += 1
q += 1
elif ap < bq:
p += 1
else:
q += 1
sys.stdout.write(str(ans))
if __name__ == "__main__":
main()
This editorial was generated by gpt-5.2-high.
posted:
last update: