Official

B - 共同研究できる研究室 / Laboratories That Can Collaborate Editorial by admin

GPT 5.2 High

Overview

For the set of research themes held by each laboratory, we count the number of laboratory pairs \((i<j)\) whose number of common themes is at least \(K\).

Analysis

The key challenge is efficiently counting the “number of common elements between two sets (size of the intersection).”

  • A naive approach would be to exhaustively check whether theme strings match for each pair \((i,j)\).
    For example, comparing all themes \(M_i\) of laboratory \(i\) with all themes \(M_j\) of laboratory \(j\) using a double loop takes \(O(M_i M_j)\). Doing this for all pairs results in at most $\( O\left(\sum_{i<j} M_i M_j\right) \approx O(N^2 \cdot 50^2) \)$ which risks being barely feasible or TLE in Python (string comparison is also expensive).

  • The solution approach consists of two key ideas:

    1. Convert theme strings to integer IDs to make comparisons lightweight (compress using a dictionary).
    2. Sort the theme IDs for each laboratory and count common elements using the two-pointer technique.
      This allows counting in \(O(M_i + M_j)\) per pair.

Furthermore, by terminating early when the common count reaches \(K\), or when “even if all remaining elements match, we still can’t reach \(K\),” we eliminate unnecessary traversal.

Algorithm

  1. Coordinate compression of themes (ID assignment)

    • For all theme strings, assign a new integer ID to each string encountered for the first time.
    • Each laboratory holds a “list of theme IDs.”
    • Sort the ID list for each laboratory.
  2. Check all pairs and count common elements using two pointers

    • Let \(a, b\) be the sorted ID arrays of laboratories \(i\) and \(j\).
    • Advance pointers \(p, q\) from the beginning:
      • If \(a[p] == b[q]\), it’s a common theme, so increment the count and advance both pointers
      • Otherwise, advance the pointer pointing to the smaller value (same idea as merging sorted sets)
    • If the count reaches \(K\), this pair “can collaborate,” so add it to the answer and break.
    • Also, if during the process $\( \text{cnt} + \min(\text{remaining }a,\ \text{remaining }b) < K \)\( then even continuing further cannot reach \)K$, so terminate early.
  3. Exclude laboratories that are too small from the start

    • Laboratories with \(M_i < K\) can never reach \(K\) common themes with anyone, so skip them.
    • For pairs, also skip if \(\min(M_i, M_j) < K\).

(Example) - \(K=2\), Laboratory A: {a,b,c}, Laboratory B: {b,c,d}
After ID assignment, comparing the sorted arrays reveals that b and c match, immediately reaching \(2\) → collaboration is possible.

Complexity

  • Time complexity:
    • ID assignment and sorting: \(\displaystyle O\left(\sum_i M_i \log M_i\right)\) (lightweight since each \(M_i \le 50\))
    • Pair checking: worst case \(\displaystyle O\left(\sum_{i<j} (M_i+M_j)\right) = O(N^2 \cdot 50)\)
      Overall approximately \(O(N^2 \cdot 50)\) (often reduced further by early termination)
  • Space complexity:
    • Laboratory ID arrays and dictionary: \(\displaystyle O\left(\sum_i M_i\right)\)

Implementation Notes

  • Input is read all at once using sys.stdin.buffer.read().split() to ensure speed (theme strings are kept as bytes and used directly as dictionary keys for efficiency).

  • Converting themes to integer IDs before processing makes comparisons lightweight.

  • The early termination condition during two-pointer traversal
    cnt + min(la-p, lb-q) < K
    means “even if all remaining elements match, we can’t reach \(K\),” preventing unnecessary traversal.

  • Breaking immediately when the common count reaches \(K\) to confirm the pair makes the solution fast on average.

    Source Code

import sys

def main():
    data = sys.stdin.buffer.read().split()
    it = iter(data)
    N = int(next(it))
    K = int(next(it))

    theme_id = {}
    nxt_id = 0
    labs = []

    for _ in range(N):
        Mi = int(next(it))
        ids = []
        for _ in range(Mi):
            w = next(it)  # bytes
            if w in theme_id:
                ids.append(theme_id[w])
            else:
                theme_id[w] = nxt_id
                ids.append(nxt_id)
                nxt_id += 1
        ids.sort()
        labs.append(ids)

    ans = 0
    for i in range(N):
        a = labs[i]
        la = len(a)
        if la < K:
            continue
        for j in range(i + 1, N):
            b = labs[j]
            lb = len(b)
            if lb < K or (la if la < lb else lb) < K:
                continue

            p = q = cnt = 0
            while p < la and q < lb:
                # early stop if even taking all remaining can't reach K
                if cnt + (la - p if la - p < lb - q else lb - q) < K:
                    break
                ap = a[p]
                bq = b[q]
                if ap == bq:
                    cnt += 1
                    if cnt == K:
                        ans += 1
                        break
                    p += 1
                    q += 1
                elif ap < bq:
                    p += 1
                else:
                    q += 1

    sys.stdout.write(str(ans))

if __name__ == "__main__":
    main()

This editorial was generated by gpt-5.2-high.

posted:
last update: