Official

F - Balanced Rectangles Editorial by en_translator


This problem has a special constraint: \(H \times W \le 300000\).
This implies \(\min(H,W) \le \sqrt{300000} \approx 548\).
By transforming the input so that \(H \le W\), a solution with time complexity like \(O(H^2W)\) is likely to complete within the time limit, as plugging in the parameters yields a cost of \(1.7 \times 10^8\).

From now on, we assume that the input is transformed so that \(H \le W\).
We may rotate the grid by \(90\) degrees, or flip it as \(S'_{i,j} = S_{j,i}\). In any case, the answer remains the same.

First, fix the top and bottom edges of the rectangular region. There are \(O(H^2)\) choices. The problem can be solved if the problem for fixed upper and lower edges can be solved in \(O(W)\) time.

When the topmost and bottommost rows are \(u\) and \(d\), the problem is rephrased as follows:

Define \(C_j\) as follows:

  • Let \(G\) be the region consisting of \(u\)-th through \(d\)-th rows from the top, and \(1\)-st through \(W\)-th columns from the left.
  • Let \(C_j\) be (the number of # contained in the \(j\)-th column of \(G\)) \(-\) (the number of . contained in the \(j\)-th column of \(G\)).

Then the sought value is the number of pairs \((l,r)\) such that \(C_l+C_{l+1}+\dots+C_{r}=0\) and \(l \le r\).

This problem is nothing but Zero-Sum Ranges. However, solving in the same way as its editorial costs \(O(W \log W)\) time, for a total of \(O(H^2 W \log W)\) time, yielding a cost of \(1.5 \times 10^{9}\). Unless the constant factor is super lightweight, it is unlikely to finish within the execution time limit. (Conversely, for this amount of cost there is still hope that a very careful implementation may be accepted.)
How can we optimize it?

Here, notice the domain of \(C_l+C_{l+1}+\dots+C_{r}\). By definition of \(C_i\), the value always lies between \(-HW\) and \(HW\). When counting subarrays, we may avoid using a data structure like map or sort that is subject to a \(\log\), but simply use an array to eliminate this \(\log\).
Here, computing \(C_i\) for every step will impose additional cost, so you need to update \(C_i\) incrementally while iterating \(u\) and \(d\).

After getting rid of \(\log\), the time complexity is now \(O(H^2W)\) as desired initially, so the problem has been solved.

Sample code (C++):

#include<bits/stdc++.h>

using namespace std;

vector<string> flip(vector<string> &s){
  int H=s.size(),W=s[0].size();
  vector<string> res(W);
  for(int i=0;i<W;i++){
    for(int j=0;j<H;j++){
      res[i].push_back(s[j][i]);
    }
  }
  return res;
}

int main(){
  int t;
  cin >> t;
  while(t>0){
    t--;
    int H,W;
    cin >> H >> W;
    vector<string> s(H);
    for(auto &nx : s){cin >> nx;}
    if(H>W){s=flip(s);}
    H=s.size();
    W=s[0].size();

    int ofs=H*W;
    vector<int> bk(2*H*W+1,0);
    long long res=0;
    for(int u=0;u<H;u++){
      vector<int> C(W,0);
      for(int d=u;d<H;d++){
        for(int i=0;i<W;i++){
          if(s[d][i]=='#'){C[i]++;}
          else{C[i]--;}
        }
        int h;
        h=0;
        bk[h+ofs]++;
        for(int i=0;i<W;i++){
          h+=C[i];
          res+=bk[h+ofs];
          bk[h+ofs]++;
        }

        // reset bk
        h=0;
        bk[h+ofs]=0;
        for(int i=0;i<W;i++){
          h+=C[i];
          bk[h+ofs]=0;
        }
      }
    }
    cout << res << "\n";
  }
  return 0;
}

posted:
last update: