Motivation

Manga Loader is a fantastic tool, so let's try adding a few new sites to it.

First attempt: learning how to use it from the examples

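Since I'm adding a new site, the first thing to figure out is how to specify its domain.

Searching for where domains are given turns up entries like this: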

{
  name: 'geh-and-exh',
  match: "^https?://(e-hentai|exhentai).org/s/.*/.*",
  img: '.sni > a > img, #img',
  next: '.sni > a, #i3 a',
  numpages: 'div.sn > div > span:nth-child(2)',
  curpage: 'div.sn > div > span:nth-child(1)'
}
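The `match` field is a regular-expression string that is tested against the page URL. A minimal sketch of picking the implementation for the current URL, assuming the loader takes the first entry whose `match` succeeds (the helper `findImplementation` is hypothetical; I have not checked how Manga Loader does this lookup internally):

```javascript
// Hypothetical helper: return the first implementation whose `match`
// regex string matches the URL, or null if none does.
function findImplementation(impls, url) {
  for (const imp of impls) {
    if (new RegExp(imp.match).test(url)) return imp;
  }
  return null;
}

const impls = [
  { name: "geh-and-exh", match: "^https?://(e-hentai|exhentai).org/s/.*/.*" },
  { name: "taotu55", match: "^https?://www.taotu55.net/w/.*/.*" },
];

const hit = findImplementation(impls, "https://e-hentai.org/s/abc/123-1");
console.log(hit && hit.name); // geh-and-exh

// Note: "." is unescaped in these patterns, so it matches any character;
// "^https?://www\\.taotu55\\.net/w/.*/.*" would be stricter.
```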

Luckily, after that I found a commented sample:

Sample Implementation:
{
    name: 'something' // name of the implementation
  , match: "^https?://domain.com/.*" // the url to react to for manga loading
  , img: '#image' // css selector to get the page's manga image
  , next: '#next_page' // css selector to get the link to the next page
  , numpages: '#page_select' // css selector to get the number of pages. elements like (select, span, etc)
  , curpage: '#page_select' // css selector to get the current page. usually the same as numPages if it's a select element
  , numchaps: '#chapters' // css selector to get the number of chapters in manga
  , curchap: '#chapters' // css selector to get the number of the current chapter
  , nextchap: '#next_chap' // css selector to get the link to the next chapter
  , prevchap: '#prev_chap' // same as above except for previous
  , wait: 3000 // how many ms to wait before auto loading (to wait for elements to load), or a css selector to keep trying until it returns an elem
  , pages: function(next_url, current_page_number, callback, extract_function) {
    // gets called requesting a certain page number (current_page_number)
    // to continue loading execute callback with img to append as first parameter and next url as second parameter
    // only really needs to be used on sites that have really unusual ways of loading images or depend on javascript
  }

  // Any of the CSS selectors can be functions instead that return the desired value.
}

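So it looks like supplying CSS selectors is all it takes. I did see other examples that pass functions instead, but I didn't need them this time, so I set them aside.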

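Second attempt: tracing JavaScript is painful

This time, a single page contains multiple images!!

So I had to trace how the loader works internally, starting from the places where the implementation's fields are used. This time I was looking for examples of the function form.

Apart from pages, every function takes only ctx, and judging from how it is used, ctx should be a DOM document. Then there is pages, with 4 parameters; the comment above roughly describes them, but not clearly enough for me to understand.

So I went back to see how it gets called: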

if (imp.pages) {
  imp.pages(url, curPage, addAndLoad, ex, getPageInfo);
} else {
  var colonIdx = url.indexOf(':');
  if(colonIdx > -1) {
    url = location.protocol + url.slice(colonIdx + 1);
  }
  xhr.open('get', url);
  imp.beforexhr && imp.beforexhr(xhr);
  xhr.onload = getPageInfo;
  xhr.onerror = function() {
    log('failed to load page, aborting', 'error');
  };
  xhr.send();
}
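The branch above (call pages if present, otherwise send an XHR) can be modeled outside the browser as below. This is a toy sketch: `fetchPage` and the `deps` object are hypothetical stand-ins for the XHR setup and for `location.protocol`; only the argument order of the pages call and the colonIdx rewrite are taken from the code above.

```javascript
// Mirrors the original: "http://a/b" with protocol "https:" becomes "https://a/b".
function matchProtocol(url, protocol) {
  const colonIdx = url.indexOf(":");
  return colonIdx > -1 ? protocol + url.slice(colonIdx + 1) : url;
}

// Toy loadNextPage branch. deps.fetchPage (hypothetical) stands in for
// xhr.open + xhr.onload = getPageInfo + xhr.send.
function loadNextPage(imp, url, curPage, deps) {
  if (imp.pages) {
    // Same five arguments, in the same order, as the original call.
    imp.pages(url, curPage, deps.addAndLoad, deps.ex, deps.getPageInfo);
    return "pages";
  }
  deps.fetchPage(matchProtocol(url, deps.protocol), deps.getPageInfo);
  return "xhr";
}

const fetched = [];
const deps = {
  protocol: "https:", // what location.protocol returns on an https page
  addAndLoad() {},
  ex() {},
  getPageInfo() {},
  fetchPage(url) { fetched.push(url); },
};
console.log(loadNextPage({}, "http://example.com/p2", 1, deps)); // xhr
console.log(fetched[0]); // https://example.com/p2
```

Note that pages actually receives five arguments (the fifth is getPageInfo) even though the sample comment documents four; JavaScript silently ignores extra arguments and leaves missing ones undefined, so either declaration style works.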

If pages is defined it gets called; otherwise the page is fetched with an XHR. Next, look at onload:

getPageInfo = function() {
  var page = d.body;
  d.body.innerHTML = xhr.response;
  try {
    // find image and link to next page
    addAndLoad(ex('img', imp.imgmod, page), ex('next', null, page));
  } catch (e) {
    if (xhr.status == 503 && retries > 0) {
      log('xhr status ' + xhr.status + ' retrieving ' + xhr.responseURL + ', ' + retries-- + ' retries remaining');
      window.setTimeout(function() {
        xhr.open('get', xhr.responseURL);
        xhr.send();
      }, 500);
    } else {
      log(e);
      log('error getting details from next page, assuming end of chapter.');
    }
  }
}
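The catch branch of getPageInfo is a small retry policy: on a 503 with retries left, re-request the same URL after 500 ms; otherwise give up and assume the chapter has ended. A sketch of just that policy, assuming hypothetical injected functions (`extract`, `resend`, `schedule`) in place of DOM parsing, xhr.open/xhr.send and window.setTimeout so it can run outside a browser:

```javascript
// Toy version of getPageInfo's success/retry/give-up logic.
function makeOnload(extract, addAndLoad, resend, schedule, retries) {
  return function onload(status) {
    try {
      const page = extract();
      addAndLoad(page.img, page.next); // throws if the img is missing
      return "loaded";
    } catch (e) {
      if (status === 503 && retries > 0) {
        retries -= 1; // the original decrements inside its log message
        schedule(resend, 500); // re-request the same URL after 500 ms
        return "retrying";
      }
      return "end of chapter"; // original: log the error and stop
    }
  };
}

// Usage: a page that keeps failing with 503, with one retry allowed.
const failing = makeOnload(
  () => ({ img: null, next: null }),
  (img) => { if (!img) throw new Error("no img"); },
  () => {},
  (fn, ms) => setTimeout(fn, ms),
  1
);
console.log(failing(503)); // retrying
console.log(failing(503)); // end of chapter
```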

The results get fed into addAndLoad:

addAndLoad = function(img, next) {
  if(!img) throw new Error('failed to retrieve img for page ' + curPage);
  updateStats();
  addImage(img, UI.images, curPage, function() {
    pagesLoaded += 1;
    updateStats();
  });
  if(!next && curPage < numPages) throw new Error('failed to retrieve next url for page ' + curPage);
  loadNextPage(next);
}

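The question here is what img and next actually are: DOM elements or URLs? With types this would not be such a hassle. Answering it means reading further down, but I don't want to paste more code, so, spoiler: they are URLs.

With the types sorted out, I could also see why only one image per page works: every image added immediately moves on to the next page. So this is what needs to change, by adding a flag that blocks the page advance: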

addAndLoad = function(img, next, inc_page=true) {
  // inc_page=false: add the image but stay on the current page
  if(!img) throw new Error('failed to retrieve img for page ' + curPage);
  updateStats();
  addImage(img, UI.images, curPage, function() {
      if (inc_page) {
          pagesLoaded += 1;
          updateStats();
      }
  });
  // only advance (and require a next url) when inc_page is true
  if (inc_page) {
      if(!next && curPage < numPages) throw new Error('failed to retrieve next url for page ' + curPage);
      loadNextPage(next);
  }
}
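To check the effect of the flag without a browser, here is a self-contained simulation. addImage, updateStats and loadNextPage are replaced by hypothetical stubs (updateStats is dropped), and `makeLoader` is an illustrative wrapper, not part of Manga Loader:

```javascript
// Simulation of the flagged addAndLoad with stubbed-out loader internals.
function makeLoader(numPages) {
  const state = { curPage: 1, pagesLoaded: 0, images: [], nextCalls: [] };
  const addImage = (img, page, onload) => { state.images.push({ img, page }); onload(); };
  const loadNextPage = (next) => { state.nextCalls.push(next); };

  function addAndLoad(img, next, incPage = true) {
    if (!img) throw new Error("failed to retrieve img for page " + state.curPage);
    addImage(img, state.curPage, () => {
      if (incPage) state.pagesLoaded += 1; // only page advances are counted
    });
    if (incPage) {
      if (!next && state.curPage < numPages) {
        throw new Error("failed to retrieve next url for page " + state.curPage);
      }
      loadNextPage(next);
    }
  }
  return { state, addAndLoad };
}

const { state, addAndLoad } = makeLoader(5);
// One page with three images: only the last call advances.
addAndLoad("p1-a.jpg", null, false);
addAndLoad("p1-b.jpg", null, false);
addAndLoad("p1-c.jpg", "page2.html");
console.log(state.images.length, state.pagesLoaded, state.nextCalls); // 3 1 [ 'page2.html' ]
```

One side effect worth knowing: pagesLoaded now counts pages rather than images, so the progress stats stay page-based.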

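Then, putting together what I learned about addAndLoad and pages, I need to write my own pages function, roughly:

  1. Send an XHR
  2. Pull out all the imgs
  3. Add the images (addImage, reached through addAndLoad with the flag set to false)
  4. For the last image, pass the flag back as true so loading moves on to the next page
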
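// onload handler for the XHR sent by the taotu55 pages function below.
// Adds every matching image on the fetched page; only the last one advances.
var getPageInfo2 = function(xhr, addAndLoad, img_css, next_css) {
    return function() {
        // parse the response into a fresh, detached document
        var ctx = document.implementation.createHTMLDocument();
        ctx.body.innerHTML = xhr.response;
        try {
            // find all images and the link to the next page
            var nextEl = getEl(next_css, ctx);
            var nextUrl = nextEl ? nextEl.href : null; // may be missing on the last page
            var imgs = getEls(img_css, ctx).map(function(img) { return img.src; });
            for (let i = 0; i < imgs.length - 1; i++) {
                addAndLoad(imgs[i], null, false); // next is ignored when inc_page is false
            }
            addAndLoad(imgs[imgs.length - 1], nextUrl, true);
        } catch (e) {
            // same retry logic as getPageInfo; assumes a `retries` counter is in scope here
            if (xhr.status == 503 && retries > 0) {
                log('xhr status ' + xhr.status + ' retrieving ' + xhr.responseURL + ', ' + retries-- + ' retries remaining');
                window.setTimeout(function() {
                    xhr.open('get', xhr.responseURL);
                    xhr.send();
                }, 500);
            } else {
                log(e);
                log('error getting details from next page, assuming end of chapter.');
            }
        }
    };
};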
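Steps 2 to 4 of the list, with the DOM and XHR stripped away, reduce to "feed every image with the flag off, then the last one with the flag on". A minimal sketch, where `feedPage` is a hypothetical helper that takes the already-extracted image URLs:

```javascript
// Hypothetical helper: add all but the last image without advancing,
// then add the last one with the next-page URL and the flag set.
function feedPage(imgUrls, nextUrl, addAndLoad) {
  if (imgUrls.length === 0) {
    // Otherwise imgUrls[imgUrls.length - 1] is undefined and addAndLoad
    // would fail with a less obvious "failed to retrieve img" error.
    throw new Error("no images found on page");
  }
  for (let i = 0; i < imgUrls.length - 1; i++) {
    addAndLoad(imgUrls[i], null, false); // next is ignored when the flag is false
  }
  addAndLoad(imgUrls[imgUrls.length - 1], nextUrl, true);
  return imgUrls.length;
}

const calls = [];
feedPage(["a.jpg", "b.jpg"], "p2.html", (img, next, inc) => calls.push([img, next, inc]));
console.log(calls.map((c) => c[2])); // [ false, true ]
```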

{
    name: 'taotu55',
    match: "^https?://www.taotu55.net/w/.*/.*",
    img: 'body > div.bcen > div:nth-child(1) > div.content > img',
    next: 'body > div.bcen > div:nth-child(2) > div.NewPages > ul > li:last-child > a',
    numpages: function(ctx) {
        var last = getEl('body > div.bcen > div:nth-child(2) > div.NewPages > ul > li:nth-last-child(1) > a',ctx).text;
        if (!isNaN(last)) {
            return parseInt(last,10);
        } else {
            return parseInt(getEl('body > div.bcen > div:nth-child(2) > div.NewPages > ul > li:nth-last-child(2) > a',ctx).text, 10);
        }
    },
    pages: function(url, num, cb, ex,idontcare,ctx) {
        var colonIdx = url.indexOf(':');
        var xhr = new XMLHttpRequest();
        if(colonIdx > -1) {
            url = location.protocol + url.slice(colonIdx + 1);
        }
        xhr.open('get', url);
        xhr.onload = getPageInfo2(xhr,cb, this.img, this.next);
        xhr.onerror = function() {
            log('failed to load page, aborting', 'error');
        };
        xhr.send();
    },
    curpage: 'body > div.bcen > div:nth-child(2) > div.NewPages > ul > li.thisclass > a'
}
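The numpages function above reads the last pagination link and, when its text is not a number (for example a "next page" style link; the exact label on the site is an assumption), falls back to the second-to-last link. The same decision on plain strings, with one swap: a digits-only regex replaces `!isNaN(last)`, because `isNaN("")` is false, so an empty label would pass the original check and then `parseInt("")` would return NaN.

```javascript
// `labels` stands in for the text of the pagination links, in order.
function pickNumPages(labels) {
  const last = labels[labels.length - 1];
  const candidate = /^\d+$/.test(last.trim()) ? last : labels[labels.length - 2];
  return parseInt(candidate, 10);
}

console.log(pickNumPages(["1", "2", "12", "next"])); // 12
```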

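Takeaways

  1. Trace with a concrete question in mind; otherwise the noise will distract you.
  2. Be sure to understand what each variable actually holds at runtime (its type).
  3. In the end, you must know how the variables relate to each other, why each has the type it does, and how together they accomplish the task at hand (the purpose).
  4. Then commit firmly to your approach and assemble a solution from the knowledge you have already gathered; failing that, go back, find where the data (variables) you need lives, and work out the new structure.
  5. From start to finish: when looking for where the data (variables) you need lives, first establish how far it is from the starting point (the entry point), i.e. how execution gets from the entry to there, and only then look for where it ends!!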

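If you want to modify Manga-Loader

Simple cases can be handled with CSS selectors alone; you only need to provide:

  1. img: the image
  2. next: the link to the next page
  3. numpages: the number of pages
  4. curpage: the current page number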
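A simple implementation is just an object with those four fields plus name and match. The sketch below uses a hypothetical site and selectors, and a hypothetical `missingFields` check that each field is either a selector string or a function; it also escapes the dots in `match` so they only match literal dots.

```javascript
const REQUIRED = ["img", "next", "numpages", "curpage"];

// Hypothetical sanity check: list the fields that are neither a CSS
// selector string nor a function.
function missingFields(imp) {
  return REQUIRED.filter(
    (key) => !(typeof imp[key] === "string" || typeof imp[key] === "function")
  );
}

const example = {
  name: "example-site", // hypothetical site and selectors
  match: "^https?://www\\.example\\.com/read/.*",
  img: "#page-image",
  next: "a.next",
  numpages: "#page-select",
  curpage: "#page-select",
};
console.log(missingFields(example)); // []
```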

Somewhat more complex cases can replace any of these selectors with functions; to see what gets passed to them, look at extractInfo.
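The "selector or function" rule can be sketched as a single dispatch on the field's type. This is an illustration, not extractInfo itself: `resolveField` and `query` are hypothetical, and ctx is a plain object standing in for the DOM document.

```javascript
// If the field is a function, call it with ctx; otherwise treat it as a
// selector and look it up with the (hypothetical) query function.
function resolveField(imp, key, ctx, query) {
  const field = imp[key];
  if (typeof field === "function") return field(ctx);
  return query(field, ctx);
}

const fakeDoc = { "#total": "24", "#cur": "3" };
const query = (sel, doc) => doc[sel];
const imp = {
  numpages: (ctx) => parseInt(ctx["#total"], 10), // function form
  curpage: "#cur", // selector form
};
console.log(resolveField(imp, "numpages", fakeDoc, query)); // 24
console.log(resolveField(imp, "curpage", fakeDoc, query)); // 3
```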

The main call chain is waitAndLoad -> loadManga -> addAndLoad -> loadNextPage -> getPageInfo -> addAndLoad ...
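A toy, synchronous model of that loop may make the recursion easier to see. The "site" is an in-memory map from URL to the img/next URLs the page would yield; `runChain` and `toySite` are hypothetical, the real loader is asynchronous (XHR), and where curPage is incremented here is an assumption of this model.

```javascript
function runChain(site, startUrl, numPages) {
  const images = [];
  let curPage = 1;

  function addAndLoad(img, next) {
    if (!img) throw new Error("failed to retrieve img for page " + curPage);
    images.push(img);
    if (!next && curPage < numPages) throw new Error("failed to retrieve next url for page " + curPage);
    loadNextPage(next);
  }

  function loadNextPage(url) {
    if (curPage >= numPages || !url) return; // chapter finished
    curPage += 1;
    getPageInfo(url);
  }

  function getPageInfo(url) {
    const page = site[url]; // stands in for the XHR response
    addAndLoad(page.img, page.next);
  }

  // loadManga: the first page's info is already in the current document.
  const first = site[startUrl];
  addAndLoad(first.img, first.next);
  return images;
}

const toySite = {
  p1: { img: "1.jpg", next: "p2" },
  p2: { img: "2.jpg", next: "p3" },
  p3: { img: "3.jpg", next: null },
};
console.log(runChain(toySite, "p1", 3)); // [ '1.jpg', '2.jpg', '3.jpg' ]
```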

The most important step is that loadNextPage checks whether the implementation (imp) has pages, as in:

{
    name: 'taotu55',
    match: "^https?://www.taotu55.net/w/.*/.*",
    img: 'body > div.bcen > div:nth-child(1) > div.content > img',
    next: 'body > div.bcen > div:nth-child(2) > div.NewPages > ul > li:last-child > a',
    numpages: function(ctx) {
        var last = getEl('body > div.bcen > div:nth-child(2) > div.NewPages > ul > li:nth-last-child(1) > a',ctx).text;
        if (!isNaN(last)) {
            return parseInt(last,10);
        } else {
            return parseInt(getEl('body > div.bcen > div:nth-child(2) > div.NewPages > ul > li:nth-last-child(2) > a',ctx).text, 10);
        }
    },
    pages: function(url, num, cb, ex,idontcare,ctx) {
        // ...
    },
    curpage: 'body > div.bcen > div:nth-child(2) > div.NewPages > ul > li.thisclass > a'
}

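If it does, calling getPageInfo and addAndLoad is handed over to the pages function, so pages is what controls moving to the next page. This should be able to handle many cases.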
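A toy model of that handoff: once pages exists, the loader only passes it the URL and the callbacks, and pages decides how many images to add and when to advance. Everything here is hypothetical and synchronous (`runWithPages`, the in-memory `multiSite` standing in for XHR plus HTML parsing), and in this model the first page also goes through pages, which may differ from the real loadManga.

```javascript
function runWithPages(imp, startUrl, numPages) {
  const images = [];
  let curPage = 1;

  function addAndLoad(img, next, incPage = true) {
    if (!img) throw new Error("failed to retrieve img for page " + curPage);
    images.push(img);
    if (!incPage) return; // stay on this page
    if (!next && curPage < numPages) throw new Error("failed to retrieve next url for page " + curPage);
    loadNextPage(next);
  }

  function loadNextPage(url) {
    if (curPage >= numPages || !url) return; // chapter finished
    curPage += 1;
    // No fetching here: pages receives the URL and the callbacks.
    imp.pages(url, curPage, addAndLoad, null, null);
  }

  imp.pages(startUrl, curPage, addAndLoad, null, null);
  return images;
}

const multiSite = {
  p1: { imgs: ["1a.jpg", "1b.jpg"], next: "p2" },
  p2: { imgs: ["2a.jpg", "2b.jpg", "2c.jpg"], next: null },
};
const multiImp = {
  pages(url, num, cb) {
    const page = multiSite[url]; // stands in for XHR + HTML parsing
    page.imgs.forEach((src, i) => {
      const isLast = i === page.imgs.length - 1;
      cb(src, isLast ? page.next : null, isLast);
    });
  },
};
console.log(runWithPages(multiImp, "p1", 2).length); // 5
```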