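Motivation
Manga Loader is a fantastic tool, so I wanted to try adding support for a new site.
First attempt: finding the usage from the examples
Since I am adding a new site, the first thing to figure out is how to specify the domain.
Looking for where a domain is specified turns up entries like this: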
{
  name: 'geh-and-exh',
  match: "^https?://(e-hentai|exhentai).org/s/.*/.*",
  img: '.sni > a > img, #img',
  next: '.sni > a, #i3 a',
  numpages: 'div.sn > div > span:nth-child(2)',
  curpage: 'div.sn > div > span:nth-child(1)'
}
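To make the role of `match` concrete, here is a minimal sketch. It assumes the script treats `match` as a regular expression tested against the page URL; the helper name `findImplementation` is hypothetical and is not taken from Manga Loader.

```javascript
// Hypothetical helper: return the first implementation whose `match`
// pattern fits the given URL, or null when none does.
function findImplementation(implementations, url) {
  var found = implementations.find(function (imp) {
    return new RegExp(imp.match).test(url);
  });
  return found || null;
}

// Only the fields needed for matching are listed here.
var implementations = [
  { name: 'geh-and-exh', match: "^https?://(e-hentai|exhentai).org/s/.*/.*" },
  { name: 'something', match: "^https?://domain.com/.*" }
];

var hit = findImplementation(implementations, 'https://domain.com/manga/1');
var miss = findImplementation(implementations, 'https://example.org/');
console.log(hit && hit.name, miss);
```

Note that the dots in these patterns are unescaped, so they match any character, which is looser than a literal domain match.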
Luckily, the source also has a comment explaining the fields:
Sample Implementation:
{
    name: 'something' // name of the implementation
  , match: "^https?://domain.com/.*" // the url to react to for manga loading
  , img: '#image' // css selector to get the page's manga image
  , next: '#next_page' // css selector to get the link to the next page
  , numpages: '#page_select' // css selector to get the number of pages. elements like (select, span, etc)
  , curpage: '#page_select' // css selector to get the current page. usually the same as numPages if it's a select element
  , numchaps: '#chapters' // css selector to get the number of chapters in manga
  , curchap: '#chapters' // css selector to get the number of the current chapter
  , nextchap: '#next_chap' // css selector to get the link to the next chapter
  , prevchap: '#prev_chap' // same as above except for previous
  , wait: 3000 // how many ms to wait before auto loading (to wait for elements to load), or a css selector to keep trying until it returns an elem
  , pages: function(next_url, current_page_number, callback, extract_function) {
      // gets called requesting a certain page number (current_page_number)
      // to continue loading execute callback with img to append as first parameter and next url as second parameter
      // only really needs to be used on sites that have really unusual ways of loading images or depend on javascript
    }
  Any of the CSS selectors can be functions instead that return the desired value.
}
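So it looks like supplying CSS selectors is enough. I did see other examples that pass functions instead, but I do not need them this time, so I am leaving them alone for now.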
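As a rough illustration of "selector or function", the sketch below resolves a field either way. The name `resolveField` and the fake `ctx` object are assumptions made so the example runs outside a browser; Manga Loader's own extraction code is more involved.

```javascript
// Hypothetical resolver: a field is either a CSS selector string
// or a function that receives the page context and returns the value.
function resolveField(field, ctx) {
  if (typeof field === 'function') {
    return field(ctx);
  }
  return ctx.querySelector(field);
}

// Stand-in for a parsed page; it only knows one selector.
var fakeCtx = {
  querySelector: function (selector) {
    return selector === '#image' ? { src: 'https://domain.com/1.jpg' } : null;
  }
};

// Selector form: the element is returned.
var bySelector = resolveField('#image', fakeCtx);
// Function form: the function decides what to return, here the URL itself.
var byFunction = resolveField(function (ctx) {
  return ctx.querySelector('#image').src;
}, fakeCtx);

console.log(bySelector.src, byFunction);
```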
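Second attempt: tracing JavaScript is painful
This time, a single page turned out to contain multiple images!!
So I had to trace how the script works internally, starting from where the fields are used. This time I went looking for examples that use functions.
Apart from pages, every function takes only ctx, and judging from how it is used, ctx should be a DOM node. pages, on the other hand, takes four parameters. The earlier comment roughly says what they are, but it is not clear and I could not follow it.
So I went back to see how pages gets called: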
if (imp.pages) {
  imp.pages(url, curPage, addAndLoad, ex, getPageInfo);
} else {
  var colonIdx = url.indexOf(':');
  if(colonIdx > -1) {
    url = location.protocol + url.slice(colonIdx + 1);
  }
  xhr.open('get', url);
  imp.beforexhr && imp.beforexhr(xhr);
  xhr.onload = getPageInfo;
  xhr.onerror = function() {
    log('failed to load page, aborting', 'error');
  };
  xhr.send();
}
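The colon slicing in the else branch above is easy to misread, so here is the same logic isolated into a small function. The name `useProtocol` is hypothetical, and the protocol is passed in explicitly instead of being read from `location.protocol` so that the sketch runs outside a browser.

```javascript
// Replace whatever scheme the URL carries with the given protocol
// (in the form location.protocol uses, e.g. 'https:').
// URLs without a colon, such as relative paths, are returned unchanged.
function useProtocol(url, protocol) {
  var colonIdx = url.indexOf(':');
  if (colonIdx > -1) {
    url = protocol + url.slice(colonIdx + 1);
  }
  return url;
}

var upgraded = useProtocol('http://domain.com/page/2', 'https:');
var relative = useProtocol('/page/2', 'https:');
console.log(upgraded, relative);
```

The effect is that the next page is requested with the same scheme as the current page, which presumably avoids mixed-content problems. Since the first colon is used, a scheme-less URL that contains a port would be cut at the wrong place.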
If pages is provided it gets called; otherwise an XHR is sent. Next, look at what onload does:
getPageInfo = function() {
  var page = d.body;
  d.body.innerHTML = xhr.response;
  try {
    // find image and link to next page
    addAndLoad(ex('img', imp.imgmod, page), ex('next', null, page));
  } catch (e) {
    if (xhr.status == 503 && retries > 0) {
      log('xhr status ' + xhr.status + ' retrieving ' + xhr.responseURL + ', ' + retries-- + ' retries remaining');
      window.setTimeout(function() {
        xhr.open('get', xhr.responseURL);
        xhr.send();
      }, 500);
    } else {
      log(e);
      log('error getting details from next page, assuming end of chapter.');
    }
  }
}
The extracted result is fed into addAndLoad:
addAndLoad = function(img, next) {
  if(!img) throw new Error('failed to retrieve img for page ' + curPage);
  updateStats();
  addImage(img, UI.images, curPage, function() {
    pagesLoaded += 1;
    updateStats();
  });
  if(!next && curPage < numPages) throw new Error('failed to retrieve next url for page ' + curPage);
  loadNextPage(next);
}
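The question here is what img and next actually are; with type annotations this would not be so much trouble. Are they DOM elements or URLs? Answering that means reading further down, but I do not want to paste more code, so here is the spoiler: they are URLs.
With the types sorted out, the reason only one image per page is supported becomes visible: as soon as one image is added, the next page is loaded. So this part has to change, by adding a flag that blocks the page advance: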
addAndLoad = function(img, next, inc_page=true) {
  if(!img) throw new Error('failed to retrieve img for page ' + curPage);
  updateStats();
  addImage(img, UI.images, curPage, function() {
    if (inc_page) {
      pagesLoaded += 1;
      updateStats();
    }
  });
  if (inc_page) {
    if(!next && curPage < numPages) throw new Error('failed to retrieve next url for page ' + curPage);
    loadNextPage(next);
  }
}
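To check that the flag does what I want, here is a stripped-down simulation of the modified addAndLoad. The stubs for `addImage` and `loadNextPage` are assumptions that only record their arguments; stats and error checks are left out.

```javascript
// Stubs standing in for Manga Loader internals: they only record calls.
var added = [];
var requested = [];
function addImage(img) { added.push(img); }
function loadNextPage(next) { requested.push(next); }

// Same control flow as the modified addAndLoad, minus stats and error checks.
function addAndLoad(img, next, inc_page = true) {
  addImage(img);
  if (inc_page) {
    loadNextPage(next);
  }
}

// Three images on one page: only the last call advances to the next page.
addAndLoad('a.jpg', 'ignored', false);
addAndLoad('b.jpg', 'ignored', false);
addAndLoad('c.jpg', '/page/2', true);
console.log(added, requested);
```

All three images are recorded, but the next page is requested only once, with the URL passed in the final call.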
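Combining what I learned about addAndLoad and pages, I need to write my own pages, roughly like this:
- send an XHR
- pull out all the img elements
- add each image through addImage (by calling addAndLoad with the flag off)
- for the last image, turn the flag back on so the next page gets loaded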
// retries, log, getEl and getEls are free variables here and must come from the surrounding script scope
var getPageInfo2 = function(xhr, addAndLoad, img_css, next_css) {
  return function() {
    // parse the response into a detached document instead of overwriting the live page
    var ctx = document.implementation.createHTMLDocument();
    ctx.body.innerHTML = xhr.response;
    try {
      // find all images and the link to the next page
      var nextUrl = getEl(next_css, ctx).href;
      var imgs = getEls(img_css, ctx).map(function(page) { return page.src; });
      // every image except the last: add it without advancing to the next page
      for(let i = 0; i < imgs.length - 1; i++) {
        addAndLoad(imgs[i], 'whatever', false);
      }
      // last image: add it and let addAndLoad move on to nextUrl
      addAndLoad(imgs[imgs.length - 1], nextUrl, true);
    } catch (e) {
      if (xhr.status == 503 && retries > 0) {
        log('xhr status ' + xhr.status + ' retrieving ' + xhr.responseURL + ', ' + retries-- + ' retries remaining');
        window.setTimeout(function() {
          xhr.open('get', xhr.responseURL);
          xhr.send();
        }, 500);
      } else {
        log(e);
        log('error getting details from next page, assuming end of chapter.');
      }
    }
  };
};
{
  name: 'taotu55',
  match: "^https?://www.taotu55.net/w/.*/.*",
  img: 'body > div.bcen > div:nth-child(1) > div.content > img',
  next: 'body > div.bcen > div:nth-child(2) > div.NewPages > ul > li:last-child > a',
  numpages: function(ctx) {
    var last = getEl('body > div.bcen > div:nth-child(2) > div.NewPages > ul > li:nth-last-child(1) > a',ctx).text;
    if (!isNaN(last)) {
      return parseInt(last,10);
    } else {
      return parseInt(getEl('body > div.bcen > div:nth-child(2) > div.NewPages > ul > li:nth-last-child(2) > a',ctx).text, 10);
    }
  },
  pages: function(url, num, cb, ex,idontcare,ctx) {
    var colonIdx = url.indexOf(':');
    var xhr = new XMLHttpRequest();
    if(colonIdx > -1) {
      url = location.protocol + url.slice(colonIdx + 1);
    }
    xhr.open('get', url);
    xhr.onload = getPageInfo2(xhr,cb, this.img, this.next);
    xhr.onerror = function() {
      log('failed to load page, aborting', 'error');
    };
    xhr.send();
  },
  curpage: 'body > div.bcen > div:nth-child(2) > div.NewPages > ul > li.thisclass > a'
}
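The numpages function above uses a small fallback: take the last pager link if its text is a number, otherwise the one before it (presumably because the last link is a "next page" label). Below is a DOM-free sketch of that logic; `pageCountFromLabels` and the sample labels are hypothetical.

```javascript
// labels: the link texts of the pager, in document order.
// Use the last label when it is numeric, else fall back to the one before it.
function pageCountFromLabels(labels) {
  var last = labels[labels.length - 1];
  if (!isNaN(last)) {
    return parseInt(last, 10);
  }
  return parseInt(labels[labels.length - 2], 10);
}

var withNextLink = pageCountFromLabels(['1', '2', '3', 'next']);
var numbersOnly = pageCountFromLabels(['1', '2', '3']);
console.log(withNextLink, numbersOnly);
```

One thing to keep in mind: `isNaN('')` is false, so an empty label would take the first branch and `parseInt` would then return NaN.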
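Takeaways
- Trace with a specific question in mind; otherwise the noise will distract you.
- Always work out what each variable actually holds (its type).
- In the end you must understand how the variables relate to each other, why each one has the type it does, and how together they solve the problem at hand (the purpose).
- After that, commit to your own approach and assemble a solution from what you have already read. Failing that, go back and look for where the data (variables) you need lives, then work out the new structure.
- Think from start to finish: when looking for where the data (variables) you need lives, first establish its distance from the starting point (the entry point), that is, how execution gets from the entry to there, and only then look for where the end point is!!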
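If you want to modify Manga-Loader
Simple cases can be handled with CSS selectors alone. You only need to provide:
- img: the image
- next: the link
- numpages: a number
- curpage: a number
Slightly more complex cases can replace the selectors above with functions. To see what gets passed to them, look at extractInfo.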
The main chain of function calls is
waitAndLoad -> loadManga -> addAndLoad -> loadNextPage -> getPageInfo -> addAndLoad ...
The most important of these is loadNextPage, which checks whether imp has pages. With an implementation like
{
  name: 'taotu55',
  match: "^https?://www.taotu55.net/w/.*/.*",
  img: 'body > div.bcen > div:nth-child(1) > div.content > img',
  next: 'body > div.bcen > div:nth-child(2) > div.NewPages > ul > li:last-child > a',
  numpages: function(ctx) {
    var last = getEl('body > div.bcen > div:nth-child(2) > div.NewPages > ul > li:nth-last-child(1) > a',ctx).text;
    if (!isNaN(last)) {
      return parseInt(last,10);
    } else {
      return parseInt(getEl('body > div.bcen > div:nth-child(2) > div.NewPages > ul > li:nth-last-child(2) > a',ctx).text, 10);
    }
  },
  pages: function(url, num, cb, ex,idontcare,ctx) {
    // ...
  },
  curpage: 'body > div.bcen > div:nth-child(2) > div.NewPages > ul > li.thisclass > a'
}
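the calls to getPageInfo and addAndLoad are handed over to the pages function, so pages is what controls moving to the next page. This should be enough to handle many cases.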
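The loop described above, and the way pages takes it over, can be sketched as a small synchronous simulation. Everything here is an assumption for illustration: the in-memory `site` object replaces real XHR requests, `loadChapter` is a hypothetical wrapper, and `pages` receives only the URL and the callback instead of the full argument list the real script passes.

```javascript
// Hypothetical in-memory "site": url -> images on that page and the next link.
var site = {
  '/p/1': { imgs: ['1a.jpg', '1b.jpg'], next: '/p/2' },
  '/p/2': { imgs: ['2a.jpg'], next: null }
};

// Walk a chapter from startUrl and return the images in load order.
function loadChapter(imp, startUrl) {
  var loaded = [];

  function addAndLoad(img, next, inc_page = true) {
    loaded.push(img);
    if (inc_page) loadNextPage(next);
  }

  function loadNextPage(url) {
    if (!url) return; // no next link: end of chapter
    if (imp.pages) {
      // The implementation takes over and decides when to call back.
      imp.pages(url, addAndLoad);
    } else {
      // Default path: one image per page.
      var page = site[url];
      addAndLoad(page.imgs[0], page.next);
    }
  }

  loadNextPage(startUrl);
  return loaded;
}

// An implementation whose pages adds every image and advances on the last one.
var multiImageImp = {
  pages: function (url, cb) {
    var page = site[url];
    page.imgs.forEach(function (img, i) {
      cb(img, page.next, i === page.imgs.length - 1);
    });
  }
};

var withoutPages = loadChapter({}, '/p/1');
var withPages = loadChapter(multiImageImp, '/p/1');
console.log(withoutPages, withPages);
```

Without pages the second image on the first page is skipped; with pages all three images are collected. The real script does this asynchronously through XHR callbacks, which the simulation leaves out.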