# WarcDB - Web crawl data as SQLite DB

> Clean Markdown view of GeekNews topic #6807. Use the original source for factual precision when an external source URL is present.

## Metadata

- GeekNews HTML: [https://news.hada.io/topic?id=6807](https://news.hada.io/topic?id=6807)
- GeekNews Markdown: [https://news.hada.io/topic/6807.md](https://news.hada.io/topic/6807.md)
- Type: news
- Author: [xguru](https://news.hada.io/@xguru)
- Published: 2022-06-22T09:49:01+09:00
- Updated: 2022-06-22T09:49:01+09:00
- Original source: [github.com/Florents-Tselai](https://github.com/Florents-Tselai/WarcDB)
- Points: 14
- Comments: 0

## Topic Body

- A SQLite-based file format that makes web crawl data easy to query with SQL
- Standard Web ARChive (.warc) files, as used by wget, WebRecorder, and similar tools, can be imported into a .warcdb file
- sqlite-utils commands work on it as-is
```  
wget --warc-file tselai "https://tselai.com"  
warcdb import archive.warcdb tselai.warc.gz  
  
# Get all response headers
sqlite3 archive.warcdb <<SQL  
select  json_extract(h.value, '$.header') as header,   
        json_extract(h.value, '$.value') as value  
from response,  
     json_each(http_headers) h  
SQL  
```
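
The same query can be tried without running a crawl. The following is a minimal Python sketch using the standard-library `sqlite3` module, assuming the bundled SQLite includes the JSON functions. The in-memory `response` table is a hypothetical stand-in that models only the `http_headers` column, as a JSON array of `{"header", "value"}` objects (the shape the query above implies). The sample header values are made up for illustration.

```python
import json
import sqlite3

# Hypothetical stand-in for the `response` table of a .warcdb file.
# Only `http_headers` is modeled; a real file will have more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE response (http_headers TEXT)")

# Made-up sample data: one JSON array of header objects per response.
sample_rows = [
    [
        {"header": "Content-Type", "value": "text/html"},
        {"header": "Server", "value": "nginx"},
    ],
    [
        {"header": "Content-Type", "value": "image/png"},
    ],
]
conn.executemany(
    "INSERT INTO response (http_headers) VALUES (?)",
    [(json.dumps(headers),) for headers in sample_rows],
)

# Same query as in the shell example: json_each() expands each JSON array
# into one row per element, and json_extract() pulls out the two fields.
query = """
select json_extract(h.value, '$.header') as header,
       json_extract(h.value, '$.value')  as value
from response,
     json_each(http_headers) h
"""
rows = conn.execute(query).fetchall()

for header, value in rows:
    print(f"{header}: {value}")
```

To run this against a real archive, the `":memory:"` argument could be swapped for the path to `archive.warcdb` and the `CREATE TABLE` and `INSERT` steps dropped, assuming the file exposes the `response` table and `http_headers` column used above.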

## Comments



_No public comments on this page._
