The Python split method: working with text

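In Python, split is a string method that breaks a string into a list of parts at a separator. It is handy for processing output, in particular for extracting the part of a string that is separated from the rest of the content in some way. In that sense it is roughly comparable to awk in bash. The separator can be a comma, a colon, a space, and so on.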
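To make the comparison with awk concrete, here is a minimal sketch that pulls two fields out of a colon-separated line, roughly what awk -F: '{print $1, $7}' does. The sample line is a hypothetical entry in /etc/passwd format and is not taken from the original text.

```python
# Hypothetical line in /etc/passwd format: fields separated by colons
line = "root:x:0:0:root:/root:/bin/bash"

# Split on the colon, like awk -F:
fields = line.split(":")

# Python indexes from 0 while awk numbers fields from 1: $1 -> fields[0], $7 -> fields[6]
user, shell = fields[0], fields[6]
print(user, shell)  # root /bin/bash
```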

python split

As an example, consider a string containing several abstract values separated by commas.

> string_with_comas="thing1, thing2, thing3"

> string_with_comas.split(",")

['thing1', ' thing2', ' thing3']

After split is applied, the result is a list, while the original value is a string:

> type(string_with_comas)

<class 'str'>

> type(string_with_comas.split(","))

<class 'list'>
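In the comma example, every element after the first keeps a leading space, because only the comma is the separator. A minimal sketch of one common way to get clean values, assuming surrounding whitespace is never meaningful: strip each element after splitting.

```python
string_with_comas = "thing1, thing2, thing3"

# Splitting on "," alone keeps the space that followed each comma
raw = string_with_comas.split(",")
print(raw)  # ['thing1', ' thing2', ' thing3']

# strip() removes leading and trailing whitespace from each element
cleaned = [item.strip() for item in raw]
print(cleaned)  # ['thing1', 'thing2', 'thing3']
```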

The separator is passed to split as an argument: a comma, a vertical bar, a hyphen, a colon, or anything else, including a string of several characters.
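A few separator variants side by side. The sample strings are made up for illustration; maxsplit is a standard parameter of str.split that limits how many splits are performed.

```python
# Vertical bar as the separator
bars = "a|b|c".split("|")
print(bars)  # ['a', 'b', 'c']

# The separator may be longer than one character
multi = "thing1, thing2, thing3".split(", ")
print(multi)  # ['thing1', 'thing2', 'thing3']

# maxsplit=1: split only at the first separator, the rest stays in the last element
limited = "key=value=with=equals".split("=", maxsplit=1)
print(limited)  # ['key', 'value=with=equals']

# Two adjacent separators produce an empty string between them
empty_between = "a,,b".split(",")
print(empty_between)  # ['a', '', 'b']
```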

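If no argument is given, the string is split on whitespace: any run of spaces, tabs, or newlines counts as a single separator. The commas then stay attached to the words: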

> string_with_comas.split()

['thing1,', 'thing2,', 'thing3']
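Calling split() with no argument is not the same as split(" "). This short comparison on a made-up string shows the difference.

```python
text = "a  b\tc\n"

# No argument: any run of whitespace (spaces, tabs, newlines) is one separator,
# and leading/trailing whitespace produces no empty elements
words = text.split()
print(words)  # ['a', 'b', 'c']

# Explicit " ": every single space separates, so two spaces yield an empty string,
# and the tab and newline stay inside the elements
parts = text.split(" ")
print(parts)  # ['a', '', 'b\tc\n']
```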

> long_string="Filler text is text that shares some characteristics of a real written text, but is random or otherwise generated. It may be used to display a sample of fonts, generate text for testing, or to spoof an e-mail spam filter."

> long_string.split()

['Filler', 'text', 'is', 'text', 'that', 'shares', 'some', 'characteristics', 'of', 'a', 'real', 'written', 'text,', 'but', 'is', 'random', 'or', 'otherwise', 'generated.', 'It', 'may', 'be', 'used', 'to', 'display', 'a', 'sample', 'of', 'fonts,', 'generate', 'text', 'for', 'testing,', 'or', 'to', 'spoof', 'an', 'e-mail', 'spam', 'filter.']

Since the result is a list, its elements can be accessed by index:

> long_string.split()[7]

'characteristics'
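In a script it is usually better to split once and keep the list than to call split() for every access. A minimal sketch using the same long_string, also showing negative indexes and slices, which are standard list operations:

```python
long_string = (
    "Filler text is text that shares some characteristics of a real written text, "
    "but is random or otherwise generated. It may be used to display a sample of fonts, "
    "generate text for testing, or to spoof an e-mail spam filter."
)

# Split once and reuse the list
words = long_string.split()

print(words[7])    # characteristics
print(words[-1])   # filter.  (negative indexes count from the end)
print(words[:3])   # ['Filler', 'text', 'is']  (a slice returns a new list)
print(len(words))  # 40
```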

The splitlines method

Multiline text needs a different approach when the line structure matters:

> long_string='''
... Filler text is text that shares some characteristics of a real
... written text, but is random or otherwise generated. It may be used to display a
... sample of fonts, generate text for testing, or to spoof an e-mail spam filter.'''

Calling split() without arguments on this string returns one flat list of all the words, because newlines count as whitespace. To get the words grouped line by line, two steps are needed.

First, apply the splitlines method:

> long_string.splitlines()

['', 'Filler text is text that shares some characteristics of a real', 'written text, but is random or otherwise generated. It may be used to display a', 'sample of fonts, generate text for testing, or to spoof an e-mail spam filter.']

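The result is a list of the individual lines. The first element is an empty string because the text starts with a line break right after the opening triple quotes.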
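Before the second step, a note on why splitlines is preferable to split("\n") for this job: it also recognizes Windows-style "\r\n" line endings and does not produce an extra empty element for a trailing line break. The sample text is made up for illustration; keepends is a standard parameter of str.splitlines.

```python
text = "first line\nsecond line\r\nthird line\n"

# splitlines: "\r\n" is one line boundary, and the final "\n" adds no empty element
lines = text.splitlines()
print(lines)  # ['first line', 'second line', 'third line']

# split("\n"): the "\r" is left behind and the trailing "\n" yields an empty string
by_newline = text.split("\n")
print(by_newline)  # ['first line', 'second line\r', 'third line', '']

# keepends=True keeps the line-break characters on each line
with_ends = text.splitlines(keepends=True)
print(with_ends)  # ['first line\n', 'second line\r\n', 'third line\n']
```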

Then each line is split separately in a for loop:

> for line in long_string.splitlines():
...     print(line.split())

[]
['Filler', 'text', 'is', 'text', 'that', 'shares', 'some', 'characteristics', 'of', 'a', 'real']
['written', 'text,', 'but', 'is', 'random', 'or', 'otherwise', 'generated.', 'It', 'may', 'be', 'used', 'to', 'display', 'a']
['sample', 'of', 'fonts,', 'generate', 'text', 'for', 'testing,', 'or', 'to', 'spoof', 'an', 'e-mail', 'spam', 'filter.']
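The loop above only prints the lists. A minimal sketch that collects them into a nested list with a list comprehension instead; the "if line.strip()" filter that drops empty lines (such as the first one) is an addition, not part of the original loop. For comparison, it also shows that split() with no arguments returns the same words as one flat list.

```python
long_string = '''
Filler text is text that shares some characteristics of a real
written text, but is random or otherwise generated. It may be used to display a
sample of fonts, generate text for testing, or to spoof an e-mail spam filter.'''

# Words grouped by line, skipping lines that are empty or whitespace-only
words_by_line = [line.split() for line in long_string.splitlines() if line.strip()]

# When grouping is not needed, split() alone treats newlines as whitespace
all_words = long_string.split()

print(len(words_by_line))  # 3
print(len(all_words))      # 40
```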
